Evolution of NLP, Part 3: Transfer Learning using ULMFiT

發(fā)布時(shí)間:2023/12/15 编程问答 35 豆豆
生活随笔 收集整理的這篇文章主要介紹了 迁移学习 nlp_NLP的发展-第3部分-使用ULMFit进行迁移学习 小編覺得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

遷移學(xué)習(xí) nlp

This is the third part of a series of posts showing the improvements in NLP modeling approaches. We have seen the use of traditional techniques like Bag of Words, TF-IDF, then moved on to RNNs and LSTMs. This time we’ll look into one of the pivotal shifts in approaching NLP Tasks — Transfer Learning!

The complete code for this tutorial is available at this Kaggle Kernel

ULMFiT

The idea of using Transfer Learning is quite new in NLP Tasks, while it has been quite prominently used in Computer Vision tasks! This new way of looking at NLP was first proposed by Howard Jeremy, and has transformed the way we looked at data previously!

在NLP任務(wù)中,使用轉(zhuǎn)移學(xué)習(xí)的想法是相當(dāng)新的,而在計(jì)算機(jī)視覺任務(wù)中已經(jīng)非常顯著地使用了轉(zhuǎn)移學(xué)習(xí)! 這種查看NLP的新方法最初是由霍華德·杰里米(Howard Jeremy)提出的,它改變了我們之前查看數(shù)據(jù)的方式!

The core idea is two-fold — using generative pre-trained Language Model + task-specific fine-tuning was first explored in ULMFiT (Howard & Ruder, 2018), directly motivated by the success of using ImageNet pre-training for computer vision tasks. The base model is AWD-LSTM.

A Language Model is exactly like it sounds — the output of this model is to predict the next word of a sentence. The goal is to have a model that can understand the semantics, grammar, and unique structure of a language.

語(yǔ)言模型就像聽起來(lái)一樣—該模型的輸出是預(yù)測(cè)句子的下一個(gè)單詞。 我們的目標(biāo)是建立一個(gè)能夠理解語(yǔ)言的語(yǔ)義,語(yǔ)法和獨(dú)特結(jié)構(gòu)的模型。

ULMFit follows three steps to achieve good transfer learning results on downstream language classification tasks:

  • General Language Model pre-training: on Wikipedia text.

    通用語(yǔ)言模型預(yù)培訓(xùn):在Wikipedia文本上。
  • Target task Language Model fine-tuning: ULMFiT proposed two training techniques for stabilizing the fine-tuning process.

    目標(biāo)任務(wù)語(yǔ)言模型的微調(diào):ULMFiT提出了兩種訓(xùn)練技術(shù)來(lái)穩(wěn)定微調(diào)過(guò)程。
  • Target task classifier fine-tuning: The pretrained LM is augmented with two standard feed-forward layers and a softmax normalization at the end to predict a target label distribution.

    目標(biāo)任務(wù)分類器的微調(diào):預(yù)訓(xùn)練的LM通過(guò)兩個(gè)標(biāo)準(zhǔn)前饋層和最后的softmax歸一化進(jìn)行增強(qiáng),以預(yù)測(cè)目標(biāo)標(biāo)簽的分布。
  • 對(duì)NLP使用fast.ai- (Using fast.ai for NLP -)

fast.ai's motto, "Making Neural Networks Uncool Again", tells you a lot about their approach ;) Implementation of these models is remarkably simple and intuitive, and with good documentation you can easily find a solution if you get stuck anywhere. For this reason, and a few others I elaborate below, I decided to try out the fast.ai library, which is built on top of PyTorch instead of Keras. Despite being used to working in Keras, I didn't find it difficult to navigate fast.ai, and the learning curve for implementing advanced things is quite short as well!

    In addition to its simplicity, there are some advantages of using fast.ai’s implementation -

    除了簡(jiǎn)單之外,使用fast.ai的實(shí)現(xiàn)還有一些優(yōu)勢(shì)-

• Discriminative fine-tuning is motivated by the fact that different layers of the LM capture different types of information (see the discussion above). ULMFiT proposes to tune each layer with a different learning rate, {η^(1), …, η^(ℓ), …, η^(L)}, where η^(1) is the learning rate for the first layer, η^(ℓ) is for the ℓ-th layer, and there are L layers in total.

      區(qū)分微調(diào)的動(dòng)機(jī)是,LM的不同層捕獲不同類型的信息(請(qǐng)參見上面的討論)。 ULMFiT建議用不同的學(xué)習(xí)速率{η1,…,η?,…,ηL}來(lái)調(diào)整每一層,其中η是第一層的基本學(xué)習(xí)率,η?是第?層,總共有L層。

The update for discriminative fine-tuning is θ_t^(ℓ) = θ_{t-1}^(ℓ) − η^(ℓ) · ∇_{θ^(ℓ)} J(θ), where J(θ) is the loss function, ∇_{θ^(ℓ)} J(θ) is its gradient with respect to the parameters θ^(ℓ) of the ℓ-th layer, and η^(ℓ) is the learning rate of the ℓ-th layer.
    • Slanted triangular learning rates (STLR) refer to a special learning rate scheduling that first linearly increases the learning rate and then linearly decays it. The increase stage is short so that the model can converge to a parameter space suitable for the task fast, while the decay period is long allowing for better fine-tuning.

      斜三角學(xué)習(xí)率(STLR)是指一種特殊的學(xué)習(xí)率計(jì)劃,該計(jì)劃首先線性增加學(xué)習(xí)率,然后線性降低它。 增加階段很短,因此模型可以快速收斂到適合任務(wù)的參數(shù)空間,而衰減周期很長(zhǎng),可以進(jìn)行更好的微調(diào)。

Figure: the learning rate increases until the 200th iteration and then slowly decays. From Howard & Ruder (2018), Universal Language Model Fine-tuning for Text Classification.
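The STLR schedule can be written down directly from the paper's formula. The sketch below is a standalone illustration; cut_frac, ratio and eta_max follow the defaults reported in Howard & Ruder (2018).

```python
import math

def stlr(t, T, cut_frac=0.1, ratio=32, eta_max=0.01):
    """Slanted triangular learning rate at iteration t out of T total iterations."""
    cut = math.floor(T * cut_frac)  # iteration at which the peak occurs
    if t < cut:
        p = t / cut  # short linear warm-up toward the peak
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # long linear decay
    # p = 1 gives eta_max; p = 0 gives eta_max / ratio
    return eta_max * (1 + p * (ratio - 1)) / ratio

T = 2000
lrs = [stlr(t, T) for t in range(T)]
peak = max(range(T), key=lambda t: lrs[t])
print(peak, max(lrs))  # peak at iteration 200 with lr = eta_max, matching the figure
```

The short warm-up lets the model converge quickly to a good region of parameter space, while the long decay allows careful fine-tuning, exactly as described above.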

    Let’s try to see how well this approach works for our dataset. I would also like to point out that all these ideas and code are available at fast.ai’s free official course for Deep Learning.

    加載數(shù)據(jù)! (Loading the data!)

Data in fast.ai is loaded using TextLMDataBunch. This is very similar to ImageDataGenerator in Keras: you provide the path, labels, etc., and the method prepares the Train, Test and Validation data depending on the task at hand!

    fast.ai中的數(shù)據(jù)是使用TextLMDataBunch獲取的。 這與Keras中的ImageGenerator非常相似,其中提供了路徑,標(biāo)簽等,并且該方法根據(jù)手頭的任務(wù)準(zhǔn)備了Train,Test和Validation數(shù)據(jù)!

    語(yǔ)言模型數(shù)據(jù)集 (Data Bunch for Language Model)

    data_lm = TextLMDataBunch.from_csv(path,'train.csv', text_cols = 3, label_cols = 4)
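Under the hood, the DataBunch tokenizes the text and numericalizes it against a vocabulary. Here is a rough standalone sketch of that preprocessing step; fast.ai's actual tokenizer also adds special tokens (xxbos, xxmaj, etc.) and handles casing, which is omitted here.

```python
from collections import Counter

def build_vocab(texts, max_vocab=60000):
    """Map each word to an integer id, most frequent first; id 0 = unknown."""
    freq = Counter(w for t in texts for w in t.lower().split())
    itos = ["xxunk"] + [w for w, _ in freq.most_common(max_vocab)]
    return {w: i for i, w in enumerate(itos)}

def numericalize(text, stoi):
    """Turn a string into a list of vocabulary ids (0 for out-of-vocabulary words)."""
    return [stoi.get(w, 0) for w in text.lower().split()]

texts = ["This movie was great", "This movie was awful"]
stoi = build_vocab(texts)
print(numericalize("this movie was fantastic", stoi))  # "fantastic" maps to the unknown id 0
```

Sharing this vocabulary between the language model and the classifier (the vocab=data_lm.train_ds.vocab argument below) is what lets the fine-tuned encoder transfer cleanly.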

    分類任務(wù)的數(shù)據(jù)束 (Data Bunch for Classification Task)

    data_clas = TextClasDataBunch.from_csv(path, 'train.csv', vocab=data_lm.train_ds.vocab, bs=32, text_cols = 3, label_cols = 4)

As discussed in the steps before, we first start out with a language model learner, which basically predicts the next word, given a sequence. Intuitively, this model tries to understand what language and context are. We then take this model and fine-tune it for our specific task: Sentiment Classification.

Step 1. Training a Language Model

    learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
    learn.fit_one_cycle(1, 1e-2)

By default, we start with a pre-trained model based on the AWD-LSTM architecture. This model is built from simple LSTM units but has multiple dropout layers and associated hyperparameters. The drop_mult argument scales all of these dropout probabilities at once. I've kept it at 0.5; you can set it higher if you find that the model is overfitting.

    默認(rèn)情況下,我們從基于AWD-LSTM體系結(jié)構(gòu)的預(yù)訓(xùn)練模型開始。 該模型基于簡(jiǎn)單的LSTM單元構(gòu)建,但是具有多個(gè)輟學(xué)層和超參數(shù)。 基于drop_mult參數(shù),我們可以同時(shí)在模型中設(shè)置多個(gè)dropout 。 我將其保持在0.5。 如果發(fā)現(xiàn)此模型過(guò)度擬合,可以將其設(shè)置得更高。

    區(qū)分性微調(diào) (Discriminative Fine-Tuning)

    learn.unfreeze()
    learn.fit_one_cycle(3, slice(1e-4,1e-2))

learn.unfreeze() makes all the layers of AWD-LSTM trainable. The learning rates are specified with slice(1e-4, 1e-2): the last layer group trains at 1e-2, the first at 1e-4, and the groups in between get geometrically scaled learning rates.

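What slice(1e-4, 1e-2) does across layer groups can be sketched as a geometric progression from the lowest to the highest rate. This is a standalone approximation of the behavior, shown here for 3 layer groups:

```python
def discriminative_lrs(lr_min, lr_max, n_groups):
    """Geometrically spaced learning rates from the first to the last layer group."""
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

print(discriminative_lrs(1e-4, 1e-2, 3))  # approximately [1e-4, 1e-3, 1e-2]
```

Earlier layers, which capture more general features, are thus nudged gently, while the task-specific top layers move fastest.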

    預(yù)期的三角學(xué)習(xí)率 (Slated Triangular Learning Rates)

This can be achieved simply by using the fit_one_cycle() method in fast.ai.

Gradual Unfreezing

Though I've not experimented with this here, the idea is pretty simple: at the start, we keep the initial layers of the model untrainable, and then we slowly unfreeze earlier layers as we keep on training. I'll cover this in detail in the next post.

    盡管我沒(méi)有在這里進(jìn)行嘗試,但是這個(gè)想法很簡(jiǎn)單。 首先,我們將模型的初始層保持為不可訓(xùn)練,然后在繼續(xù)訓(xùn)練的同時(shí)慢慢解凍較早的層。 我將在下一篇文章中詳細(xì)介紹

Since we've built a language model, we can use it to predict the next few words from a given input. This tells us whether the model has begun to understand our reviews.

    由于我們已經(jīng)建立了語(yǔ)言模型,因此實(shí)際上可以根據(jù)特定輸入使用它來(lái)預(yù)測(cè)接下來(lái)的幾個(gè)單詞。 這可以判斷模型是否已開始理解我們的評(píng)論。

    You can see that, with just a simple starting input, the model is able to generate realistic reviews. So, this assures that we are in the right direction.

learn.save('language_model')
learn.save_encoder('language_model_encoder')

Let's save this model; we will load it later for classification.

Step 2. Classification Task using the Language Model as Encoder

    learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5).to_fp16()
    learn.model_dir = Path('/kaggle/working/')
    learn.load_encoder('language_model_encoder')

Let's get started with training. I'm running it in a similar manner: training only the outer layer for 1 epoch, then unfreezing the whole network and training for 3 epochs.

    learn.fit_one_cycle(1, 1e-2)
    learn.unfreeze()
    learn.fit_one_cycle(3, slice(1e-4, 1e-2))

    準(zhǔn)確性-90% (Accuracy — 90%)

    With this alone (in just 4 epochs), we are at 90% accuracy! It’s an absolutely amazing result if you consider the amount of effort we’ve put in! Within just a few lines of code and nearly 10 mins of training, we’ve breached the 90% wall.

    I hope this was helpful for you as well to get started with NLP and Transfer Learning. I’ll catch you later in the 4th blog of this series, where we take this up a notch and explore transformers!

Original article: https://medium.com/analytics-vidhya/evolution-of-nlp-part-3-transfer-learning-using-ulmfit-267d0a73421e

    遷移學(xué)習(xí) nlp

    總結(jié)

    以上是生活随笔為你收集整理的迁移学习 nlp_NLP的发展-第3部分-使用ULMFit进行迁移学习的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。

    如果覺得生活随笔網(wǎng)站內(nèi)容還不錯(cuò),歡迎將生活随笔推薦給好友。