BART for Paraphrasing with Simple Transformers
Introduction
BART is a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension -
Don’t worry if that sounds a little complicated; we are going to break it down and see what it all means. To add a little bit of background before we dive into BART, it’s time for the now-customary ode to Transfer Learning with self-supervised models. It’s been said many times over the past couple of years, but Transformers really have achieved incredible success in a wide variety of Natural Language Processing (NLP) tasks.
BART uses a standard Transformer architecture (Encoder-Decoder) like the original Transformer model used for neural machine translation but also incorporates some changes from BERT (only uses the encoder) and GPT (only uses the decoder). You can refer to the 2.1 Architecture section of the BART paper for more details.
Pre-Training BART
BART is pre-trained by minimizing the cross-entropy loss between the decoder output and the original sequence.
Masked Language Modeling (MLM)
MLM models such as BERT are pre-trained to predict masked tokens. This process can be broken down as follows:
Replace a random subset of the input with a mask token [MASK]. (Adding noise/corruption)
The model predicts the original tokens for each of the [MASK] tokens. (Denoising)
Importantly, BERT models can “see” the full input sequence (with some tokens replaced with [MASK]) when attempting to predict the original tokens. This makes BERT a bidirectional model, i.e. it can “see” the tokens before and after the masked tokens.
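To make the two steps concrete, here is a toy sketch of random token masking in Python. It is purely illustrative, not BERT's actual preprocessing, and the 15% masking probability is just the commonly cited default.

import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    # Corruption: replace a random subset of the input tokens with [MASK].
    corrupted, targets = [], []
    for token in tokens:
        if random.random() < mask_prob:
            corrupted.append(mask_token)
            targets.append(token)  # the model must recover these (denoising)
        else:
            corrupted.append(token)
    return corrupted, targets

tokens = "BART is a denoising autoencoder for pretraining sequence to sequence models".split()
corrupted, targets = mask_tokens(tokens)
print(corrupted)  # e.g. ['BART', 'is', 'a', '[MASK]', 'autoencoder', ...]
print(targets)    # e.g. ['denoising']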
(Figure 1(a) from the paper)

This is suited for tasks like classification where you can use information from the full sequence to perform the prediction. However, it is less suited for text generation tasks where the prediction depends only on the previous words.
Autoregressive Models
Models used for text generation, such as GPT2, are pre-trained to predict the next token given the previous sequence of tokens. This pre-training objective results in models that are well-suited for text generation, but not for tasks like classification.
(Figure 1(b) from the paper)

BART Sequence-to-Sequence
BART has both an encoder (like BERT) and a decoder (like GPT), essentially getting the best of both worlds.
The encoder uses a denoising objective similar to BERT while the decoder attempts to reproduce the original sequence (autoencoder), token by token, using the previous (uncorrupted) tokens and the output from the encoder.
(Figure 1(c) from the paper)

A significant advantage of this setup is the unlimited flexibility of choosing the corruption scheme, including changing the length of the original input. Or, in fancier terms, the text can be corrupted with an arbitrary noising function.
The corruption schemes used in the paper are token masking, token deletion, text infilling, sentence permutation, and document rotation.
The authors note that training BART with text infilling yields the most consistently strong performance across many tasks.
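As a rough sketch of what text infilling looks like (the paper samples span lengths from a Poisson distribution with λ = 3 and replaces each sampled span with a single mask token; the helper below is an illustration, not the authors' implementation):

import numpy as np

def text_infilling(tokens, mask_ratio=0.3, lam=3, mask_token="<mask>", seed=0):
    # Replace randomly chosen spans of tokens with a single <mask> token each.
    rng = np.random.default_rng(seed)
    tokens = list(tokens)
    budget = int(len(tokens) * mask_ratio)  # roughly how many tokens to corrupt
    masked = 0
    while masked < budget and len(tokens) > 1:
        span_len = max(1, rng.poisson(lam))            # span length ~ Poisson(lambda=3)
        start = int(rng.integers(0, len(tokens)))      # span start position
        removed = tokens[start:start + span_len]
        tokens[start:start + span_len] = [mask_token]  # the whole span becomes one mask
        masked += len(removed)
    return tokens

original = "BART is trained by corrupting text and learning to reconstruct the original text".split()
print(text_infilling(original))
# e.g. ['BART', 'is', '<mask>', 'text', 'and', 'learning', 'to', '<mask>', 'text']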
For the task we are interested in, namely paraphrasing, the pre-trained BART model can be fine-tuned directly using the input sequence (original phrase) and the target sequence (paraphrased sentence) as a Sequence-to-Sequence model.
This also works for tasks like summarization and abstractive question answering.
Setup
We will use the Simple Transformers library, based on the Hugging Face Transformers library, to train the models.
1. Install Anaconda or Miniconda Package Manager from here.
2. Create a new virtual environment and install packages.
conda create -n st python pandas tqdm
conda activate st

3. If using CUDA:

conda install pytorch>=1.6 cudatoolkit=10.2 -c pytorch

else:

conda install pytorch cpuonly -c pytorch

4. Install simpletransformers.

pip install simpletransformers

Data Preparation
We will be combining three datasets to serve as training data for our BART Paraphrasing Model.
Google PAWS-Wiki Labeled (Final)
Quora Question Pairs Dataset
Microsoft Research Paraphrase Corpus (MSRP)
The bash script below can be used to easily download and prep the first two datasets, but the MSRP dataset has to be downloaded manually from the link. (Microsoft hasn’t provided a direct link 😞 )
Make sure you place the files in the same directory (data) to avoid annoyances with file paths in the example code.
We also have a couple of helper functions, one to load data, and one to clean unnecessary spaces in the training data. Both of these functions are defined in utils.py.
Some of the data have spaces before punctuation marks that we need to remove. The clean_unnecessary_spaces() function is used for this purpose.
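The real implementations live in utils.py in the Simple Transformers examples; a minimal sketch of what they might look like (the column handling and the exact replacement rules here are assumptions) is:

import warnings

import pandas as pd

def load_data(file_path, input_text_column, target_text_column, label_column, keep_label=1):
    # Load a tab-separated file and keep only the rows labelled as true paraphrases.
    df = pd.read_csv(file_path, sep="\t", on_bad_lines="skip")
    df = df.loc[df[label_column] == keep_label]
    df = df.rename(columns={input_text_column: "input_text", target_text_column: "target_text"})
    return df[["input_text", "target_text"]]

def clean_unnecessary_spaces(out_string):
    # Remove the stray spaces that appear before punctuation in some of the datasets.
    if not isinstance(out_string, str):
        warnings.warn(f">>> {out_string} <<< is not a string.")
        out_string = str(out_string)
    for broken, fixed in [
        (" .", "."), (" ,", ","), (" ?", "?"), (" !", "!"),
        (" 's", "'s"), (" 'm", "'m"), (" 've", "'ve"), (" 're", "'re"), (" n't", "n't"),
    ]:
        out_string = out_string.replace(broken, fixed)
    return out_string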
Paraphrasing with BART
Once the data is prepared, training the model is quite simple.
Note that you can find all the code in the Simple Transformers examples here.
First, we import all the necessary stuff and set up logging.
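A minimal version of this step might look like the following (it assumes the load_data and clean_unnecessary_spaces helpers from utils.py described above):

import logging

import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

from utils import load_data, clean_unnecessary_spaces

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)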
Next, we load the datasets.
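A sketch of the data loading is shown below. The file names and column names are assumptions based on how the three datasets are usually distributed; adjust them to match the files you placed in the data directory.

# Google PAWS-Wiki Labeled (Final): sentence1/sentence2 pairs with a 0/1 label.
train_df = load_data("data/train.tsv", "sentence1", "sentence2", "label")
eval_df = load_data("data/dev.tsv", "sentence1", "sentence2", "label")

# Quora Question Pairs: question1/question2 pairs with an is_duplicate flag.
quora_df = load_data("data/quora_duplicate_questions.tsv", "question1", "question2", "is_duplicate")

# MSRP: sentence pairs in the "#1 String" / "#2 String" columns with a Quality label.
msrp_train_df = load_data("data/msr_paraphrase_train.txt", "#1 String", "#2 String", "Quality")
msrp_test_df = load_data("data/msr_paraphrase_test.txt", "#1 String", "#2 String", "Quality")

train_df = pd.concat([train_df, quora_df, msrp_train_df]).astype(str)
eval_df = pd.concat([eval_df, msrp_test_df]).astype(str)

# Strip the stray spaces before punctuation found in some of the data.
for df in (train_df, eval_df):
    df["input_text"] = df["input_text"].apply(clean_unnecessary_spaces)
    df["target_text"] = df["target_text"].apply(clean_unnecessary_spaces)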
Then, we set up the model and hyperparameter values. Note that we are using the pre-trained facebook/bart-large model, and fine-tuning it on our own dataset.
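A sketch of the model setup and training follows. The facebook/bart-large checkpoint comes from the text above; the specific hyperparameter values are illustrative placeholders rather than the exact ones used in the original run.

model_args = Seq2SeqArgs()
model_args.num_train_epochs = 2
model_args.learning_rate = 5e-5
model_args.train_batch_size = 8
model_args.eval_batch_size = 8
model_args.max_seq_length = 128
model_args.evaluate_during_training = True
model_args.save_eval_checkpoints = False
model_args.save_steps = -1
model_args.best_model_dir = "outputs/best_model"
model_args.overwrite_output_dir = True

model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
    use_cuda=True,  # set to False if training on CPU
)

model.train_model(train_df, eval_data=eval_df)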
Finally, we’ll generate paraphrases for each of the sentences in the test data.
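A sketch of the prediction step is below. The output file name is an assumption, and it assumes num_return_sequences was set to a value greater than 1 (see the Hyperparameters section), so that predict() returns a list of candidates per input.

import os
from datetime import datetime

to_predict = eval_df["input_text"].tolist()
truth = eval_df["target_text"].tolist()

preds = model.predict(to_predict)

os.makedirs("predictions", exist_ok=True)
with open(f"predictions/predictions_{datetime.now():%Y-%m-%d_%H-%M-%S}.txt", "w") as f:
    for original, target, candidates in zip(to_predict, truth, preds):
        f.write(f"Original: {original}\n")
        f.write(f"Truth: {target}\n")
        f.write("Predictions:\n")
        for candidate in candidates:
            f.write(f"{candidate}\n")
        f.write("\n")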
This will write the predictions to the predictions directory.
Hyperparameters
The hyperparameter values are set to general, sensible values without doing hyperparameter optimization. For this task, the ground truth does not represent the only possible correct answer (nor is it necessarily the best answer). Because of this, tuning the hyperparameters to nudge the generated text as close to the ground truth as possible doesn’t make much sense.
Our aim is to generate good paraphrased sequences rather than to produce the exact paraphrased sequence from the dataset.
If you are interested in Hyperparameter Optimization with Simple Transformers (particularly useful with other models/tasks like classification), do check out my guide here.
The decoding algorithm (and the relevant hyperparameters) used has a considerable impact on the quality and nature of the generated text. The values I’ve chosen (shown below) are generally suited to produce “natural” text.
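The generation settings would look roughly like the following (num_return_sequences = 3 is taken from the Results section below; the remaining values are common sampling defaults rather than the article's exact numbers):

# Generation settings, set on the same Seq2SeqArgs object used when building the model.
model_args.do_sample = True          # sample rather than decode greedily
model_args.num_beams = 1             # no beam search; rely on top-k / nucleus sampling
model_args.max_length = 128          # maximum length of a generated paraphrase
model_args.top_k = 50                # sample from the 50 most likely next tokens
model_args.top_p = 0.95              # nucleus sampling: keep the top 95% of probability mass
model_args.num_return_sequences = 3  # produce three paraphrases per input sentence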
For more information, please refer to the excellent Hugging Face guide here.
Try out the model on your own sentences
You can use the script below to test the model on any sentence.
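A minimal sketch of such a script, assuming the fine-tuned model was saved to outputs/best_model by the training run above:

from simpletransformers.seq2seq import Seq2SeqModel

# Path of the fine-tuned checkpoint written by training (an assumption; adjust as needed).
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="outputs/best_model",
    args={"do_sample": True, "top_k": 50, "top_p": 0.95, "max_length": 128, "num_return_sequences": 3},
    use_cuda=True,  # set to False if you don't have a GPU
)

while True:
    sentence = input("Enter a sentence to paraphrase (blank line to quit): ").strip()
    if not sentence:
        break
    for paraphrase in model.predict([sentence])[0]:
        print(paraphrase)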
Results
Let’s look at some of the paraphrased sequences generated by the model for the test data. For each input sequence, the model will generate three (num_return_sequences) paraphrased sequences.
1.
Original: A recording of folk songs done for the Columbia society in 1942 was largely arranged by Pjetër Dungu.
Truth: A recording of folk songs made for the Columbia society in 1942 was largely arranged by Pjetër Dungu.
Predictions:
A recording of folk songs made in 1942 for the Columbia Society was largely arranged by Pjetr Dungu.
A recording of folk songs for the Columbia society in 1942 was largely arranged by Pjetr Dungu.
A recording of folk songs done for the Columbia Society in 1942 was largely arranged by Pjetr Dungu.

2.
Original: In mathematical astronomy, his fame is due to the introduction of the astronomical globe, and his early contributions to understanding the movement of the planets.
Truth: His fame is due in mathematical astronomy to the introduction of the astronomical globe and to his early contributions to the understanding of the movement of the planets.
Predictions:
His fame in mathematical astronomy is due to the introduction of the astronomical globe and his early contributions to understanding the movement of the planets.
In mathematical astronomy, his fame is due to the introduction of the astronomical globe and his early contributions to understanding the motion of the planets.
In mathematical astronomy his fame is due to the introduction of the astronomical globe and his early contributions to understanding the movement of the planets.

3.
Original: Why are people obsessed with Cara Delevingne?
Truth: Why are people so obsessed with Cara Delevingne?
Predictions:
Why do people fall in love with Cara Delevingne?
Why is everyone obsessed with Cara Delevingne?
Why do people like Cara Delevingne?

4.
Original: Earl St Vincent was a British ship that was captured in 1803 and became a French trade man.
Truth: Earl St Vincent was a British ship that was captured and became a French merchantman in 1803.
Predictions:
Earl St Vincent was a British ship captured in 1803 and became a French trader.
Earl St Vincent was a British ship captured in 1803 and became a French trade man.
Earl St Vincent was a British ship that was captured in 1803 and became a French trade man.

5.
Original: Worcester is a town and county city of Worcestershire in England.
Truth: Worcester is a city and county town of Worcestershire in England.
Predictions:
Worcester is a town and county of Worcestershire in England.
Worcester is a town and county town in Worcestershire in England.
Worcester is a town and county town of Worcestershire in England.

6. Out-of-domain sentence
Original: The goal of any Deep Learning model is to take in an input and generate the correct output.
Predictions:
The goal of any deep learning model is to take an input and generate the correct output.
The goal of a deep learning model is to take an input and generate the correct output.
Any Deep Learning model the goal of which is to take in an input and generate the correct output.
As can be seen from these examples, our BART model has learned to generate paraphrases quite well!
Discussion
Potential Problems
The generated paraphrases can sometimes have minor issues, some of which are listed below.
Encouragingly, these issues seem to be quite rare and can most likely be averted by using better training data (the same problems can sometimes be seen in the training data ground truth as well).
Wrap Up
Sequence-to-Sequence models like BART are another arrow in the quiver of NLP practitioners. They are particularly useful for tasks involving text generation such as paraphrasing, summarization, and abstractive question answering.
Paraphrasing can be used for data augmentation where you can create a larger dataset by paraphrasing the available data.
Translated from: https://towardsdatascience.com/bart-for-paraphrasing-with-simple-transformers-7c9ea3dfdd8c