

Generating Summaries Using Google's Pegasus Library

發(fā)布時(shí)間:2023/12/15 编程问答 36 豆豆
生活随笔 收集整理的這篇文章主要介紹了 小米 pegasus_使用Google的Pegasus库生成摘要 小編覺(jué)得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

小米 pegasus

PEGASUS stands for Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models. It uses a self-supervised objective, Gap Sentences Generation (GSG), to train a transformer encoder-decoder model. The paper can be found on arXiv. In this article, we will focus only on generating state-of-the-art abstractive summaries using Google's Pegasus library.


As of now, there is no easy way to generate summaries using the Pegasus library. However, Hugging Face is already working on implementing this, and they expect to release it around September 2020. In the meantime, we can follow the steps mentioned in the Pegasus GitHub repository and explore Pegasus. So let's get started.


This step clones the library from GitHub, creates the /content/pegasus folder, and installs the requirements.


Next, follow the instructions to install gsutil. The steps below worked well for me in Colab.


This will create a folder named ckpt under /content/pegasus/ and then download all the necessary files (fine-tuned models, vocab, etc.) from Google Cloud to /content/pegasus/ckpt.

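Following the repository's README, the download step can be sketched as below (the gs://pegasus_ckpt bucket name comes from those instructions; the copy is several GB and takes a while):

```shell
# Create the checkpoint folder and pull all pre-trained/fine-tuned
# checkpoints and vocab files from the public GCS bucket
mkdir -p /content/pegasus/ckpt
gsutil cp -r gs://pegasus_ckpt/ /content/pegasus/ckpt/
```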

If all the above steps completed successfully, we will see the folder structure below in Google Colab. Under each downstream dataset's folder, we can see the fine-tuned models that we can use for generating extractive/abstractive summaries.


Though it's not mentioned in the Pegasus GitHub repository's README instructions, the pegasus installation step below is necessary; otherwise you will run into errors. Also, make sure you are in the root folder /content before executing this step.


Now, let us try to understand Pegasus's pre-training corpora and downstream datasets. Pegasus is pre-trained on the C4 and HugeNews corpora, and it is then fine-tuned on 12 downstream datasets. The evaluation results on the downstream datasets are reported on GitHub and in the paper. Some of these datasets are extractive and some are abstractive, so the choice of dataset depends on whether we are looking for extractive or abstractive summaries.

現(xiàn)在,讓我們嘗試了解有關(guān)Pegasus的預(yù)訓(xùn)練語(yǔ)料庫(kù)和下游數(shù)據(jù)集。 飛馬座在C4Hugenews語(yǔ)料庫(kù)上進(jìn)行了預(yù)訓(xùn)練,然后在12個(gè)下游數(shù)據(jù)集中進(jìn)行了微調(diào)。 Github和論文中都提到了對(duì)下游數(shù)據(jù)集的評(píng)估結(jié)果。 這些數(shù)據(jù)集中有些是可提取的,有些則是抽象的。 因此,數(shù)據(jù)集的使用取決于我們是在尋找提取摘要還是抽象摘要。

Once all the above steps are taken care of, we could jump straight to the evaluate.py step mentioned below, but it would take longer to complete because it would try to make predictions on all the data in the evaluation set of the fine-tuned dataset being used. Since we are interested in summaries of custom or sample text, we need to make minor changes to the public_params.py file found under /content/pegasus/pegasus/params/public_params.py, as shown below.


Here I am making changes to reddit_tifu, as I am trying to use the reddit_tifu dataset to generate an abstractive summary. If you are experimenting with aeslc or another downstream dataset, make similar changes to its entry.

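Conceptually, the change points the dataset's dev/test patterns at our own tfrecord file instead of the full TFDS test split, so evaluate.py only predicts on our text. The fragment below is a hypothetical sketch of the edited registry entry in public_params.py; the tfrecord path and the parameter values are assumptions for illustration, not taken from the original article:

```python
# Hypothetical edit inside /content/pegasus/pegasus/params/public_params.py:
# redirect the reddit_tifu evaluation patterns to a local tfrecord file.
@registry.register("reddit_tifu_transformer")
def reddit_tifu_transformer(param_overrides):
  return transformer_params(
      {
          "train_pattern": "tfds:reddit_tifu/long-train",
          # was "tfds:reddit_tifu/long-validation"
          "dev_pattern": "tfrecord:/content/test_data/test.tfrecord",
          # was "tfds:reddit_tifu/long-test"
          "test_pattern": "tfrecord:/content/test_data/test.tfrecord",
          "max_input_len": 512,
          "max_output_len": 128,
          "train_steps": 8000,
          "learning_rate": 0.0001,
          "batch_size": 8,
      }, param_overrides)
```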

Here we pass the text from this news article as inp, which is then copied to inputs. Note that an empty string is passed to targets, as this is what we are going to predict. Then both inputs and targets are used to create a tfrecord, which Pegasus expects.

在這里,我們正在傳遞新聞文章 inp文本,然后將其復(fù)制到inputs 。 請(qǐng)注意,傳遞給targets空字符串是我們要預(yù)測(cè)的。 那么這兩個(gè)inputs是targets被用于創(chuàng)建tfrecord,這pegusus預(yù)期。

inp = '''replace this with text from the above article'''

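The tfrecord-writing step can be sketched with TensorFlow as below. The feature names inputs/targets follow the description above; the output filename is an assumption for illustration:

```python
import tensorflow as tf

def _bytes_feature(value: bytes) -> tf.train.Feature:
    """Wrap raw bytes as a tf.train.Feature."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

inp = "replace this with text from the above article"

# Serialize one Example carrying the article text as `inputs` and an
# empty string as `targets` (the summary is what the model will predict).
example = tf.train.Example(features=tf.train.Features(feature={
    "inputs": _bytes_feature(inp.encode("utf-8")),
    "targets": _bytes_feature(b""),
}))

with tf.io.TFRecordWriter("test.tfrecord") as writer:
    writer.write(example.SerializeToString())
```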

As the final step, when evaluate.py is run, the model makes a prediction, i.e., generates a summary of the above news article's text. This will produce four output files in the respective downstream dataset's folder; in this case, the input, output, prediction, and text_metric text files will be created under the reddit_tifu folder.


Abstractive summary (prediction):“India and Afghanistan on Monday discussed the evolving security situation in the region against the backdrop of a spike in terrorist violence in the country.”


This looks like a very well generated abstractive summary when we compare it with the news article we passed as input. By using different downstream datasets, we can generate extractive or abstractive summaries. We can also play around with different parameter values and see how they change the summaries.

當(dāng)我們與作為生成摘要的輸入傳遞的新聞文章進(jìn)行比較時(shí),這看起來(lái)像是生成良好的摘要摘要。 通過(guò)使用不同的下游數(shù)據(jù)集,我們可以生成提取摘要或抽象摘要。 另外,我們可以嘗試使用不同的參數(shù)值,并查看其如何更改摘要。

Translated from: https://towardsdatascience.com/generate-summaries-using-googles-pegasus-library-772633a161c2
