Understanding LSTMs and GRUs

Deep Learning, Natural Language Processing

In my last article, I introduced Recurrent Neural Networks and the complications they carry. To combat those drawbacks, we use LSTMs and GRUs.

The Obstacle: Short-Term Memory

Recurrent Neural Networks are confined to short-term memory. If a long sequence is fed to the network, it will have a hard time remembering the information and may well leave out important information from the beginning.

Besides, Recurrent Neural Networks face the vanishing gradient problem when backpropagation comes into play. Because of it, the gradient updates become very small, leaving our model almost unchanged and thus contributing little to learning.

Weight Update Rule

“When we perform backpropagation, we calculate weights and biases for each node. But if the improvements in the earlier layers are meager, then the adjustment to the current layer will be much smaller. This causes gradients to diminish dramatically, leading to almost no change in our model; as a result, our model stops learning and stops improving.”
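As a small, purely illustrative Python sketch of that update rule (the numbers and variable names are hypothetical, not from the original article), notice how a vanished gradient leaves the weight essentially untouched:

learning_rate = 0.01
weight = 0.80
healthy_gradient = 0.50        # gradient reaching a late layer
vanished_gradient = 1e-9       # gradient that has shrunk while flowing back through many steps

weight_healthy = weight - learning_rate * healthy_gradient    # 0.795 -> a visible update
weight_vanished = weight - learning_rate * vanished_gradient  # ~0.80  -> effectively no learning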

Why LSTMs and GRUs?

Let us say you are looking at reviews for Schitt’s Creek online to decide whether or not to watch it. The basic approach is to read the reviews and determine their sentiment.

When you read a review, your subconscious mind tries to remember the decisive keywords. You will try to remember heavily weighted words like “Aaaaastonishing”, “timeless”, “incredible”, “eccentric”, and “capricious”, and will not focus on regular words like “think”, “exactly”, “most”, etc.

The next time you are asked to recall the review, you will probably have a hard time, but I bet you will still remember the sentiment and a few of the important, decisive words mentioned above.

And that is exactly how LSTMs and GRUs are intended to operate:

Learn and remember only the important information, and forget everything else.

LSTM (Long Short-Term Memory)

LSTMs are an improved form of the vanilla RNN, introduced to combat its shortcomings. To implement the intuition above and manage the significant information within the RNN's finite-sized state vector, we employ selective read, write, and forget gates.

The abstract concept revolves around the cell state and various gates. The cell state can carry relevant information along the sequence chain throughout the computation, thus solving the problem of short-term memory. As the process continues, relevant information is added and removed via gates. Gates are special neural networks that learn which information is relevant during training.

Selective Write

Let us denote the current hidden state sₜ, the previous hidden state sₜ₋₁, the current input xₜ, and the bias b.

Now, we accumulate all the outputs from the previous state sₜ₋₁ and compute the output for the current state sₜ.

Vanilla RNN
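(For reference, the vanilla RNN update that this figure illustrates is usually written, in the notation of this article, as sₜ = σ(W·sₜ₋₁ + U·xₜ + b), where σ is an activation function such as tanh.)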

Using selective write, we are interested in passing on only the relevant information to the next state sₜ. To implement this strategy, we assign each input a value between 0 and 1 that determines how much information is passed on to the next hidden state.

Selective Write in LSTM

We can store the fraction of information to be passed on in a vector hₜ₋₁, computed by multiplying the previous state vector sₜ₋₁ with oₜ₋₁, which stores a value between 0 and 1 for each input.

The next issue we encounter is: how do we get oₜ₋₁?

To compute oₜ₋₁ we must learn it, and the only vectors we have control over are our parameters. So, to continue the computation, we need to express oₜ₋₁ in terms of the parameters.

After learning Uo, Wo, and Bo using gradient descent, we can expect a precise prediction using our output gate (oₜ₋₁), which controls how much information is passed on to the next gate.
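The original equation images are not reproduced here; in the same notation, the output gate and the selective-write step are commonly written as:

oₜ₋₁ = σ(Wo·hₜ₋₂ + Uo·xₜ₋₁ + Bo)
hₜ₋₁ = oₜ₋₁ ⊙ σ(sₜ₋₁)

where σ is the sigmoid activation for the gate (tanh is common for the state) and ⊙ denotes element-wise multiplication.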

Selective Read

After passing on the relevant information through the previous gate, we introduce a new state vector s̃ₜ (marked in green in the original figure), often called the candidate state.

s̃ₜ captures all the information from the previous state hₜ₋₁ and the current input xₜ.

But our goal is to remove as much unimportant stuff as possible, so, continuing with our idea, we selectively read from s̃ₜ to construct the new cell state.

Selective Read

To keep only the important pieces of content, we again fall back on the 0-to-1 strategy: we assign each input a value between 0 and 1 that defines the proportion we would like to read.

The vector iₜ stores this proportional value for each input and is later multiplied with s̃ₜ to control the information flowing in from the current input; it is called the input gate.

To compute iₜ we must learn it, and the only vectors we have control over are our parameters. So, to continue the computation, we need to express iₜ in terms of the parameters.

After learning Ui, Wi, and Bi using gradient descent, we can expect a precise prediction using our input gate (iₜ), which controls how much information is fed into our model.
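Again, with the original equation images unavailable, the candidate state and the input gate (selective read) are commonly written in this notation as:

s̃ₜ = σ(W·hₜ₋₁ + U·xₜ + b)
iₜ = σ(Wi·hₜ₋₁ + Ui·xₜ + Bi)

and the selectively read portion is iₜ ⊙ s̃ₜ.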

Summing up the parameters learned so far:
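(The original summary image is unavailable; in the notation above, the parameters learned so far are W, U, b for the candidate state s̃ₜ, Wo, Uo, Bo for the output gate, and Wi, Ui, Bi for the input gate.)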

Selectively Forget

After selectively writing and reading the information, we now aim to forget all the irrelevant stuff, which helps us cut the clutter.

To discard the unneeded information from sₜ₋₁, we use the forget gate fₜ.

Selective Forget

Following the pattern above, we introduce a forget gate fₜ that holds a value ranging from 0 to 1 for each input, used to determine how much of it is retained.

To compute fₜ we must learn it, and the only vectors we have control over are our parameters. So, to continue the computation, we need to express fₜ in terms of the provided parameters.

After learning Uf, Wf, and Bf using gradient descent, we can expect a precise prediction using our forget gate (fₜ), which controls how much information is discarded.
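In the same notation, the forget gate is commonly written as:

fₜ = σ(Wf·hₜ₋₁ + Uf·xₜ + Bf)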

Combining the information from the forget gate and the input gate gives us the current cell state.
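Expressed in the notation above, this combination is commonly written as:

sₜ = fₜ ⊙ sₜ₋₁ + iₜ ⊙ s̃ₜ
hₜ = oₜ ⊙ σ(sₜ)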

Final Model

LSTM model

The full set of equations looks like:
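(The equation image from the original post is unavailable; the set below is the standard formulation in the notation used above, with sigmoid gates and, typically, tanh for the state activations.)

oₜ = σ(Wo·hₜ₋₁ + Uo·xₜ + Bo)
iₜ = σ(Wi·hₜ₋₁ + Ui·xₜ + Bi)
fₜ = σ(Wf·hₜ₋₁ + Uf·xₜ + Bf)
s̃ₜ = σ(W·hₜ₋₁ + U·xₜ + b)
sₜ = fₜ ⊙ sₜ₋₁ + iₜ ⊙ s̃ₜ
hₜ = oₜ ⊙ σ(sₜ)

For readers who prefer code, here is a minimal NumPy sketch of a single LSTM step following these equations (my own illustration, not code from the original article; the parameter matrices in params are assumed to be created elsewhere):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, params):
    # Gates: each produces values in (0, 1) that scale how much information flows.
    o_t = sigmoid(params["Wo"] @ h_prev + params["Uo"] @ x_t + params["Bo"])  # output gate
    i_t = sigmoid(params["Wi"] @ h_prev + params["Ui"] @ x_t + params["Bi"])  # input gate
    f_t = sigmoid(params["Wf"] @ h_prev + params["Uf"] @ x_t + params["Bf"])  # forget gate
    # Candidate state built from the previous hidden state and the current input.
    s_tilde = np.tanh(params["W"] @ h_prev + params["U"] @ x_t + params["b"])
    # Selectively forget the old cell state and selectively read the candidate.
    s_t = f_t * s_prev + i_t * s_tilde
    # Selectively write: the hidden state passed on to the next time step.
    h_t = o_t * np.tanh(s_t)
    return h_t, s_t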

The parameters required by an LSTM are far more numerous than those required by a vanilla RNN.

Because the number of gates and their arrangement can vary widely, LSTMs come in many types.

GRUs (Gated Recurrent Units)

As mentioned earlier, the LSTM can have many variations, and the GRU is one of them. Unlike the LSTM, the GRU uses fewer gates, which helps lower the computational cost.

In Gated Recurrent Units, we have an output gate that controls the proportion of information passed to the next hidden state, and an input gate that controls the information flow from the current input; unlike the LSTM, we do not use a separate forget gate.

Gated Recurrent Units

To lower the computation time we remove the forget gate, and to discard information we use the complement of the input gate vector, i.e. (1 − iₜ).

The equations implemented for GRU are:
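(The original equation image is unavailable; one common way to write a GRU-style cell that matches the description above, with gates computed from the previous state and the current input, and the complement (1 − iₜ) playing the role of the forget gate, is:)

oₜ = σ(Wo·sₜ₋₁ + Uo·xₜ + Bo)
iₜ = σ(Wi·sₜ₋₁ + Ui·xₜ + Bi)
s̃ₜ = σ(W·(oₜ ⊙ sₜ₋₁) + U·xₜ + b)
sₜ = (1 − iₜ) ⊙ sₜ₋₁ + iₜ ⊙ s̃ₜ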

Key Points

  • LSTM & GRU are introduced to avoid short-term memory of RNN.

    引入LSTM和GRU是為了避免RNN的短期記憶。
  • LSTM forgets by using Forget Gates.

    LSTM通過使用“忘記門”來忘記。
  • LSTM remembers using Input Gates.

    LSTM記得使用輸入門。
  • LSTM keeps long-term memory using Cell State.

    LSTM使用“單元狀態(tài)”保持長期記憶。
  • GRUs are fast and computationally less expensive than LSTM.

    GRU比LSTM速度快且計(jì)算成本更低。
  • The gradients in LSTM can still vanish in case of forward propagation.

    在向前傳播的情況下,LSTM中的梯度仍會(huì)消失。
  • LSTM doesn’t solve the problem of exploding gradient, therefore we use gradient clipping.

    LSTM不能解決梯度爆炸的問題,因此我們使用梯度削波。
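Since gradient clipping came up in the last point, here is a minimal, illustrative Python sketch of norm-based clipping (the threshold value is hypothetical, not from the original article):

import numpy as np

def clip_gradient(grad, max_norm=5.0):
    # Rescale the gradient if its L2 norm exceeds the threshold, which keeps
    # exploding gradients from producing a destructive weight update.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad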

Practical Use Cases

  • Sentiment Analysis using RNN
  • AI music generation using LSTM

Conclusion

Hopefully, this article helps you understand Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) in the best possible way and also assists you with their practical usage.

As always, thank you so much for reading, and please share this article if you found it useful!

Feel free to connect:

LinkedIn ~ https://www.linkedin.com/in/dakshtrehan/

Instagram ~ https://www.instagram.com/_daksh_trehan_/

Github ~ https://github.com/dakshtrehan

Follow for further Machine Learning/ Deep Learning blogs.

Medium ~ https://medium.com/@dakshtrehan

Want to learn more?

Detecting COVID-19 Using Deep Learning

The Inescapable AI Algorithm: TikTok

An insider’s guide to Cartoonization using Machine Learning

Why are YOU responsible for George Floyd’s Murder and Delhi Communal Riots?

Recurrent Neural Network for Dummies

Convolution Neural Network for Dummies

Diving Deep into Deep Learning

Why Choose Random Forest and Not Decision Trees

Clustering: What it is? When to use it?

Start off your ML Journey with k-Nearest Neighbors

Naive Bayes Explained

Activation Functions Explained

Parameter Optimization Explained

Gradient Descent Explained

Logistic Regression Explained

Linear Regression Explained

Determining Perfect Fit for your ML Model

Cheers!

Translated from: https://medium.com/towards-artificial-intelligence/understanding-lstms-and-gru-s-b69749acaa35
