Understanding LSTMs and GRUs
Deep Learning, Natural Language Processing
In my last article, I introduced Recurrent Neural Networks and the complications they carry. To combat those drawbacks, we use LSTMs and GRUs.
The Obstacle: Short-term Memory
Recurrent Neural Networks are confined to short-term memory. If a long sequence is fed to the network, they have a hard time retaining the information and may well drop important information from the beginning of the sequence.
Besides, Recurrent Neural Networks face the vanishing gradient problem when backpropagation comes into play. Because of it, the gradient updates become vanishingly small, leaving the earlier layers of our model almost unchanged and contributing little to learning.
Weight Update Rule: “When we perform backpropagation, we calculate the updates to the weights and biases of each node. But if the adjustments arriving from the preceding layers of the backward pass are meager, the adjustment to the current layer will be even smaller. This causes gradients to diminish dramatically, leading to almost no change in our model; because of that, our model stops learning and stops improving.”
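Concretely, when the network is unrolled over time, the gradient reaching an early state is a product of per-step Jacobians; a sketch of the standard argument is:

$$\frac{\partial \mathcal{L}_t}{\partial s_k} \;=\; \frac{\partial \mathcal{L}_t}{\partial s_t}\,\prod_{j=k+1}^{t}\frac{\partial s_j}{\partial s_{j-1}}$$

If each factor in the product has norm smaller than one, the gradient shrinks roughly exponentially with the time gap t − k, which is exactly the vanishing described above.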
Why LSTMs and GRUs?
Let us say you are looking at online reviews of Schitt’s Creek to decide whether or not to watch it. The basic approach is to read a review and determine its sentiment.
When you read the review, your subconscious mind tries to remember the decisive keywords. You will remember heavily weighted words like “Aaaaastonishing”, “timeless”, “incredible”, “eccentric”, and “capricious”, and will not focus on regular words like “think”, “exactly”, “most”, etc.
The next time you are asked to recall the review, you will probably have a hard time, but I bet you will still remember the sentiment and the few important, decisive words mentioned above.
And that is exactly what LSTMs and GRUs are intended to do:
Learn and remember only the important information, and forget everything else.
LSTM (Long Short-Term Memory)
LSTMs are an evolved form of the vanilla RNN, introduced to combat its shortcomings. To implement the intuition above and manage the significant information within the RNN’s finite-sized state vector, we employ gates that selectively write, read, and forget.
The abstract concept revolves around a cell state and various gates. The cell state can carry relevant information along the sequence chain throughout the computation, thus solving the problem of short-term memory. As the process continues, relevant information is added or removed via gates. Gates are small neural networks that learn, during training, which information is relevant.
Selective Write
Let us assume the hidden state (sₜ), previous hidden state (sₜ₋₁), current input (xₜ), and bias (b).
Now, we are accumulating all the outputs from the previous state sₜ₋₁ and computing the output for the current state sₜ.
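For reference, the vanilla RNN state update being discussed here is commonly written as:

$$s_t \;=\; \sigma\left(W\, s_{t-1} + U\, x_t + b\right)$$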
Using selective write, we are interested in passing on only the relevant information to the next state sₜ. To implement this strategy, we could assign each input a value ranging from 0 to 1 to determine how much information is passed on to the next hidden state.
We can store the fraction of information to be passed on in a vector hₜ₋₁, which can be computed by multiplying the previous state vector sₜ₋₁ with oₜ₋₁, a vector that stores a value between 0 and 1 for each input.
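In symbols, following the notation above (some formulations first pass sₜ₋₁ through a squashing nonlinearity such as σ before gating):

$$h_{t-1} \;=\; o_{t-1} \odot s_{t-1}$$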
The next issue we encounter is: how do we get oₜ₋₁?
To compute oₜ₋₁ we must learn it, and the only vectors we have control over are our parameters. So, to continue the computation, we need to express oₜ₋₁ in terms of the parameters.
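One common parameterization, assumed here since the gate must be computed from quantities available at that step, is a sigmoid over an affine combination of the earlier hidden state and input:

$$o_{t-1} \;=\; \sigma\left(W_o\, h_{t-2} + U_o\, x_{t-1} + b_o\right)$$

where σ squashes each entry into the range 0 to 1.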
After learning Uo, Wo, and bo using gradient descent, we can expect a precise prediction from our output gate (oₜ₋₁), which controls how much information is passed on to the next state.
Selective Read
After passing on the relevant information from the previous gate, we introduce a new hidden state vector s̃ₜ.
s̃ₜ captures all the information from the previous state hₜ₋₁ and the current input xₜ.
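In the common formulation, s̃ₜ is computed with its own set of parameters (some variants use tanh instead of σ):

$$\tilde{s}_t \;=\; \sigma\left(W\, h_{t-1} + U\, x_t + b\right)$$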
But our goal is to remove as much unimportant stuff as possible, so, continuing with our idea, we will selectively read from s̃ₜ to construct the new cell state.
To store all the important pieces of content, we again roll back to the 0–1 strategy, where we assign each input a value between 0 and 1 defining the proportion we would like to read.
The vector iₜ will store the proportional value for each input, which will later be multiplied with s̃ₜ to control the information flowing in from the current input; this is called the Input Gate.
To compute iₜ we must learn it, and the only vectors we have control over are our parameters. So, to continue the computation, we need to express iₜ in terms of the parameters.
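Analogously to the output gate, the input gate is typically written as:

$$i_t \;=\; \sigma\left(W_i\, h_{t-1} + U_i\, x_t + b_i\right)$$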
After learning Ui, Wi, and bi using gradient descent, we can expect a precise prediction from our input gate (iₜ), which controls how much information from the current input is fed into our model.
Summing up the parameters that were learned till now:
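In the notation used above, these are:
- Wo, Uo, bo for the output gate oₜ₋₁ (selective write),
- W, U, b for the candidate state s̃ₜ,
- Wi, Ui, bi for the input gate iₜ (selective read).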
Selective Forget
After selectively writing and reading the information, we now aim to forget all the irrelevant stuff, which helps us cut the clutter.
To discard all the unneeded information from sₜ₋₁, we use the forget gate fₜ.
Following the convention above, we introduce the forget gate fₜ, which holds a value ranging from 0 to 1 for each input and is used to determine the importance of each input.
To compute fₜ we must learn it, and the only vectors we have control over are our parameters. So, to continue the computation, we need to express fₜ in terms of the provided parameters.
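Following the same pattern as the other gates:

$$f_t \;=\; \sigma\left(W_f\, h_{t-1} + U_f\, x_t + b_f\right)$$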
After learning Uf, Wf, and bf using gradient descent, we can expect a precise prediction from our forget gate (fₜ), which controls how much information is discarded.
Summing the information from the forget gate and the input gate gives us the current cell state.
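Putting the two gates together, the new cell state keeps a gated fraction of the old state and adds a gated fraction of the candidate state:

$$s_t \;=\; f_t \odot s_{t-1} \;+\; i_t \odot \tilde{s}_t$$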
Final Model
The full set of equations looks like this:
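Written out for a single time step t, a standard set of LSTM equations consistent with the gates above (σ is sometimes replaced by tanh for the state nonlinearities) is:

$$\begin{aligned}
o_t &= \sigma(W_o\, h_{t-1} + U_o\, x_t + b_o) \\
i_t &= \sigma(W_i\, h_{t-1} + U_i\, x_t + b_i) \\
f_t &= \sigma(W_f\, h_{t-1} + U_f\, x_t + b_f) \\
\tilde{s}_t &= \sigma(W\, h_{t-1} + U\, x_t + b) \\
s_t &= f_t \odot s_{t-1} + i_t \odot \tilde{s}_t \\
h_t &= o_t \odot \sigma(s_t)
\end{aligned}$$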
The parameters required by an LSTM are far more than those required by a vanilla RNN.
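To make that concrete, here is a minimal sketch of the per-cell parameter counts, assuming input size n, hidden size d, and the fully connected gate formulation above (the exact count depends on the implementation):

```python
def rnn_cell_params(n_input, n_hidden):
    # One block of the form f(W*s_prev + U*x + b): W is d x d, U is d x n, b is d.
    return n_hidden * n_hidden + n_hidden * n_input + n_hidden

def lstm_cell_params(n_input, n_hidden):
    # An LSTM learns four such blocks: candidate state plus input, forget, and output gates.
    return 4 * rnn_cell_params(n_input, n_hidden)

def gru_cell_params(n_input, n_hidden):
    # A GRU learns three blocks: candidate state plus two gates.
    return 3 * rnn_cell_params(n_input, n_hidden)

if __name__ == "__main__":
    n, d = 100, 256  # example sizes, chosen arbitrarily
    for name, count in [("vanilla RNN", rnn_cell_params(n, d)),
                        ("GRU", gru_cell_params(n, d)),
                        ("LSTM", lstm_cell_params(n, d))]:
        print(f"{name}: {count:,} parameters")
```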
Because the number of gates and their arrangement can vary widely, LSTMs come in many variants.
GRUs (Gated Recurrent Units)
As mentioned earlier, the LSTM has many variations, and the GRU is one of them. Unlike the LSTM, the GRU implements fewer gates and thus helps lower the computational cost.
In Gated Recurrent Units, we have an output gate that controls the proportion of information passed on to the next hidden state and an input gate that controls the information flow from the current input; unlike the LSTM, we do not use a forget gate.
To lower the computational time, we remove the forget gate, and to discard information we use the complement of the input gate vector, i.e. (1 − iₜ).
The equations implemented for the GRU are:
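One common formulation consistent with the description above (in the more usual GRU terminology these two gates are called the reset gate and the update gate) is:

$$\begin{aligned}
o_t &= \sigma(W_o\, s_{t-1} + U_o\, x_t + b_o) \\
i_t &= \sigma(W_i\, s_{t-1} + U_i\, x_t + b_i) \\
\tilde{s}_t &= \sigma\big(W\,(o_t \odot s_{t-1}) + U\, x_t + b\big) \\
s_t &= (1 - i_t) \odot s_{t-1} \;+\; i_t \odot \tilde{s}_t
\end{aligned}$$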
Key Points
- LSTM & GRU are introduced to avoid short-term memory of RNN. 引入LSTM和GRU是為了避免RNN的短期記憶。
- LSTM forgets by using Forget Gates. LSTM通過使用“忘記門”來忘記。
- LSTM remembers using Input Gates. LSTM記得使用輸入門。
- LSTM keeps long-term memory using Cell State. LSTM使用“單元狀態”保持長期記憶。
- GRUs are fast and computationally less expensive than LSTM. GRU比LSTM速度快且計算成本更低。
- The gradients in LSTM can still vanish in case of forward propagation. 在向前傳播的情況下,LSTM中的梯度仍會消失。
- LSTM doesn’t solve the problem of exploding gradient, therefore we use gradient clipping. LSTM不能解決梯度爆炸的問題,因此我們使用梯度削波。
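A minimal sketch of gradient clipping by global norm; the threshold max_norm is an arbitrary example value, and most frameworks expose the same idea through an optimizer option:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale all gradients if their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-12)
        grads = [g * scale for g in grads]
    return grads

# Usage sketch: clip the gradients before applying the parameter update.
# grads = clip_by_global_norm(grads)
# weights = [w - learning_rate * g for w, g in zip(weights, grads)]
```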
Practical Use Cases
- Sentiment Analysis using RNN
- AI music generation using LSTM
Conclusion
Hopefully, this article helps you understand Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) in the best possible way, and also assists you in putting them to practical use.
As always, thank you so much for reading, and please share this article if you found it useful!
Feel free to connect:
LinkedIn ~ https://www.linkedin.com/in/dakshtrehan/
Instagram ~ https://www.instagram.com/_daksh_trehan_/
Github ~ https://github.com/dakshtrehan
Follow for further Machine Learning/ Deep Learning blogs.
Medium ~ https://medium.com/@dakshtrehan
Want to learn more?
Detecting COVID-19 Using Deep Learning
The Inescapable AI Algorithm: TikTok
An insider’s guide to Cartoonization using Machine Learning
Why are YOU responsible for George Floyd’s Murder and Delhi Communal Riots?
Recurrent Neural Network for Dummies
Convolution Neural Network for Dummies
Diving Deep into Deep Learning
Why Choose Random Forest and Not Decision Trees
Clustering: What it is? When to use it?
Start off your ML Journey with k-Nearest Neighbors
Naive Bayes Explained
Activation Functions Explained
Parameter Optimization Explained
Gradient Descent Explained
Logistic Regression Explained
Linear Regression Explained
Determining Perfect Fit for your ML Model
Cheers!
Translated from: https://medium.com/towards-artificial-intelligence/understanding-lstms-and-gru-s-b69749acaa35