LSTM: A Translation and Commentary on "Understanding LSTM Networks"
Contents
Understanding LSTM Networks
Recurrent Neural Networks
The Problem of Long-Term Dependencies
LSTM Networks
The Core Idea Behind LSTMs
Step-by-Step LSTM Walk Through
Variants on Long Short Term Memory
Conclusion
Acknowledgments
Understanding LSTM Networks
Posted on August 27, 2015
Original post: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks
Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have persistence.

Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.
In the above diagram, a chunk of neural network, \(A\), looks at some input \(x_t\) and outputs a value \(h_t\). A loop allows information to be passed from one step of the network to the next.
These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren’t all that different than a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:
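To make the unrolled picture concrete, here is a minimal NumPy sketch (not from the original post; the weight names `W_xh`, `W_hh`, `b_h` are illustrative) of the same network applied at every time step, each step passing its hidden state to the next:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence.

    xs   : list of input vectors x_t, each of shape (input_dim,)
    W_xh : input-to-hidden weights, shape (hidden_dim, input_dim)
    W_hh : hidden-to-hidden weights, shape (hidden_dim, hidden_dim)
    b_h  : hidden bias, shape (hidden_dim,)
    Returns the list of hidden states h_t, one per time step.
    """
    h = np.zeros(W_hh.shape[0])                      # initial hidden state
    hs = []
    for x in xs:                                     # one "copy" of the network per step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)       # h_t depends on x_t and h_{t-1}
        hs.append(h)
    return hs

# Toy usage: 5 time steps of 3-dimensional inputs, 4 hidden units.
rng = np.random.default_rng(0)
xs = [rng.normal(size=3) for _ in range(5)]
hs = rnn_forward(xs, rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4))
print(len(hs), hs[-1].shape)   # 5 (4,)
```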
This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They’re the natural architecture of neural networks to use for such data.

And they certainly are used! In the last few years, there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning… the list goes on. I’ll leave discussion of the amazing feats one can achieve with RNNs to Andrej Karpathy’s excellent blog post, The Unreasonable Effectiveness of Recurrent Neural Networks. But they really are pretty amazing.

Essential to these successes is the use of “LSTMs,” a very special kind of recurrent neural network which works, for many tasks, much much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them. It’s these LSTMs that this essay will explore.
The Problem of Long-Term Dependencies
One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends.

Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information.
But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large.
In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don’t seem to be able to learn them. The problem was explored in depth by Hochreiter (1991) [German] and Bengio, et al. (1994), who found some pretty fundamental reasons why it might be difficult.

Thankfully, LSTMs don’t have this problem!
LSTM Networks
Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work. They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.
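The figure of that single-tanh repeating module does not survive in this text-only copy; the update it depicts is the usual vanilla RNN step:

\[ h_t = \tanh\left(W \cdot [h_{t-1}, x_t] + b\right) \]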
LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.
Don’t worry about the details of what’s going on. We’ll walk through the LSTM diagram step by step later. For now, let’s just try to get comfortable with the notation we’ll be using.
In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to different locations.
The Core Idea Behind LSTMs
The key to LSTMs is the cell state, the horizontal line running through the top of the diagram. The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates. Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and a pointwise multiplication operation.
The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!” An LSTM has three of these gates, to protect and control the cell state.
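As a small illustration (mine, not the original post’s), a gate is nothing more than a sigmoid output used as a pointwise mask on some vector:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

candidate = np.array([0.5, -2.0, 3.0])           # values that want to pass through
gate = sigmoid(np.array([10.0, 0.0, -10.0]))     # roughly 1.0, 0.5, 0.0
filtered = gate * candidate                      # pointwise multiplication
print(np.round(filtered, 3))                     # ≈ [ 0.5  -1.    0. ]: pass, halve, block
```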
Step-by-Step LSTM Walk Through
The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at \(h_{t-1}\) and \(x_t\), and outputs a number between \(0\) and \(1\) for each number in the cell state \(C_{t-1}\). A \(1\) represents “completely keep this” while a \(0\) represents “completely get rid of this.”
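The accompanying figure (and its equation) is missing from this copy; the forget gate it describes is usually written as:

\[ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \]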
Let’s go back to our example of a language model trying to predict the next word based on all the previous ones. In such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, \(\tilde{C}_t\), that could be added to the state. In the next step, we’ll combine these two to create an update to the state.

In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the old one we’re forgetting.
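Again, the figure’s equations do not appear in this text-only copy; the standard forms are:

\[ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \]
\[ \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \]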
It’s now time to update the old cell state, \(C_{t-1}\), into the new cell state \(C_t\). The previous steps already decided what to do, we just need to actually do it. We multiply the old state by \(f_t\), forgetting the things we decided to forget earlier. Then we add \(i_t*\tilde{C}_t\). This is the new candidate values, scaled by how much we decided to update each state value.
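In equation form (as in the missing figure):

\[ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \]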
In the case of the language model, this is where we’d actually drop the information about the old subject’s gender and add the new information, as we decided in the previous steps.
Finally, we need to decide what we’re going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through \(\tanh\) (to push the values to be between \(-1\) and \(1\)) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.

For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in case that’s what is coming next. For example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that’s what follows next.
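The output step from the missing figure, written out:

\[ o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \]
\[ h_t = o_t * \tanh(C_t) \]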
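Putting the four steps together, here is a minimal NumPy sketch of a single LSTM step (illustrative, not the original post’s code); the weight and bias names follow the equations above, and each weight matrix acts on the concatenation \([h_{t-1}, x_t]\):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM step; each W_* has shape (hidden, hidden + input)."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)            # forget gate
    i_t = sigmoid(W_i @ z + b_i)            # input gate
    C_tilde = np.tanh(W_C @ z + b_C)        # candidate cell values
    C_t = f_t * C_prev + i_t * C_tilde      # new cell state
    o_t = sigmoid(W_o @ z + b_o)            # output gate
    h_t = o_t * np.tanh(C_t)                # new hidden state
    return h_t, C_t

# Toy usage: hidden size 4, input size 3, random weights.
rng = np.random.default_rng(1)
H, X = 4, 3
W = lambda: rng.normal(size=(H, H + X))
b = lambda: np.zeros(H)
h, C = np.zeros(H), np.zeros(H)
h, C = lstm_step(rng.normal(size=X), h, C, W(), b(), W(), b(), W(), b(), W(), b())
print(h.shape, C.shape)   # (4,) (4,)
```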
Variants on Long Short Term Memory
What I’ve described so far is a pretty normal LSTM. But not all LSTMs are the same as the above. In fact, it seems like almost every paper involving LSTMs uses a slightly different version. The differences are minor, but it’s worth mentioning some of them.

One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding “peephole connections.” This means that we let the gate layers look at the cell state.
The above diagram adds peepholes to all the gates, but many papers will give some peepholes and not others.

Another variation is to use coupled forget and input gates. Instead of separately deciding what to forget and what we should add new information to, we make those decisions together. We only forget when we’re going to input something in its place. We only input new values to the state when we forget something older.
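For reference (the variant diagrams are missing from this copy), the usual peephole formulation lets each gate also see the cell state, for example

\[ f_t = \sigma\left(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f\right) \]

and with coupled forget/input gates the cell update is typically written as

\[ C_t = f_t * C_{t-1} + (1 - f_t) * \tilde{C}_t \]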
A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU, introduced by Cho, et al. (2014). It combines the forget and input gates into a single “update gate.” It also merges the cell state and hidden state, and makes some other changes. The resulting model is simpler than standard LSTM models, and has been growing increasingly popular.
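The GRU figure is likewise missing from this text-only copy; its update is usually written as:

\[ z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \]
\[ r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \]
\[ \tilde{h}_t = \tanh\left(W \cdot [r_t * h_{t-1}, x_t]\right) \]
\[ h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t \]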
These are only a few of the most notable LSTM variants. There are lots of others, like Depth Gated RNNs by Yao, et al. (2015). There’s also some completely different approach to tackling long-term dependencies, like Clockwork RNNs by Koutnik, et al. (2014).

Which of these variants is best? Do the differences matter? Greff, et al. (2015) do a nice comparison of popular variants, finding that they’re all about the same. Jozefowicz, et al. (2015) tested more than ten thousand RNN architectures, finding some that worked better than LSTMs on certain tasks.
Conclusion
Earlier, I mentioned the remarkable results people are achieving with RNNs. Essentially all of these are achieved using LSTMs. They really work a lot better for most tasks!

Written down as a set of equations, LSTMs look pretty intimidating. Hopefully, walking through them step by step in this essay has made them a bit more approachable.
LSTMs were a big step in what we can accomplish with RNNs. It’s natural to wonder: is there another big step? A common opinion among researchers is: “Yes! There is a next step and it’s attention!” The idea is to let every step of an RNN pick information to look at from some larger collection of information. For example, if you are using an RNN to create a caption describing an image, it might pick a part of the image to look at for every word it outputs. In fact, Xu, et al. (2015) do exactly this – it might be a fun starting point if you want to explore attention! There’s been a number of really exciting results using attention, and it seems like a lot more are around the corner…
Attention isn’t the only exciting thread in RNN research. For example, Grid LSTMs by Kalchbrenner, et al. (2015) seem extremely promising. Work using RNNs in generative models – such as Gregor, et al. (2015), Chung, et al. (2015), or Bayer & Osendorfer (2015) – also seems very interesting. The last few years have been an exciting time for recurrent neural networks, and the coming ones promise to only be more so!
Acknowledgments
I’m grateful to a number of people for helping me better understand LSTMs, commenting on the visualizations, and providing feedback on this post.
I’m very grateful to my colleagues at Google for their helpful feedback, especially Oriol Vinyals, Greg Corrado, Jon Shlens, Luke Vilnis, and Ilya Sutskever. I’m also thankful to many other friends and colleagues for taking the time to help me, including Dario Amodei, and Jacob Steinhardt. I’m especially thankful to Kyunghyun Cho for extremely thoughtful correspondence about my diagrams.

Before this post, I practiced explaining LSTMs during two seminar series I taught on neural networks. Thanks to everyone who participated in those for their patience with me, and for their feedback.