DL: An overview of deep learning algorithms (a collection of neural network models), with notes on "THE NEURAL NETWORK ZOO" (Part 1)
Contents
THE NEURAL NETWORK ZOO
perceptrons
RBF
RNN
LSTM
GRU
BiRNN, BiLSTM and BiGRU
Related articles
DL: An overview of deep learning algorithms (a collection of neural network models), with notes on "THE NEURAL NETWORK ZOO" (Part 1)
DL: An overview of deep learning algorithms (a collection of neural network models), with notes on "THE NEURAL NETWORK ZOO" (Part 2)
DL: An overview of deep learning algorithms (a collection of neural network models), with notes on "THE NEURAL NETWORK ZOO" (Part 3)
DL: An overview of deep learning algorithms (a collection of neural network models), with notes on "THE NEURAL NETWORK ZOO" (Part 4)
DL: An overview of deep learning algorithms (a collection of neural network models), with notes on "THE NEURAL NETWORK ZOO" (Part 5)
DL: An overview of deep learning algorithms (a collection of neural network models), with notes on "THE NEURAL NETWORK ZOO" (Part 6)
THE NEURAL NETWORK ZOO
POSTED ON SEPTEMBER 14, 2016 BY FJODOR VAN VEEN
With new neural network architectures popping up every now and then, it’s hard to keep track of them all. Knowing all the abbreviations being thrown around (DCIGN, BiLSTM, DCGAN, anyone?) can be a bit overwhelming at first.
So I decided to compose a cheat sheet containing many of those architectures. Most of these are neural networks, some are completely different beasts. Though all of these architectures are presented as novel and unique, when I drew the node structures… their underlying relations started to make more sense.
One problem with drawing them as node maps: it doesn’t really show how they’re used. For example, variational autoencoders (VAE) may look just like autoencoders (AE), but the training process is actually quite different. The use-cases for trained networks differ even more, because VAEs are generators, where you insert noise to get a new sample. AEs simply map whatever they get as input to the closest training sample they “remember”. I should add that this overview in no way clarifies how each of the different node types works internally (but that’s a topic for another day).
It should be noted that while most of the abbreviations used are generally accepted, not all of them are. RNNs sometimes refer to recursive neural networks, but most of the time they refer to recurrent neural networks. That’s not the end of it though; in many places you’ll find RNN used as a placeholder for any recurrent architecture, including LSTMs, GRUs and even the bidirectional variants. AEs suffer from a similar problem from time to time, where VAEs and DAEs and the like are called simply AEs. Many abbreviations also vary in the amount of “N”s to add at the end, because you could call it a convolutional neural network but also simply a convolutional network (resulting in CNN or CN).
Composing a complete list is practically impossible, as new architectures are invented all the time. Even once published, they can be quite challenging to find if you’re looking for them, and sometimes you just overlook some. So while this list may provide you with some insights into the world of AI, please by no means take this list for being comprehensive; especially if you read this post long after it was written.
For each of the architectures depicted in the picture, I wrote a very, very brief description. You may find some of these to be useful if you’re quite familiar with some architectures but not with a particular one.
perceptrons
Feed forward neural networks (FF or FFNN) and perceptrons (P) are very straightforward: they feed information from the front to the back (input and output, respectively). Neural networks are often described as having layers, where each layer consists of either input, hidden or output cells in parallel. A layer alone never has connections, and in general two adjacent layers are fully connected (every neuron from one layer to every neuron in another layer). The simplest somewhat practical network has two input cells and one output cell, which can be used to model logic gates. One usually trains FFNNs through back-propagation, giving the network paired datasets of “what goes in” and “what we want to have coming out”. This is called supervised learning, as opposed to unsupervised learning, where we only give it input and let the network fill in the blanks. The error being back-propagated is often some variation of the difference between the input and the output (like MSE or just the linear difference). Given that the network has enough hidden neurons, it can theoretically always model the relationship between the input and output. Practically their use is a lot more limited, but they are popularly combined with other networks to form new networks.
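As a toy sketch of the paragraph above (my own illustrative code, not part of the original article): a two-input, one-output sigmoid network trained by back-propagating an MSE-style error to model an AND gate. The learning rate, epoch count and zero initialisation are arbitrary choices of mine.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Truth table for AND: the simplest "somewhat practical" network,
# two input cells and one output cell, can model this logic gate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)   # weights, deterministic all-zero start
b = 0.0           # bias
lr = 1.0

for _ in range(10000):
    out = sigmoid(X @ w + b)             # forward pass: "what comes out"
    grad = (out - y) * out * (1 - out)   # back-propagated MSE-style error
    w -= lr * (X.T @ grad)               # gradient step on the weights
    b -= lr * grad.sum()

pred = np.round(sigmoid(X @ w + b))      # 0/1 decisions after training
```

Paired examples of “what goes in” and “what we want to have coming out” are all the supervision this network gets; the same loop, with more hidden units, scales to harder mappings.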
Rosenblatt, Frank. “The perceptron: a probabilistic model for information storage and organization in the brain.” Psychological review 65.6 (1958): 386.
Original Paper PDF
RBF
Radial basis function (RBF) networks are FFNNs with radial basis functions as activation functions. There’s nothing more to it. That doesn’t mean they don’t have their uses, but most FFNNs with other activation functions don’t get their own name. This mostly has to do with inventing them at the right time.
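A minimal sketch of what that means in practice, assuming NumPy (the centre count, gamma value and the sin target are my own illustrative choices): the hidden layer computes Gaussian radial basis activations around fixed centres, and the linear output weights can be solved in closed form by least squares.

```python
import numpy as np

gamma = 1.0
centres = np.linspace(0, 2 * np.pi, 10)   # fixed RBF centres

def rbf_features(x):
    # hidden layer: one Gaussian bump exp(-gamma * (x - c)^2) per centre
    return np.exp(-gamma * (x[:, None] - centres[None, :]) ** 2)

x_train = np.linspace(0, 2 * np.pi, 50)
y_train = np.sin(x_train)

H = rbf_features(x_train)                         # hidden activations
w, *_ = np.linalg.lstsq(H, y_train, rcond=None)   # linear output weights

max_err = np.max(np.abs(H @ w - y_train))         # training fit quality
```

The only difference from a plain FFNN is the bump-shaped activation; that is the whole trick.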
Broomhead, David S., and David Lowe. Radial basis functions, multi-variable functional interpolation and adaptive networks. No. RSRE-MEMO-4148. ROYAL SIGNALS AND RADAR ESTABLISHMENT MALVERN (UNITED KINGDOM), 1988.
Original Paper PDF
RNN
Recurrent neural networks (RNN) are FFNNs with a time twist: they are not stateless; they have connections between passes, connections through time. Neurons are fed information not just from the previous layer but also from themselves from the previous pass. This means that the order in which you feed the input and train the network matters: feeding it “milk” and then “cookies” may yield different results compared to feeding it “cookies” and then “milk”. One big problem with RNNs is the vanishing (or exploding) gradient problem where, depending on the activation functions used, information rapidly gets lost over time, just like very deep FFNNs lose information in depth. Intuitively this wouldn’t be much of a problem because these are just weights and not neuron states, but the weights through time are actually where the information from the past is stored; if the weight reaches a value of 0 or 1 000 000, the previous state won’t be very informative. RNNs can in principle be used in many fields, as most forms of data that don’t actually have a timeline (i.e. unlike sound or video) can be represented as a sequence. A picture or a string of text can be fed one pixel or character at a time, so the time dependent weights are used for what came before in the sequence, not actually for what happened x seconds before. In general, recurrent networks are a good choice for advancing or completing information, such as autocompletion.
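To make the “order matters” point concrete, a toy sketch (my own, with arbitrary sizes and a fixed seed; “milk” and “cookies” are stand-in one-hot tokens): the hidden state is fed back into itself on every pass, so reversing the input sequence changes the final state.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 16

W_x = rng.normal(0, 0.5, (n_hidden, n_in))      # input -> hidden
W_h = rng.normal(0, 0.5, (n_hidden, n_hidden))  # hidden -> hidden, through time

def run_rnn(sequence):
    h = np.zeros(n_hidden)
    for x in sequence:
        h = np.tanh(W_x @ x + W_h @ h)  # the state carries over between passes
    return h

milk, cookies = np.eye(n_in)[0], np.eye(n_in)[1]  # stand-in tokens

h_milk_first = run_rnn([milk, cookies])
h_cookies_first = run_rnn([cookies, milk])
# The two final states differ: feeding order changes the result.
```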
Elman, Jeffrey L. “Finding structure in time.” Cognitive science 14.2 (1990): 179-211.
Original Paper PDF
LSTM
Long / short term memory (LSTM) networks try to combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined memory cell. These are inspired mostly by circuitry, not so much biology. Each neuron has a memory cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing the flow of it. The input gate determines how much of the information from the previous layer gets stored in the cell. The output layer takes the job on the other end and determines how much of the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a weight to a cell in the previous neuron, so they typically require more resources to run.
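A single LSTM step written out in NumPy may help fix the three gates in mind (a sketch under my own assumptions about sizes and initialisation, following the standard gate equations rather than any code from the article):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden = 4, 8

# One weight matrix per gate plus the cell candidate, each reading [x, h_prev].
W_i, W_f, W_o, W_c = (rng.normal(0, 0.1, (n_hidden, n_in + n_hidden))
                      for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(W_i @ z)   # input gate: how much new information to store
    f = sigmoid(W_f @ z)   # forget gate: how much of the old cell to keep
    o = sigmoid(W_o @ z)   # output gate: how much of the cell to reveal
    c = f * c_prev + i * np.tanh(W_c @ z)  # the protected memory cell
    h = o * np.tanh(c)                     # what the next layer gets to see
    return h, c
```

The four weight matrices per unit are the extra resource cost the paragraph mentions.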
Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9.8 (1997): 1735-1780.
Original Paper PDF
GRU
Gated recurrent units (GRU) are a slight variation on LSTMs. They have one less gate and are wired slightly differently: instead of an input, output and a forget gate, they have an update gate. This update gate determines both how much information to keep from the last state and how much information to let in from the previous layer. The reset gate functions much like the forget gate of an LSTM but it’s located slightly differently. They always send out their full state; they don’t have an output gate. In most cases, they function very similarly to LSTMs, with the biggest difference being that GRUs are slightly faster and easier to run (but also slightly less expressive). In practice these tend to cancel each other out, as you need a bigger network to regain some expressiveness, which then in turn cancels out the performance benefits. In some cases where the extra expressiveness is not needed, GRUs can outperform LSTMs.
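The wiring difference can be sketched the same way (again my own toy code and sizes): three weight matrices here versus the LSTM’s four, and the full state is always sent out with no output gate in between.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden = 4, 8

# Update gate, reset gate and candidate: one matrix fewer than an LSTM cell.
W_z, W_r, W_h = (rng.normal(0, 0.1, (n_hidden, n_in + n_hidden))
                 for _ in range(3))

def gru_step(x, h_prev):
    v = np.concatenate([x, h_prev])
    z = sigmoid(W_z @ v)   # update gate: keep the old state vs. take the new
    r = sigmoid(W_r @ v)   # reset gate: how much history the candidate sees
    h_tilde = np.tanh(W_h @ np.concatenate([x, r * h_prev]))
    return (1 - z) * h_prev + z * h_tilde  # full state goes out: no output gate
```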
Chung, Junyoung, et al. “Empirical evaluation of gated recurrent neural networks on sequence modeling.” arXiv preprint arXiv:1412.3555 (2014).
Original Paper PDF
BiRNN, BiLSTM and BiGRU
Bidirectional recurrent neural networks, bidirectional long / short term memory networks and bidirectional gated recurrent units (BiRNN, BiLSTM and BiGRU respectively) are not shown on the chart because they look exactly the same as their unidirectional counterparts. The difference is that these networks are not just connected to the past, but also to the future. As an example, unidirectional LSTMs might be trained to predict the word “fish” by being fed the letters one by one, where the recurrent connections through time remember the last value. A BiLSTM would also be fed the next letter in the sequence on the backward pass, giving it access to future information. This trains the network to fill in gaps instead of advancing information, so instead of expanding an image on the edge, it could fill a hole in the middle of an image.
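A bidirectional wrapper can be sketched as two independent recurrent passes, one over the sequence and one over its reverse, with their per-step states concatenated so every position sees both past and future context (my own toy code; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_hidden = 8, 16

def make_rnn():
    # an independent vanilla recurrent pass with its own weights
    W_x = rng.normal(0, 0.5, (n_hidden, n_in))
    W_h = rng.normal(0, 0.5, (n_hidden, n_hidden))
    def run(sequence):
        h, states = np.zeros(n_hidden), []
        for x in sequence:
            h = np.tanh(W_x @ x + W_h @ h)
            states.append(h)
        return states
    return run

forward_rnn, backward_rnn = make_rnn(), make_rnn()

def birnn(sequence):
    fwd = forward_rnn(sequence)                # connected to the past
    bwd = backward_rnn(sequence[::-1])[::-1]   # connected to the future
    # every step now carries context from both directions
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```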
Schuster, Mike, and Kuldip K. Paliwal. “Bidirectional recurrent neural networks.” IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.
Original Paper PDF