Deep Learning (Dropout)
Other techniques for regularization
A rough translation, for reference only; please do not repost.
www.cnblogs.com/santian/p/5457412.html
Dropout: Dropout is a radically different technique for regularization. Unlike L1 and L2 regularization, dropout doesn't rely on modifying the cost function. Instead, in dropout we modify the network itself. Let me describe the basic mechanics of how dropout works, before getting into why it works, and what the results are.
Suppose we're trying to train a network:
In particular, suppose we have a training input x and corresponding desired output y. Ordinarily, we'd train by forward-propagating x through the network, and then backpropagating to determine the contribution to the gradient (that is, using the chain rule to obtain the partial derivatives of the cost with respect to the weights). With dropout, this process is modified. We start by randomly (and temporarily) deleting half the hidden neurons in the network, while leaving the input and output neurons untouched. After doing this, we'll end up with a network along the following lines. Note that the dropout neurons, i.e., the neurons which have been temporarily deleted, are still ghosted in:
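To make the mechanics concrete, here is a minimal sketch (my own illustration, not from the text above) of one forward pass through a single-hidden-layer network with roughly half of the hidden neurons temporarily deleted; the parameter names w1, b1, w2, b2 and the sigmoid activation are assumptions chosen for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_with_dropout(x, w1, b1, w2, b2, p_drop=0.5):
    """One forward pass with roughly half the hidden neurons temporarily
    deleted; the input and output neurons are left untouched."""
    hidden = sigmoid(w1 @ x + b1)
    # randomly mark each hidden neuron as kept (1) or dropped (0)
    mask = (np.random.rand(hidden.shape[0]) >= p_drop).astype(hidden.dtype)
    hidden = hidden * mask            # dropped neurons contribute nothing
    output = sigmoid(w2 @ hidden + b2)
    return output, mask               # the same mask is reused during backpropagation
```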
We forward-propagate the input x through the modified network, and then backpropagate the result, also through the modified network. After doing this over a mini-batch of examples, we update the appropriate weights and biases. We then repeat the process, first restoring the dropout neurons, then choosing a new random subset of hidden neurons to delete, estimating the gradient for a different mini-batch, and updating the weights and biases in the network.
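Continuing the sketch above (and reusing its sigmoid and parameter names), a schematic training loop might look like the following; the quadratic cost and the simple gradient-descent update are assumptions made for brevity. The key point is that a fresh random subset of hidden neurons is drawn for every mini-batch:

```python
def train_with_dropout(data, epochs, mini_batch_size, eta,
                       w1, b1, w2, b2, p_drop=0.5):
    """data is a list of (x, y) pairs. For every mini-batch a new random
    subset of hidden neurons is deleted, the gradient is estimated on that
    thinned network, and the weights and biases are then updated."""
    for _ in range(epochs):
        np.random.shuffle(data)
        for k in range(0, len(data), mini_batch_size):
            batch = data[k:k + mini_batch_size]
            # choose a new random subset of hidden neurons to delete
            mask = (np.random.rand(b1.shape[0]) >= p_drop).astype(float)
            gw1, gb1 = np.zeros_like(w1), np.zeros_like(b1)
            gw2, gb2 = np.zeros_like(w2), np.zeros_like(b2)
            for x, y in batch:
                h = sigmoid(w1 @ x + b1) * mask            # thinned hidden layer
                out = sigmoid(w2 @ h + b2)
                delta_out = (out - y) * out * (1 - out)    # quadratic cost at the output
                delta_h = (w2.T @ delta_out) * h * (1 - h) * mask
                gw2 += np.outer(delta_out, h); gb2 += delta_out
                gw1 += np.outer(delta_h, x);   gb1 += delta_h
            w2 -= (eta / len(batch)) * gw2; b2 -= (eta / len(batch)) * gb2
            w1 -= (eta / len(batch)) * gw1; b1 -= (eta / len(batch)) * gb1
    return w1, b1, w2, b2
```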
By repeating this process over and over, our network will learn a set of weights and biases. Of course, those weights and biases will have been learnt under conditions in which half the hidden neurons were dropped out. When we actually run the full network that means that twice as many hidden neurons will be active. To compensate for that, we halve the weights outgoing from the hidden neurons.
This dropout procedure may seem strange and ad hoc. Why would we expect it to help with regularization? To explain what's going on, I'd like you to briefly stop thinking about dropout, and instead imagine training neural networks in the standard way (no dropout). In particular, imagine we train several different neural networks, all using the same training data. Of course, the networks may not start out identical (for instance, because of different random weight initializations), and as a result after training they may sometimes give different results. When that happens we could use some kind of averaging or voting scheme to decide which output to accept. For instance, if we have trained five networks, and three of them are classifying a digit as a "3", then it probably really is a "3". The other two networks are probably just making a mistake. This kind of averaging scheme is often found to be a powerful (though expensive) way of reducing overfitting. The reason is that the different networks may overfit in different ways, and averaging may help eliminate that kind of overfitting.
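As a toy illustration of such a voting scheme (again my own sketch, relying on the numpy import above, not code from the text): given several independently trained classifiers, each returning a vector of per-digit scores, we could accept whichever digit receives the most votes:

```python
from collections import Counter

def committee_vote(classifiers, x):
    """classifiers is a list of functions, each mapping an input x to a
    vector of per-class scores. Each network casts one vote for its top
    class, and the most common vote is accepted."""
    votes = [int(np.argmax(clf(x))) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```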
What's this got to do with dropout? Heuristically, when we drop out different sets of neurons, it's rather like we're training different neural networks. And so the dropout procedure is like averaging the effects of a very large number of different networks. The different networks will overfit in different ways, and so, hopefully, the net effect of dropout will be to reduce overfitting.
A related heuristic explanation for dropout is given in one of the earliest papers to use the technique (ImageNet Classification with Deep Convolutional Neural Networks, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton (2012)): "This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons." In other words, if we think of our network as a model which is making predictions, then we can think of dropout as a way of making sure that the model is robust to the loss of any individual piece of evidence. In this, it's somewhat similar to L1 and L2 regularization, which tend to reduce weights, and thus make the network more robust to losing any individual connection in the network.
Of course, the true measure of dropout is that it has been very successful in improving the performance of neural networks. The original paper introducing the technique (Improving neural networks by preventing co-adaptation of feature detectors, by Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov (2012); note that the paper discusses a number of subtleties that I have glossed over in this brief introduction) applied it to many different tasks. For us, it's of particular interest that they applied dropout to MNIST digit classification, using a vanilla feedforward neural network along lines similar to those we've been considering. The paper noted that the best result anyone had achieved up to that point using such an architecture was 98.4 percent classification accuracy on the test set. They improved that to 98.7 percent accuracy using a combination of dropout and a modified form of L2 regularization. Similarly impressive results have been obtained for many other tasks, including problems in image and speech recognition, and natural language processing. Dropout has been especially useful in training large, deep networks, where the problem of overfitting is often acute.