5 Perspectives on Why Dropout Works So Well
Dropout works by randomly blocking off a fraction of neurons in a layer during training. Then, during prediction (after training), Dropout does not block any neurons. The results of this practice have been enormously successful — competition-winning networks almost always make Dropout an essential part of the architecture.
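To make the mechanism concrete, here is a minimal NumPy sketch of the common "inverted dropout" formulation (the 1/(1 − p) rescaling is a detail of this particular variant, assumed here; it keeps the expected activation the same between training and prediction):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, training):
    """Inverted dropout: zero each activation with probability p during
    training, and scale the survivors by 1/(1-p) so the expected output
    matches the unblocked forward pass used at prediction time."""
    if not training or p == 0.0:
        return x  # prediction: no neurons are blocked
    mask = rng.random(x.shape) >= p  # True = neuron kept
    return np.where(mask, x / (1.0 - p), 0.0)

activations = np.ones(10_000)
train_out = dropout(activations, p=0.5, training=True)  # ~half zeroed, rest doubled
test_out = dropout(activations, p=0.5, training=False)  # passed through unchanged
```

Because of the rescaling, the mean of `train_out` stays close to the mean of `activations` even though half the entries are zero.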
It can be a bit confusing to understand why Dropout works at all. For one, we are essentially inserting randomness into the model, so one would expect its predictions to vary widely as important nodes are blocked. With such a volatile environment, it is difficult to imagine how any useful information could be propagated. Furthermore, how does a network adapted to such a random environment perform well when the randomness is suddenly eliminated during prediction?
There are many perspectives on why Dropout works, and while many of them are interconnected and related, taken together they give a holistic and deep understanding of why the method has been so successful.
Here’s one approach: because the network is trained in an environment where nodes may be randomly blocked, there are two possibilities:
- The node that is blocked is a ‘bad node’, or a node that does not provide any information. In this case, the network’s other nodes receive a positive signal through backpropagation and are able to learn better in the absence of the unhelpful node.
- The node that is blocked is a ‘good node’, or a node that provides important information for prediction. In this case, the network must learn a separate representation of the data in other neurons.
In this view of Dropout, no matter what nodes Dropout blocks, the network can benefit from it. This perspective of the method sees it as a disrupter of sorts, an externally introduced source of randomness to stir up accelerated learning.
Another perspective casts Dropout as an ensemble. In the often-successful Random Forest algorithm, several Decision Trees are trained on randomly selected subsets of the data, a process known as bagging. By incorporating randomness into the model, the variance of the model is actually suppressed. For an intuitive understanding, consider the following data, a sine wave with lots of normally distributed noise:
From this data, we take dozens of approximator curves, which randomly select points along the original curve. Then, these approximator curves are aggregated through averages, and the result is a much cleaner curve:
The transparent lines are the approximator curves. Bagging works well with high-variance data because it is an instance in which it is possible to fight fire with fire (noise with more noise). In this case, by repeatedly and randomly selecting parts of the curve, we dilute the influence of any individual noisy point, which contributes to a lower variance.
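The averaging described above can be sketched in a few lines of NumPy (the subset fraction, noise level, and curve count here are illustrative choices, not values from the original figures):

```python
import numpy as np

rng = np.random.default_rng(1)

# A sine wave with lots of normally distributed noise.
x = np.linspace(0, 2 * np.pi, 200)
noisy = np.sin(x) + rng.normal(scale=0.5, size=x.size)

# Dozens of "approximator curves": each randomly selects points along
# the original curve and linearly interpolates across the gaps.
curves = []
for _ in range(50):
    keep = rng.random(x.size) < 0.3   # keep a random ~30% of the points
    keep[0] = keep[-1] = True         # anchor the endpoints for interpolation
    curves.append(np.interp(x, x[keep], noisy[keep]))

# Aggregating the curves through averages yields a much cleaner curve.
averaged = np.mean(curves, axis=0)

error_noisy = np.abs(noisy - np.sin(x)).mean()
error_averaged = np.abs(averaged - np.sin(x)).mean()
```

With these settings the averaged curve lands noticeably closer to the true sine wave than the raw noisy data does.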
The same idea can be applied to Dropout. When there are hundreds or even thousands of signals coming in from the previous layer in deep neural networks, especially towards the beginning of training, there is bound to be lots of variance and perhaps incorrect signals. By randomly selecting subsets of the previous signals and passing them on, Dropout acts as an approximator and leaves a more purified signal for backpropagation.
We can take this perspective further. Every time Dropout is re-applied in an iteration, one could argue that a new network is being created. In bagging with, say, Decision Trees, each model has a different architecture, and it is the aggregation of these different feature maps and specialties on subsets of the data that allows for a rich understanding of the entire feature space. The final model is composed of the learnings of its sub-models.
Each training iteration, a ‘new network’ is created, and the weights are updated to reflect that network’s learnings. Although the mechanism differs from bagging (the sub-networks share a single set of weights rather than being trained separately), it is essentially performing the same task as an ensemble. After enough iterations, the network learns so-called ‘universal weights’, or parameters that perform well regardless of changes to the architecture. Like ensembles, Dropout allows a network to learn from the composition of many smaller, more focused networks.
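One way to see the ‘universal weights’ idea: for a single linear layer, averaging the predictions of many randomly masked sub-networks (all sharing one weight vector) approaches the prediction of a single test-time network whose weights are scaled by the keep probability. This is a toy illustration of the weight-scaling view, assumed here for a linear layer only, not the full nonlinear story:

```python
import numpy as np

rng = np.random.default_rng(2)

# One shared weight vector; each dropout mask defines a different
# "sub-network" built from those same weights.
w = rng.normal(size=50)
x = rng.normal(size=50)
p = 0.5  # fraction of dropped weights

# Average the predictions of many randomly masked sub-networks.
ensemble = np.mean([(w * (rng.random(50) >= p)) @ x for _ in range(50_000)])

# Compare with a single network whose weights are scaled by the keep
# probability (1 - p), the usual test-time approximation.
scaled = (1 - p) * (w @ x)
```

The two quantities agree closely, which is why dropping the masks at prediction time (with appropriate scaling) behaves like querying the whole implicit ensemble at once.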
Dropout is also seen as a form of regularization, which is a family of methods to prevent neural networks from overfitting. By randomly cutting off part of the signal flowing from one layer to the next, we prevent an overly detailed rush of numbers to the end of the network, which will be met by an equally complex flow of updates through backpropagation.
Another perspective on Dropout has roots in the overfitting problem, with the fundamental idea that networks overfit because they try to update millions of parameters all at the same time. When neural networks are initialized, their parameters are not accustomed to the dataset and begin exploring the error landscape. When all of this individual exploration is summed across a massive network, it rushes like a tsunami through backpropagation, and the network develops rapidly and quickly overfits.
Dropout, especially when applied extensively through a deep network and with a high fraction of dropped neurons (40 to 50 percent), lets the network learn in a slower, more gradual fashion, updating itself part-by-part in a stochastic way.
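As a hedged sketch (the layer sizes and the 50 percent rate are illustrative choices), this is what "extensive" dropout looks like in a small fully connected network: applied after every hidden layer during training, and switched off for prediction:

```python
import numpy as np

rng = np.random.default_rng(4)

def dropout(h, p, training):
    """Inverted dropout: active only during training."""
    if not training:
        return h
    return h * (rng.random(h.shape) >= p) / (1 - p)

def forward(x, weights, p=0.5, training=True):
    """MLP forward pass with dropout after every hidden layer."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)    # ReLU hidden layer
        h = dropout(h, p, training)   # block ~half of its neurons
    return h @ weights[-1]            # no dropout on the output layer

sizes = [(16, 32), (32, 32), (32, 1)]
weights = [rng.normal(scale=0.1, size=s) for s in sizes]
x = rng.normal(size=(4, 16))

train_pred = forward(x, weights, training=True)   # stochastic per call
test_pred = forward(x, weights, training=False)   # deterministic
```

On each training call a different random subset of hidden units is blocked, so each call effectively runs a different sub-network; the prediction-time call always uses the full network and is repeatable.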
Each new randomly selected portion of the network to be updated must not only update itself but also be consistent with the previously updated parameters. Hence, although it may seem paradoxical, adding randomness helps the model learn in a more controlled fashion.
All images created by author.
Translated from: https://towardsdatascience.com/5-perspectives-to-why-dropout-works-so-well-1c10617b8028