Using Dropout in Neural Networks: Not a Magic Bullet
Overfitting is an issue that occurs when a model shows high accuracy in predicting training data (the data used to build the model), but low accuracy in predicting test data (unseen data that the model has not used before).
This can be a particular problem when building a neural network on a small dataset. The network can be large enough that it “overtrains” on the training data, and therefore performs poorly when predicting new data.
Role of Dropout in Regularizing Neural Networks
At its most basic, Dropout literally “drops out” certain neurons from the neural network. This is to prevent excessive “noise” in the network that artificially increases the training accuracy but does not result in any meaningful information being transferred to the output layer — i.e. any increase in training accuracy comes from excessive training and not from any useful information in the model features themselves.
Dropout renders certain nodes in the network inactive as illustrated in the image at the beginning of this article — thus forcing the network to look for more meaningful patterns that influence the output layer.
While Dropout can technically be used in both the input and hidden layers — it is most common to use Dropout across the hidden layers, as using it on the input layer still risks discarding important information.
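To make the mechanism concrete, here is a minimal sketch of “inverted” dropout (the variant used by modern frameworks such as Keras) applied to a vector of hidden-layer activations. The array values and drop rate below are purely illustrative, not taken from the hotel model.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and rescale the survivors by 1/(1 - rate) so the expected
    activation is unchanged. At inference time, pass activations through."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
hidden = np.ones(1000)              # stand-in hidden-layer activations
dropped = dropout(hidden, rate=0.2, rng=rng)

# Roughly 20% of units are silenced on this forward pass,
# while the mean activation stays close to its original scale.
print((dropped == 0).mean())
print(dropped.mean())
```

Note that at inference time the function returns the activations untouched; the rescaling during training is what keeps train-time and test-time activations on the same scale.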
Predicting Average Daily Rates for Hotels: A Regression-Based Neural Network
To investigate the effectiveness of Dropout in predicting the output layer, let’s use a regression-based neural network to predict ADR (average daily rates) for customers at a hotel.
The original research by Antonio, Almeida, and Nunes (2016) is available in the References section below.
The following features are used to predict ADR:
Datasets
Let’s consider two training datasets.
Dataset 1 is the original dataset with 40,060 observations. Dataset 2 is a smaller version of the original with 100 observations.
A regression-based neural network model is built on each in order to predict ADR values across the test set (a separate dataset). The datasets and code for this example are available in the references section below.
Neural Networks without Dropout
Dataset 1 — Model Configuration
- 8 input neurons are used in the network.
- ELU is used as the activation function.
- A linear output layer is used.
- 1,669 nodes are used in the hidden layer.
The number of hidden nodes in the layer is determined as follows:
Source: Image created by author — formula based on an answer from Cross Validated

With 30,045 samples in our training set (after partitioning Dataset 1 into training and validation portions), a chosen factor of 2, as well as 8 input neurons and 1 output neuron, this gives 1,669 hidden nodes.
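The rule of thumb can be written as a short helper. The formula below, N_h = N_s / (α · (N_i + N_o)), is reconstructed from the figures given in the text (30,045 samples, factor 2, 8 inputs, 1 output, 1,669 nodes), so treat it as the heuristic this article appears to be using rather than a universal rule.

```python
def hidden_nodes(n_samples, n_input, n_output, alpha=2):
    """Rule-of-thumb upper bound on hidden-layer size:
    N_h = N_s / (alpha * (N_i + N_o)), with alpha typically 2-10."""
    return int(n_samples / (alpha * (n_input + n_output)))

print(hidden_nodes(30045, 8, 1))  # 1669, matching the configuration above
```

Larger values of alpha yield smaller (more conservative) hidden layers.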
Here is the structure of the neural network:
Source: Jupyter Notebook Output

The model is trained using 30 epochs, a batch size of 150, and a validation split of 20%.
Here is the training and validation loss:
Source: Jupyter Notebook Output

When the predictions are compared to the test set, the following errors are obtained:
Mean Absolute Error: 29.89
Root Mean Squared Error: 43.91
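For reference, the two error metrics reported throughout this article are computed as follows. The sample arrays here are small illustrative stand-ins, not actual ADR values from the test set.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error; penalizes large misses more heavily."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Illustrative values only
y_true = [80.0, 120.0, 95.0]
y_pred = [85.0, 110.0, 95.0]
print(mean_absolute_error(y_true, y_pred))       # 5.0
print(root_mean_squared_error(y_true, y_pred))   # ~6.455
```

Because RMSE squares the residuals before averaging, it is always at least as large as MAE on the same predictions, which is why both figures below move together.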
Dataset 2 — Model Configuration
Using the condensed dataset with only 100 observations, let us now see what the errors look like when using a much smaller dataset.
The overall configuration of the network remains the same — but this time a hidden layer with 5 nodes is used — as a dense layer of 1,669 nodes would almost certainly lead to overfitting with such a small training set.
The network configuration is as follows:
Source: Jupyter Notebook Output

The errors obtained on the test set are as follows:
Mean Absolute Error: 39.08
Root Mean Squared Error: 53.59
Clearly, there has been an increase in errors when training on a smaller dataset, which indicates that the model is not performing as well on unseen data. Let’s see what happens when Dropout is introduced.
Neural Networks with Dropout
The same neural network as above is run, but this time using 20% Dropout. In other words, each node in the hidden layer has a 20% probability of being dropped during training, in order to prevent overfitting.
Source: Jupyter Notebook Output

The results obtained are as follows:
Mean Absolute Error: 39.96
Root Mean Squared Error: 54.83
We see that this has not had the desired effect of improving accuracy on the test set, and the errors have in fact risen slightly.
Let’s try 40% Dropout.
Mean Absolute Error: 41.97
Root Mean Squared Error: 57.23
Again, the errors have increased substantially. This indicates that rather than reducing overfitting, Dropout is eliminating valuable information from the neural network, which results in lower prediction accuracy.
Increasing Hidden Layers
Instead of using Dropout, what if two hidden layers (5 nodes each) are used instead of one?
Here is the updated model configuration:
Source: Jupyter Notebook Output

Under this configuration, the reported errors have decreased considerably — on par with those seen when the larger dataset was used:
Mean Absolute Error: 29.06
Root Mean Squared Error: 43.42
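One way to put the capacity of these small networks in perspective is to count trainable parameters. The sketch below assumes fully connected (dense) layers with bias terms, which is the Keras default for `Dense` layers; the layer sizes mirror the configurations described above.

```python
def dense_params(layer_sizes):
    """Total weights plus biases for a stack of fully connected layers.
    e.g. [8, 5, 1] means 8 inputs -> 5 hidden nodes -> 1 output."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_params([8, 5, 1]))     # one hidden layer of 5 nodes: 51 parameters
print(dense_params([8, 5, 5, 1]))  # two hidden layers of 5 nodes: 81 parameters
```

Even the deeper of the two networks has only 81 parameters against roughly 100 observations, which may help explain why the extra layer adds useful flexibility without tipping the model into overfitting.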
With Proper Feature Selection, Dropout Can Become Redundant
Why has Dropout not worked as we intended in this case?
One important thing to remember about this neural network is that the features for the input layer were selected before fitting the neural network.
This was done using feature selection tools such as the ExtraTreesClassifier and forward and backward feature selection — as well as manually determining if the included features make theoretical sense in predicting ADR values.
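A tree-based importance screen of this kind can be sketched as follows. The article names ExtraTreesClassifier; since ADR is a continuous target, this sketch uses scikit-learn's regressor analogue, and the synthetic data and feature names are hypothetical stand-ins for the hotel features.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in: only the first column actually drives the target
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=n)

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, score in zip(["informative", "noise_1", "noise_2", "noise_3"],
                       model.feature_importances_):
    print(f"{name}: {score:.3f}")
# The informative feature receives by far the highest importance score,
# so the noise columns would be dropped before fitting the neural network.
```

Features with negligible importance scores are candidates for removal before the network is ever trained, which is the sense in which feature selection can pre-empt Dropout.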
In this regard, one can argue that with proper feature selection, Dropout serves little purpose and may simply end up eliminating valuable information from the network.
In this case, adding another hidden layer to the smaller network appears to have been sufficient in accounting for the additional variation in the output layer.
While Dropout can be of use if there are many irrelevant features in the input layer — proper feature selection in the first instance would mean that inducing Dropout in a neural network becomes unnecessary.
Conclusion
As we have seen, Dropout did not have the desired effect in improving test accuracy — even in the case of a smaller dataset.
From this standpoint, proper feature selection prior to building a neural network will in most cases prove superior to arbitrarily applying Dropout in order to reduce overfitting. As with any model — ensuring that the variables in such a model make theoretical sense will often produce better results.
Many thanks for your time, and grateful for any comments or feedback. The code and datasets for this example are available in the MGCodesandStats GitHub repository, as referenced below.
Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.
Translated from: https://towardsdatascience.com/using-dropout-with-neural-networks-not-a-magic-bullet-2fc3e4b17898