Training Neural Networks for Price Prediction with TensorFlow
Using Deep Neural Networks for regression problems might seem like overkill (and quite often it is), but in cases where you have a significant amount of high-dimensional data they can outperform any other ML model.
When you learn about Neural Networks you usually start with some image classification problem like the MNIST dataset — this is an obvious choice, as advanced tasks with high-dimensional data are where DNNs really thrive.
Surprisingly, when you try to apply what you learned on MNIST to a regression task, you might struggle for a while before your super-advanced DNN model is any better than a basic Random Forest Regressor. Sometimes you might never reach that moment…
In this guide, I list some key tips and tricks learned while using DNNs for regression problems. The data is a set of nearly 50 features describing 25k properties in Warsaw. I described the feature selection process in my previous article (feature-selection-and-error-analysis-while-working-with-spatial-data), so now we will focus on creating the best possible model predicting property price per m2 using the selected features.
The code and data source used for this article can be found on GitHub.
1. Getting started
When training a Deep Neural Network I usually follow these key steps:
在訓(xùn)練深度神經(jīng)網(wǎng)絡(luò)時(shí),我通常遵循以下關(guān)鍵步驟:
A) Choose a default architecture — no. of layers, no. of neurons, activation
A)選擇默認(rèn)架構(gòu)-不。 層數(shù) 的神經(jīng)元,激活
B) Regularize model
C) Adjust network architecture
C)調(diào)整網(wǎng)絡(luò)架構(gòu)
D) Adjust the learning rate and no. of epochs
D)調(diào)整學(xué)習(xí)率和否。 時(shí)代
E) Extract the optimal model using Callbacks
Usually creating the final model takes a few runs through all of these steps but an important thing to remember is: DO ONE THING AT A TIME. Don’t try to change architecture, regularization, and learning rate at the same time as you will not know what really worked and probably spend hours going in circles.
通常,創(chuàng)建最終模型需要完成所有這些步驟,但是要記住的重要一件事是: 一次做一件事。 不要試圖同時(shí)更改體系結(jié)構(gòu),正則化和學(xué)習(xí)率,因?yàn)槟鷮⒉恢勒嬲行У姆椒?#xff0c;并且可能會(huì)花費(fèi)數(shù)小時(shí)來(lái)討論。
Before you start building any DNNs for regression tasks there are 3 key things you must remember:
在開(kāi)始為回歸任務(wù)構(gòu)建任何DNN之前,您必須記住3個(gè)關(guān)鍵事項(xiàng):
Standardize your data to make training more efficient (see the sketch after this list)
Use the ReLU activation function for all hidden layers — you will be going nowhere with the default sigmoid activation
Use a linear activation function for the single-neuron output layer
Another important task is selecting the loss function. Mean Squared Error and Mean Absolute Error are the two most common choices. As my goal is to minimize the average percentage error and maximize the share of all buildings within 5% error, I chose MAE, as it penalizes outliers less and is easier to interpret — it pretty much tells you how many $/m2, on average, each offer is off the actual value.
There is also a function directly linked to my goal — Mean Absolute Percentage Error, but after testing it against MAE I found the training to be less efficient.
還有一個(gè)與我的目標(biāo)直接相關(guān)的功能- 平均絕對(duì)百分比誤差 ,但是在針對(duì)MAE進(jìn)行測(cè)試后,我發(fā)現(xiàn)訓(xùn)練的效率較低。
2. Base DNN model
We start with a basic network with 5 hidden layers and a decreasing number of neurons in every second layer.
我們從一個(gè)具有5個(gè)隱藏層的基本網(wǎng)絡(luò)開(kāi)始,并且每隔第二層神經(jīng)元的數(shù)量就會(huì)減少。
import tensorflow as tf
from tensorflow import keras

tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(512, input_dim=X_train.shape[1], activation='relu'),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=1, activation="linear"),
], name="Initial_model")
model.summary()
We use the Adam optimizer and start by training each model for 200 epochs — this part of the model configuration will be kept constant up to point 7.
optimizer = keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss='mean_absolute_error')
history = model.fit(X_train, y_train,
                    epochs=200, batch_size=1024,
                    validation_data=(X_test, y_test),
                    verbose=1)
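The learning-curve figures in the following sections were presumably produced from this fit history; a minimal matplotlib sketch of how such a plot can be made (skipping the first 10 epochs, as in the figures):

import matplotlib.pyplot as plt

plt.plot(history.history['loss'][10:], label='Training loss')
plt.plot(history.history['val_loss'][10:], label='Validation loss')
plt.xlabel('Epoch (from epoch 10)')
plt.ylabel('MAE')
plt.legend()
plt.show()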
初始模型學(xué)習(xí)曲線 (Initial model learning curve)
Initial model learning curve (starting from epoch 10)
Our first model turned out to be quite a failure: we have horrendous overfitting on the training data, and our validation loss is actually increasing after epoch 100.
3.帶輟學(xué)的正則化 (3. Regularization with Drop-out)
Dropout is probably the best answer to DNN regularization and works with all types of network sizes and architectures. Applying Dropout randomly drops a portion of the neurons in a layer at each training step, which forces the remaining neurons to be more versatile — this decreases overfitting, as one neuron can no longer map one specific instance, since it will not always be there during training.
I advise reading the original paper, as it describes the idea very well and does not require years of academic experience to understand — Dropout: A Simple Way to Prevent Neural Networks from Overfitting [1]
tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(512, input_dim=X_train.shape[1], activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=1, activation="linear"),
], name="Dropout")
The (0.x) after Dropout specifies what share of neurons you want to drop, which translates into how much you want to regularize. I usually start with a dropout rate of around 0.3–0.5 in the largest layer and then reduce it in deeper layers. The idea behind such an approach is that neurons in deeper layers tend to have more specific tasks, and therefore dropping too many of them would increase bias too much.
輟學(xué)模型學(xué)習(xí)曲線 (Dropout model learning curve)
Dropout model learning curve (starting from epoch 10)
Analyzing the learning curve of the modified model, we can see that we are going in the right direction. First of all, we managed to improve on the validation loss of the previous model (marked by the grey threshold line); secondly, we seem to have replaced overfitting with a slight underfit.
4.通過(guò)批量歸一化處理即將死亡/爆炸的神經(jīng)元 (4. Tackling dying/exploding neurons with Batch normalization)
When working with several layers with ReLU activation, we run a significant risk of dying neurons, which have a negative effect on performance. This can lead to the underfitting we saw in the previous model, as we might actually not be using a large share of our neurons whose outputs have been reduced to 0.
Batch Normalization is one of the best ways to handle this issue — when applied, we normalize the activation outputs of each layer for each batch to reduce the effect of extreme activations on parameter training, which in turn reduces the risk of vanishing/exploding gradients. The original paper describing the solution is more complicated to read than the one referenced previously, but I would still suggest giving it a try — Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [2]
tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(512, input_dim=X_train.shape[1], activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=1, activation="linear"),
], name="Batchnorm")
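As a rough numpy sketch of what each BatchNormalization layer above does to its inputs for a single batch at training time (gamma and beta are the layer's learnable scale and shift; the moving averages used at inference are left out, and the epsilon value mirrors the Keras default):

import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    # x: activations for one batch, shape (batch_size, n_units)
    mu = x.mean(axis=0)                      # per-unit mean over the batch
    var = x.var(axis=0)                      # per-unit variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * x_hat + beta              # learnable rescale and shift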
BatchNorm模型學(xué)習(xí)曲線 (BatchNorm model learning curve)
BatchNorm model learning curve (starting from epoch 10)
Adding Batch Normalization helped us bring some of the neurons back to life, which increased our model variance, changing underfitting into slight overfitting — training neural networks is often a game of cat and mouse, balancing between optimal bias and variance.
Another piece of good news is that we are still improving in terms of validation error.
5. Changing the activation function to Leaky ReLU
The Leaky ReLU activation function is a slight modification of ReLU that allows some negative activations to leak through, further reducing the risk of dying neurons. Leaky ReLU usually takes longer to train, which is why we will train this model for another 100 epochs.
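As a minimal sketch of the function itself (the Keras LeakyReLU layer used below falls back to its default negative slope, alpha=0.3, since no value is passed):

import numpy as np

def leaky_relu(x, alpha=0.3):
    # identical to ReLU for positive inputs, a small linear slope for negative ones
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [-0.6  -0.15  0.    1.5 ]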
Leaky ReLU activation

tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(512, input_dim=X_train.shape[1]),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(512),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(units=256),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=256),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=128),
    keras.layers.LeakyReLU(),
    keras.layers.Dense(units=1, activation="linear"),
], name="LeakyRELU")
泄漏的ReLU模型學(xué)習(xí)曲線 (Leaky ReLU model learning curve)
Leaky ReLU model learning curve (starting from epoch 10)
It seems Leaky ReLU reduced the overfitting and gave us a healthier learning curve, where we can see potential for improvement even after 300 epochs. We nearly reached the lowest error of the previous model, but we managed to do that without overfitting, which leaves us space for increasing variance.
6.通過(guò)具有1024個(gè)神經(jīng)元的附加隱藏層擴(kuò)展網(wǎng)絡(luò) (6. Expanding network with an additional hidden layer with 1024 neurons)
At this point, I am happy enough with the basic model to make the network larger by adding another hidden layer with 1024 neurons. The new layer also has the highest dropout rate. I also experimented with the dropout rates of the lower layers due to the change in the overall architecture.
tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(1024, input_dim=X_train.shape[1]),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(512),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(512),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(units=256),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=256),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.01),
    keras.layers.Dense(units=128),
    keras.layers.LeakyReLU(),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(units=1, activation="linear"),
], name="Larger_network")
更大的網(wǎng)絡(luò)模型學(xué)習(xí)曲線 (Larger network model learning curve)
Larger network model learning curve (starting from epoch 10)
Expanding the network architecture seems to be going in the right direction: we increased variance slightly, getting a learning curve that is close to the optimal balance. We also managed to get our validation loss nearly on par with the overfitted BatchNorm model.
7.通過(guò)學(xué)習(xí)率衰減提高培訓(xùn)效率 (7. Improved training efficiency with Learning Rate Decay)
Once we are happy with the network architecture, the learning rate is the most important hyperparameter that needs tuning. I decided to use learning rate decay, which allows me to train the model faster at the beginning and then decrease the learning rate over later epochs to make training more precise.
optimizer = keras.optimizers.Adam(lr=0.005, decay=5e-4)

Selecting the right starting rate and decay can be challenging and takes some trial and error. In my case it turned out that the default Adam learning rate in Keras, 0.001, was not the best fit: I started with a learning rate of 0.005 and over 400 epochs decreased it to 0.001.
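A sketch of how this decay plays out, assuming the legacy Keras behaviour where decay is applied per optimizer update as lr / (1 + decay * iterations), and a hypothetical ~20 update steps per epoch (roughly 20k training rows at batch_size=1024):

initial_lr, decay, steps_per_epoch = 0.005, 5e-4, 20  # steps_per_epoch is an assumption

for epoch in (1, 100, 200, 300, 400):
    iterations = epoch * steps_per_epoch
    lr = initial_lr / (1 + decay * iterations)
    print(f"epoch {epoch:3d}: lr ~ {lr:.4f}")
# with these assumptions the rate falls from 0.005 to about 0.001 by epoch 400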
Learning rate decay over 400 epochs

Learning rate decay model learning curve
Learning rate decay model learning curve (starting from epoch 10)
Tuning the learning rate helped us finally improve on our validation error, while still keeping the learning curve healthy without too much risk of overfitting — there might even be some room to train the model for another 100 epochs.
8.使用回調(diào)在最佳時(shí)期停止訓(xùn)練 (8. Stopping the training at best epoch using Callbacks)
The last task remaining before choosing our best model is to use callbacks to stop training at the optimal epoch. This allows us to retrieve the model at the exact epoch where we reached the minimal validation error. The big advantage of this solution is that you do not really need to worry whether to train for 300 or 600 epochs — if your model starts overfitting, the callback will get you back to the optimal epoch.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint_name = 'Weights\Weights-{epoch:03d}--{val_loss:.5f}.hdf5'
checkpoint = ModelCheckpoint(checkpoint_name, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
callbacks_list = [checkpoint]
You need to define your callbacks: checkpoint_name specifies where and how you want to save the weights for each epoch, and checkpoint specifies how the callback should behave — I advise monitoring val_loss for improvement and saving only if the epoch made some progress on it.
history = model.fit(X_train, y_train,
                    epochs=500, batch_size=1024,
                    validation_data=(X_test, y_test),
                    callbacks=callbacks_list,
                    verbose=1)
Then all you need to do is to add callbacks while fitting your model.
然后,您要做的就是在擬合模型的同時(shí)添加回調(diào)。
回調(diào)模型學(xué)習(xí)曲線 (Callbacks model learning curve)
Callbacks model learning curve (starting from epoch 10)
Using callbacks allowed us to retrieve the optimal model trained at epoch 468 — the next 30 epochs did not bring an improvement, as we started to overfit the training set.
9. Model evolution summary
比較模型之間的驗(yàn)證損失 (Comparing validation loss between models)
It took us 7 steps to get to the desired model output. We managed to improve at nearly every step, with a plateau between the batch_norm and 1024_layer models, when our key goal was to reduce overfitting. To be honest, refining these 7 steps probably took me 70 attempts, so bear in mind that training DNNs is an iterative process, and don't be put off if your improvement stagnates for a few hours.
10. DNN與隨機(jī)森林 (10. DNN vs Random Forest)
Finally, how did our best DNN perform in comparison to a base Random Forest Regressor trained on the same data in the previous article?
最后,與上一篇文章中基于相同數(shù)據(jù)訓(xùn)練的基本隨機(jī)森林回歸算法相比,我們最好的DNN表現(xiàn)如何?
On two key KPIs, our Random Forest scored as follows:
- Share of forecasts within 5% absolute error = 44.6%
- Mean percentage error = 8.8%
Our best Deep Neural Network scored:
我們最好的深度神經(jīng)網(wǎng)絡(luò)得分:
- Share of forecasts within 5% absolute error = 43.3% (-1.3 p.p.)
- Mean percentage error = 9.1% (+0.3 p.p.)
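For reference, a minimal sketch of how these two KPIs could be computed from a model's predictions, assuming "within 5% absolute error" means the absolute percentage error of an offer is at most 5% and that y_test holds the true prices per m2:

import numpy as np

y_pred = model.predict(X_test).ravel()
pct_error = np.abs(y_pred - y_test) / y_test             # absolute percentage error per offer

share_within_5 = np.mean(pct_error <= 0.05) * 100        # share of forecasts within 5% error
mean_pct_error = np.mean(pct_error) * 100                # mean percentage error

print(f"within 5% error: {share_within_5:.1f}% | mean percentage error: {mean_pct_error:.1f}%")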
Can we cry now? How is it possible that after hours of meticulous training our advanced neural network did not beat a Random Forest? To be honest there are two key reasons:
我們現(xiàn)在可以哭嗎? 經(jīng)過(guò)數(shù)小時(shí)的精心訓(xùn)練,我們先進(jìn)的神經(jīng)網(wǎng)絡(luò)怎么可能沒(méi)有擊敗隨機(jī)森林? 坦白地說(shuō),有兩個(gè)主要原因:
- A sample size of 25k records is still quite small in terms of training DNNs. I chose to give this architecture a try as I am gathering new data every month, and I am confident that within a few months I will reach a sample closer to 100k, which should give the DNN the needed edge.
- The Random Forest model was quite overfitted and I am not confident that it would generalize well to other properties, despite its high performance on the validation set — at this point, I would probably still use the DNN model in production as the more reliable one.
To summarize — I would advise against starting to solve a regression problem with a DNN. Unless you are working with hundreds of thousands of samples on a really complex project, a Random Forest Regressor will usually get you initial results much faster — if they prove promising, you can proceed to a DNN. Training an efficient DNN takes more time, and if your data sample is not large enough it might never reach Random Forest performance.
[1]: Nitish Srivastava et al. (June 14, 2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting
[2]: Sergey Ioffe, Christian Szegedy (March 2, 2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
翻譯自: https://towardsdatascience.com/training-neural-networks-for-price-prediction-with-tensorflow-8aafe0c55198
總結(jié)
以上是生活随笔為你收集整理的使用TensorFlow训练神经网络进行价格预测的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: 5.5亿元!英飞特拟收购全球照明巨头旗下
- 下一篇: 您应该如何改变数据科学教育