Training Neural Networks for Price Prediction with TensorFlow
Using Deep Neural Networks for regression problems might seem like overkill (and quite often it is), but in cases where you have a significant amount of high-dimensional data they can outperform any other ML model.
When you learn about Neural Networks you usually start with some image classification problem like the MNIST dataset — this is an obvious choice, as advanced tasks with high-dimensional data are where DNNs really thrive.
Surprisingly, when you try to apply what you learned on MNIST to a regression task, you might struggle for a while before your super-advanced DNN model is any better than a basic Random Forest Regressor. Sometimes you might never reach that moment…
In this guide, I list some key tips and tricks learned while using DNNs for regression problems. The data is a set of nearly 50 features describing 25k properties in Warsaw. I described the feature selection process in my previous article (feature-selection-and-error-analysis-while-working-with-spatial-data), so now we will focus on creating the best possible model predicting property price per m2 using the selected features.
The code and data source used for this article can be found on GitHub.
1. Getting started
When training a Deep Neural Network I usually follow these key steps:
在訓(xùn)練深度神經(jīng)網(wǎng)絡(luò)時(shí),我通常遵循以下關(guān)鍵步驟:
A) Choose a default architecture — no. of layers, no. of neurons, activation
A)選擇默認(rèn)架構(gòu)-不。 層數(shù) 的神經(jīng)元,激活
B) Regularize model
C) Adjust network architecture
C)調(diào)整網(wǎng)絡(luò)架構(gòu)
D) Adjust the learning rate and no. of epochs
D)調(diào)整學(xué)習(xí)率和否。 時(shí)代
E) Extract the optimal model using Callbacks
Usually creating the final model takes a few runs through all of these steps but an important thing to remember is: DO ONE THING AT A TIME. Don’t try to change architecture, regularization, and learning rate at the same time as you will not know what really worked and probably spend hours going in circles.
通常,創(chuàng)建最終模型需要完成所有這些步驟,但是要記住的重要一件事是: 一次做一件事。 不要試圖同時(shí)更改體系結(jié)構(gòu),正則化和學(xué)習(xí)率,因?yàn)槟鷮⒉恢勒嬲行У姆椒?#xff0c;并且可能會(huì)花費(fèi)數(shù)小時(shí)來(lái)討論。
Before you start building any DNNs for regression tasks there are 3 key things you must remember:
在開(kāi)始為回歸任務(wù)構(gòu)建任何DNN之前,您必須記住3個(gè)關(guān)鍵事項(xiàng):
Standardize your data to make training more efficient (see the sketch after this list)
Use the ReLU activation function for all hidden layers — you will be going nowhere with the default sigmoid activation
Use a linear activation function for the single-neuron output layer
Another important task is selecting the loss function. Mean Squared Error and Mean Absolute Error are the two most common choices. As my goal is to minimize the average percentage error and maximize the share of all buildings within 5% error, I chose MAE, as it penalizes outliers less and is easier to interpret — it pretty much tells you how many $/m2, on average, each offer is off the actual value.
There is also a function directly linked to my goal — Mean Absolute Percentage Error, but after testing it against MAE I found the training to be less efficient.
還有一個(gè)與我的目標(biāo)直接相關(guān)的功能- 平均絕對(duì)百分比誤差 ,但是在針對(duì)MAE進(jìn)行測(cè)試后,我發(fā)現(xiàn)訓(xùn)練的效率較低。
2. Base DNN model
We start with a basic network with 5 hidden layers and a decreasing number of neurons in every second layer.
我們從一個(gè)具有5個(gè)隱藏層的基本網(wǎng)絡(luò)開(kāi)始,并且每隔第二層神經(jīng)元的數(shù)量就會(huì)減少。
import tensorflow as tf
from tensorflow import keras

tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(512, input_dim=X_train.shape[1], activation='relu'),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=1, activation="linear"),
], name="Initial_model")
model.summary()
We use the Adam optimizer and start by training each model for 200 epochs — this part of the model configuration will be kept constant up to point 7.
optimizer = keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss='mean_absolute_error')
history = model.fit(X_train, y_train,
                    epochs=200, batch_size=1024,
                    validation_data=(X_test, y_test),
                    verbose=1)
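The learning-curve figures in the following sections were presumably produced from this fit history; a minimal matplotlib sketch of how such a plot can be made (skipping the first 10 epochs, as in the figures):

import matplotlib.pyplot as plt

plt.plot(history.history['loss'][10:], label='Training loss')
plt.plot(history.history['val_loss'][10:], label='Validation loss')
plt.xlabel('Epoch (from epoch 10)')
plt.ylabel('MAE')
plt.legend()
plt.show()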
初始模型學(xué)習(xí)曲線 (Initial model learning curve)
Initial model learning curve (starting from epoch 10)
Our first model turned out to be quite a failure: we have horrendous overfitting on the training data, and our validation loss is actually increasing after epoch 100.
3.帶輟學(xué)的正則化 (3. Regularization with Drop-out)
Dropout is probably the best answer to DNN regularization and works with all types of network sizes and architectures. Applying Dropout randomly drops a portion of the neurons in a layer at each training step, which forces the remaining neurons to be more versatile — this decreases overfitting, as one neuron can no longer map one specific instance, since it will not always be there during training.
I advise reading the original paper, as it describes the idea very well and does not require years of academic experience to understand — Dropout: A Simple Way to Prevent Neural Networks from Overfitting [1]
tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(512, input_dim=X_train.shape[1], activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=1, activation="linear"),
], name="Dropout")
The (0.x) after Dropout specifies what share of neurons you want to drop, which translates into how much you want to regularize. I usually start with a dropout rate of around 0.3–0.5 in the largest layer and then reduce it in deeper layers. The idea behind such an approach is that neurons in deeper layers tend to have more specific tasks, and therefore dropping too many of them would increase bias too much.
輟學(xué)模型學(xué)習(xí)曲線 (Dropout model learning curve)
Dropout model learning curve (starting from epoch 10)
Analyzing the learning curve of the modified model, we can see that we are going in the right direction. First of all, we managed to improve on the validation loss of the previous model (marked by the grey threshold line); secondly, we seem to have replaced overfitting with a slight underfit.
4.通過(guò)批量歸一化處理即將死亡/爆炸的神經(jīng)元 (4. Tackling dying/exploding neurons with Batch normalization)
When working with several layers with ReLU activation, we run a significant risk of dying neurons, which have a negative effect on performance. This can lead to the underfitting we saw in the previous model, as we might actually not be using a large share of our neurons whose outputs have been reduced to 0.
Batch Normalization is one of the best ways to handle this issue — when applied, we normalize the activation outputs of each layer for each batch to reduce the effect of extreme activations on parameter training, which in turn reduces the risk of vanishing/exploding gradients. The original paper describing the solution is more complicated to read than the one referenced previously, but I would still suggest giving it a try — Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [2]
tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(512, input_dim=X_train.shape[1], activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=1, activation="linear"),
], name="Batchnorm")
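As a rough numpy sketch of what each BatchNormalization layer above does to its inputs for a single batch at training time (gamma and beta are the layer's learnable scale and shift; the moving averages used at inference are left out, and the epsilon value mirrors the Keras default):

import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    # x: activations for one batch, shape (batch_size, n_units)
    mu = x.mean(axis=0)                      # per-unit mean over the batch
    var = x.var(axis=0)                      # per-unit variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * x_hat + beta              # learnable rescale and shift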
BatchNorm模型學(xué)習(xí)曲線 (BatchNorm model learning curve)
BatchNorm model learning curve (starting from epoch 10)
Adding Batch Normalization helped us bring some of the neurons back to life, which increased our model variance, changing underfitting into slight overfitting — training neural networks is often a game of cat and mouse, balancing between optimal bias and variance.
Another piece of good news is that we are still improving in terms of validation error.
5. Changing the activation function to Leaky ReLU
The Leaky ReLU activation function is a slight modification of ReLU that allows some negative activations to leak through, further reducing the risk of dying neurons. Leaky ReLU usually takes longer to train, which is why we will train this model for another 100 epochs.
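As a minimal sketch of the function itself (the Keras LeakyReLU layer used below falls back to its default negative slope, alpha=0.3, since no value is passed):

import numpy as np

def leaky_relu(x, alpha=0.3):
    # identical to ReLU for positive inputs, a small linear slope for negative ones
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [-0.6  -0.15  0.    1.5 ]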
Leaky ReLU activation

tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(512, input_dim=X_train.shape[1]),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(512),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(units=256),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=256),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=128),
    keras.layers.LeakyReLU(),
    keras.layers.Dense(units=1, activation="linear"),
], name="LeakyRELU")
泄漏的ReLU模型學(xué)習(xí)曲線 (Leaky ReLU model learning curve)
Leaky ReLU model learning curve (starting from epoch 10)
It seems Leaky ReLU reduced the overfitting and gave us a healthier learning curve, where we can see potential for improvement even after 300 epochs. We nearly reached the lowest error of the previous model, but we managed to do that without overfitting, which leaves us space for increasing variance.
6.通過(guò)具有1024個(gè)神經(jīng)元的附加隱藏層擴(kuò)展網(wǎng)絡(luò) (6. Expanding network with an additional hidden layer with 1024 neurons)
At this point, I am happy enough with the basic model to make the network larger by adding another hidden layer with 1024 neurons. The new layer also has the highest dropout rate. I also experimented with the dropout rates of the lower layers due to the change in the overall architecture.
tf.keras.backend.clear_session()
tf.random.set_seed(60)
model = keras.models.Sequential([
    keras.layers.Dense(1024, input_dim=X_train.shape[1]),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(512),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(512),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(units=256),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(units=256),
    keras.layers.LeakyReLU(),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.01),
    keras.layers.Dense(units=128),
    keras.layers.LeakyReLU(),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(units=1, activation="linear"),
], name="Larger_network")
更大的網(wǎng)絡(luò)模型學(xué)習(xí)曲線 (Larger network model learning curve)
Larger network model learning curve (starting from epoch 10)
Expanding the network architecture seems to be going in the right direction: we increased variance slightly, getting a learning curve that is close to the optimal balance. We also managed to get our validation loss nearly on par with the overfitted BatchNorm model.
7.通過(guò)學(xué)習(xí)率衰減提高培訓(xùn)效率 (7. Improved training efficiency with Learning Rate Decay)
Once we are happy with the network architecture, the learning rate is the most important hyperparameter that needs tuning. I decided to use learning rate decay, which allows me to train the model faster at the beginning and then decrease the learning rate over later epochs to make training more precise.
optimizer = keras.optimizers.Adam(lr=0.005, decay=5e-4)

Selecting the right starting rate and decay can be challenging and takes some trial and error. In my case it turned out that the default Adam learning rate in Keras, 0.001, was not the best fit: I started with a learning rate of 0.005 and over 400 epochs decreased it to 0.001.
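A sketch of how this decay plays out, assuming the legacy Keras behaviour where decay is applied per optimizer update as lr / (1 + decay * iterations), and a hypothetical ~20 update steps per epoch (roughly 20k training rows at batch_size=1024):

initial_lr, decay, steps_per_epoch = 0.005, 5e-4, 20  # steps_per_epoch is an assumption

for epoch in (1, 100, 200, 300, 400):
    iterations = epoch * steps_per_epoch
    lr = initial_lr / (1 + decay * iterations)
    print(f"epoch {epoch:3d}: lr ~ {lr:.4f}")
# with these assumptions the rate falls from 0.005 to about 0.001 by epoch 400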
Learning rate decay over 400 epochs

Learning rate decay model learning curve
Learning rate decay model learning curve (starting from epoch 10)
Tuning the learning rate helped us finally improve on our validation error, while still keeping the learning curve healthy without too much risk of overfitting — there might even be some room to train the model for another 100 epochs.
8.使用回調(diào)在最佳時(shí)期停止訓(xùn)練 (8. Stopping the training at best epoch using Callbacks)
The last task remaining before choosing our best model is to use callbacks to stop training at the optimal epoch. This allows us to retrieve the model at the exact epoch where we reached the minimal validation error. The big advantage of this solution is that you do not really need to worry whether to train for 300 or 600 epochs — if your model starts overfitting, the callback will get you back to the optimal epoch.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint_name = 'Weights\Weights-{epoch:03d}--{val_loss:.5f}.hdf5'
checkpoint = ModelCheckpoint(checkpoint_name, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
callbacks_list = [checkpoint]
You need to define your callbacks: checkpoint_name specifies where and how you want to save the weights for each epoch, and checkpoint specifies how the callback should behave — I advise monitoring val_loss for improvement and saving only if the epoch made some progress on it.
history = model.fit(X_train, y_train,
                    epochs=500, batch_size=1024,
                    validation_data=(X_test, y_test),
                    callbacks=callbacks_list,
                    verbose=1)
Then all you need to do is to add callbacks while fitting your model.
然后,您要做的就是在擬合模型的同時(shí)添加回調(diào)。
回調(diào)模型學(xué)習(xí)曲線 (Callbacks model learning curve)
Callbacks model learning curve (starting from epoch 10)
Using callbacks allowed us to retrieve the optimal model trained at epoch 468 — the next 30 epochs did not bring an improvement, as we started to overfit the training set.
9. Model evolution summary
比較模型之間的驗(yàn)證損失 (Comparing validation loss between models)
It took us 7 steps to get to the desired model output. We managed to improve at nearly every step, with a plateau between the batch_norm and 1024_layer models, when our key goal was to reduce overfitting. To be honest, refining these 7 steps probably took me 70 attempts, so bear in mind that training DNNs is an iterative process, and don't be put off if your improvement stagnates for a few hours.
10. DNN與隨機(jī)森林 (10. DNN vs Random Forest)
Finally, how did our best DNN perform in comparison to a base Random Forest Regressor trained on the same data in the previous article?
最后,與上一篇文章中基于相同數(shù)據(jù)訓(xùn)練的基本隨機(jī)森林回歸算法相比,我們最好的DNN表現(xiàn)如何?
On two key KPIs, our Random Forest scored as follows:
- Share of forecasts within 5% absolute error = 44.6%
- Mean percentage error = 8.8%
Our best Deep Neural Network scored:
我們最好的深度神經(jīng)網(wǎng)絡(luò)得分:
- Share of forecasts within 5% absolute error = 43.3% (-1.3 p.p.)
- Mean percentage error = 9.1% (+0.3 p.p.)
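For reference, a minimal sketch of how these two KPIs could be computed from a model's predictions, assuming "within 5% absolute error" means the absolute percentage error of an offer is at most 5% and that y_test holds the true prices per m2:

import numpy as np

y_pred = model.predict(X_test).ravel()
pct_error = np.abs(y_pred - y_test) / y_test             # absolute percentage error per offer

share_within_5 = np.mean(pct_error <= 0.05) * 100        # share of forecasts within 5% error
mean_pct_error = np.mean(pct_error) * 100                # mean percentage error

print(f"within 5% error: {share_within_5:.1f}% | mean percentage error: {mean_pct_error:.1f}%")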
Can we cry now? How is it possible that after hours of meticulous training our advanced neural network did not beat a Random Forest? To be honest there are two key reasons:
我們現(xiàn)在可以哭嗎? 經(jīng)過(guò)數(shù)小時(shí)的精心訓(xùn)練,我們先進(jìn)的神經(jīng)網(wǎng)絡(luò)怎么可能沒(méi)有擊敗隨機(jī)森林? 坦白地說(shuō),有兩個(gè)主要原因:
- A sample size of 25k records is still quite small in terms of training DNNs. I chose to give this architecture a try as I am gathering new data every month, and I am confident that within a few months I will reach a sample closer to 100k, which should give the DNN the needed edge.
- The Random Forest model was quite overfitted and I am not confident that it would generalize well to other properties, despite its high performance on the validation set — at this point, I would probably still use the DNN model in production as the more reliable one.
To summarize — I would advise against starting to solve a regression problem with a DNN. Unless you are working with hundreds of thousands of samples on a really complex project, a Random Forest Regressor will usually get you initial results much faster — if they prove promising, you can proceed to a DNN. Training an efficient DNN takes more time, and if your data sample is not large enough it might never reach Random Forest performance.
[1]: Nitish Srivastava et al. (June 14, 2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting
[2]: Sergey Ioffe, Christian Szegedy (March 2, 2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
翻譯自: https://towardsdatascience.com/training-neural-networks-for-price-prediction-with-tensorflow-8aafe0c55198
總結(jié)
以上是生活随笔為你收集整理的使用TensorFlow训练神经网络进行价格预测的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: 5.5亿元!英飞特拟收购全球照明巨头旗下
- 下一篇: 您应该如何改变数据科学教育