
Neural Networks: The Importance of Optimizer Selection


When constructing a neural network, there are several optimizers available in the Keras API to choose from.

An optimizer is used to minimise the loss of a network by appropriately modifying the weights and learning rate.
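To make this concrete, here is a toy sketch (illustrative only, not code from the original article) of the gradient-descent update an optimizer repeatedly applies to a single weight:

w = 0.0                          # single weight, with loss = (w - 3)^2
learning_rate = 0.1
for step in range(50):
    grad = 2 * (w - 3)           # dLoss/dw
    w -= learning_rate * grad    # step against the gradient
print(round(w, 4))               # converges towards the minimum at w = 3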

For regression-based problems (where the response variable is in numerical format), the most frequently encountered optimizer is the Adam optimizer, which uses a stochastic gradient descent method that estimates first-order and second-order moments.
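As a rough sketch of what those moment estimates look like (this is the standard published Adam update rule, not code from the article), extending the toy example above:

from math import sqrt

w, m, v = 0.0, 0.0, 0.0
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 201):
    g = 2 * (w - 3)                        # gradient of loss = (w - 3)^2
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g**2     # second-moment estimate
    m_hat = m / (1 - beta1**t)             # bias correction
    v_hat = v / (1 - beta2**t)
    w -= lr * m_hat / (sqrt(v_hat) + eps)  # adaptive step
print(round(w, 2))                         # approaches the minimum at w = 3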

The available optimizers in the Keras API are as follows (a minimal sketch of instantiating them follows the list):

  • SGD
  • RMSprop
  • Adam
  • Adadelta
  • Adagrad
  • Adamax
  • Nadam
  • Ftrl
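Each of these can be passed to model.compile() either by its string name or as a configured instance. A minimal sketch (the learning rates shown are the Keras defaults, not values from this article):

from tensorflow import keras

# Instantiating the optimizers listed above; any one of these can be
# passed to model.compile(optimizer=...).
optimizers = {
    "SGD": keras.optimizers.SGD(learning_rate=0.01),
    "RMSprop": keras.optimizers.RMSprop(learning_rate=0.001),
    "Adam": keras.optimizers.Adam(learning_rate=0.001),
    "Adadelta": keras.optimizers.Adadelta(),
    "Adagrad": keras.optimizers.Adagrad(),
    "Adamax": keras.optimizers.Adamax(),
    "Nadam": keras.optimizers.Nadam(),
    "Ftrl": keras.optimizers.Ftrl(),
}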

The purpose of choosing the most suitable optimizer is not necessarily to achieve the highest accuracy per se, but rather to minimise the training required by the neural network to achieve a given level of accuracy. After all, it is much more efficient if a neural network can be trained to a certain level of accuracy after 10 epochs rather than after 50, for instance.
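One practical way to cap the training required is early stopping; a minimal sketch assuming the standard Keras callback API (X_train and y_train are hypothetical arrays, not shown here):

from tensorflow import keras

# Stop once validation loss stops improving, instead of always
# running a fixed (large) number of epochs.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)
# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=50, callbacks=[early_stop])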

Predicting Average Daily Rates for Hotels

Let’s illustrate this using an example: predicting average daily rates (ADR) for hotels. This is the output variable.

The features used in the model are as follows:

1. Cancellations: Whether a customer cancels their booking
2. Country of Origin
3. Market Segment
4. Deposit Paid
5. Customer Type
6. Required Car Parking Spaces
7. Arrival Date: Week Number

This analysis is based on the original study by Antonio, Almeida and Nunes (2016), as cited in the References section below.

A neural network model is defined with 8 input neurons, 1,669 hidden neurons, and 1 output neuron.
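A minimal sketch of how such a model might be defined in Keras (the ReLU/linear activations and mean squared error loss are assumptions for illustration, not confirmed by the original article); it reproduces the parameter counts in the summary below:

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(8,)),  # 8*8 + 8 = 72 params
    keras.layers.Dense(1669, activation="relu"),                 # 8*1669 + 1669 = 15021 params
    keras.layers.Dense(1, activation="linear"),                  # 1669 + 1 = 1670 params
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.summary()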

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 8) 72
_________________________________________________________________
dense_1 (Dense) (None, 1669) 15021
_________________________________________________________________
dense_2 (Dense) (None, 1) 1670
=================================================================
Total params: 16,763
Trainable params: 16,763
Non-trainable params: 0

Using 30 epochs and a batch size of 150, the losses across the different optimizers available in the Keras API were compared, along with the mean absolute error (MAE) and root mean squared error (RMSE) performance on a separate test set.
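A sketch of such a comparison loop, assuming hypothetical pre-split arrays X_train, y_train, X_test and y_test (data preparation not shown):

import time
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow import keras

def build_model():
    # Same 8 -> 1669 -> 1 architecture as above.
    return keras.Sequential([
        keras.layers.Dense(8, activation="relu", input_shape=(8,)),
        keras.layers.Dense(1669, activation="relu"),
        keras.layers.Dense(1, activation="linear"),
    ])

for name in ["sgd", "adam"]:
    model = build_model()
    model.compile(optimizer=name, loss="mse")
    start = time.time()
    model.fit(X_train, y_train, epochs=30, batch_size=150,
              validation_split=0.2, verbose=0)  # X_train, y_train assumed prepared
    elapsed = time.time() - start
    preds = model.predict(X_test).ravel()
    mae = mean_absolute_error(y_test, preds)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}, time={elapsed:.2f}s")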

For the purposes of this article, let’s compare the performance of the Adam and SGD optimizers.

Adam

(Training and validation loss for Adam. Source: Jupyter Notebook output.)
  • Mean Absolute Error: 29.89
  • Root Mean Squared Error: 43.91

SGD

(Training and validation loss for SGD. Source: Jupyter Notebook output.)
  • Mean Absolute Error: 29.15
  • Root Mean Squared Error: 44.24

When comparing these two optimizers, we see that the MAE and RMSE obtained on the test set are virtually identical. However, the training and validation loss is significantly lower after 30 epochs when the Adam optimizer is used.

Even though Adam achieved lower training and validation loss, test accuracy using SGD was just as good, at least in this instance. Moreover, training time using the Adam optimizer was 13.48 seconds, while SGD came in at 9.05 seconds.

From this standpoint, one could argue that while Adam showed lower training and validation loss, this did not translate into accuracy gains on the test set versus SGD, and SGD accomplished the job in less training time. While this is a simplistic example and a few seconds’ difference is immaterial here, it will quickly add up when using larger datasets or training more complex models.

As a matter of fact, much debate continues as to whether Adam is in fact a suitable alternative to SGD, with many researchers still opting for the latter. For instance, a Cornell University study by Wilson et al. (2017) found instances where adaptive methods like Adam generalise worse than SGD, even when training performance was superior.

While it is not possible to do justice to such a complex topic in this article, one important takeaway is that attention should be given to the choice of optimizer when training a neural network. It is not necessarily a given that Adam will always provide the best results, and other optimizers can achieve similar test set accuracy in less training time.

In fact, it is good practice to test models across a range of optimizers, as each has its limitations.

Conclusion

This article has discussed the importance of suitable optimizer selection when constructing a neural network, and the importance of taking both accuracy and training time into consideration when doing so.

I hope you found this article useful; any questions or feedback are greatly appreciated. The relevant references, as well as the code and datasets for running the above example, are available at the MGCodesandStats GitHub repository.

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.

Translated from: https://towardsdatascience.com/neural-networks-importance-of-optimizer-selection-16fdbbed3ff0
