

Cross-Validation and Hyperparameter Tuning: How to Optimise Your Machine Learning Model

Published: 2023/12/15

In the first two parts of this article I obtained and preprocessed Fitbit sleep data, split the data into training, validation and test set, trained three different Machine Learning models and compared their performance.


In part 2, we saw that using the default hyperparameters for Random Forest and Extreme Gradient Boosting and evaluating model performance on the validation set led to Multiple Linear Regression performing best and Random Forest as well as Gradient Boosting Regressor performing slightly worse.


In this part of the article I will discuss shortcomings of using only one validation set, how we address those shortcomings and how we can tune model hyperparameters to boost performance. Let’s dive in.


Cross-Validation

Shortcomings of a simple training, validation and test split

In part 2 of this article we split the data into training, validation and test set, trained our models on the training set and evaluated them on the validation set. We have not touched the test set yet: it is intended as a hold-out set representing never-before-seen data, which we will use to evaluate how well the Machine Learning models generalise once we feel they are ready for that final test.


Because we only split the data into one set of training data and one set of validation data, the performance metrics of our models are highly reliant on those two sets. They are only trained and evaluated once so the performance depends on that one evaluation and may perform very differently when trained and evaluated on different subsets of the same data, just because of the nature of how the subsets are picked.


What if we could do this split into training and validation sets multiple times, each time on different subsets of the data, train and evaluate our models each time, and look at the average performance of the models across multiple evaluations? Exactly that is the idea behind K-fold Cross-Validation.


K-fold Cross-Validation

In K-fold Cross-Validation (CV) we still start off by separating a test/hold-out set from the remaining data in the data set to use for the final evaluation of our models. The data that is remaining, i.e. everything apart from the test set, is split into K number of folds (subsets). The Cross-Validation then iterates through the folds and at each iteration uses one of the K folds as the validation set while using all remaining folds as the training set. This process is repeated until every fold has been used as a validation set. Here is what this process looks like for a 5-fold Cross-Validation:

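The original figure is not reproduced in this copy, but the rotation of folds can be sketched with scikit-learn's KFold (an illustrative snippet, not the article's original code):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features

# 5 folds: each iteration holds out one fold as the validation set
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for i, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    print(f"Fold {i}: train on {len(train_idx)} samples, validate on {len(val_idx)}")
```

Each of the 10 samples appears in the validation set of exactly one fold, so after five iterations every data point has been used for validation once.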

By training and testing the model K number of times on different subsets of the same training data we get a more accurate representation of how well our model might perform on data it has not seen before. In a K-fold CV we score the model after every iteration and compute the average of all scores to get a better representation of how the model performs compared to only using one training and validation set.


K-fold Cross-Validation in Python

Because the Fitbit sleep data set is relatively small, I am going to use 4-fold Cross-Validation and compare the three models used so far: Multiple Linear Regression, Random Forest and Extreme Gradient Boosting Regressor.


Note that a 4-fold CV also nicely compares to the training and validation split from part 2 because we split the data into 75% training and 25% validation data. A 4-fold CV essentially does the same, just four times, and using different subsets each time. I have created a function that takes as inputs a list of models that we would like to compare, the feature data, the target variable data and how many folds we would like to create. The function computes the performance measures we used previously and returns a table with the averages for all models as well as the scores for each type of measure per fold, in case we would like to investigate further. Here is the function:

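The original listing is missing from this copy. A sketch of such a function follows; the name cv_comparison() matches the one referenced later in the article, but the exact metrics (MAE, MSE, R², and a MAPE-based "accuracy") and implementation details are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def cv_comparison(models, X, y, cv=4):
    """Cross-validate each model; return a summary table of mean scores
    and a dict with the per-fold scores for every metric."""
    summary = pd.DataFrame()
    fold_scores = {}
    kf = KFold(n_splits=cv, shuffle=True, random_state=42)
    for model in models:
        name = type(model).__name__
        mae, mse, r2, acc = [], [], [], []
        for train_idx, val_idx in kf.split(X):
            model.fit(X[train_idx], y[train_idx])
            pred = model.predict(X[val_idx])
            mae.append(mean_absolute_error(y[val_idx], pred))
            mse.append(mean_squared_error(y[val_idx], pred))
            r2.append(r2_score(y[val_idx], pred))
            # "accuracy" = 100 minus the mean absolute percentage error
            acc.append(100 - np.mean(np.abs((y[val_idx] - pred) / y[val_idx])) * 100)
        summary[name] = [np.mean(mae), np.mean(mse), np.mean(r2), np.mean(acc)]
        fold_scores[name] = {"MAE": mae, "MSE": mse, "R^2": r2, "Accuracy": acc}
    summary.index = ["Mean MAE", "Mean MSE", "Mean R^2", "Mean Accuracy"]
    return summary, fold_scores
```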

Now we can create a list of models to be used and call the above function with a 4-fold Cross-Validation:


The resulting comparison table looks like this:


Using a 4-fold CV, the Random Forest Regressor outperforms the other two models on all performance measures. But in part 2 we saw that the Multiple Linear Regression had the best performance metrics, why has that changed?


In order to understand why the Cross-Validation results in different scores than the simple training and validation split from part 2 we need to have a closer look at how the models perform on each fold. The cv_comparison() function from above also returns a list of all the scores for each different model for every fold. Let’s have a look at how the R-squared of the three models compares for each fold. In order to have the results in table format, let’s quickly transform it into a DataFrame as well:

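A sketch of that transformation (the per-fold R² values below are dummy numbers for illustration only, not the article's actual results):

```python
import pandas as pd

# Dummy per-fold R^2 values -- illustration only, not the article's results
r2_per_fold = {
    "Multiple Linear Regression": [0.91, 0.79, 0.88, 0.68],
    "Random Forest": [0.92, 0.90, 0.89, 0.87],
    "Extreme Gradient Boosting": [0.94, 0.81, 0.88, 0.66],
}
r2_df = pd.DataFrame(r2_per_fold, index=[f"Fold {i}" for i in range(1, 5)]).T
r2_df["Mean R^2"] = r2_df.mean(axis=1)
print(r2_df)
```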

The above table makes it clear why the scores obtained from the 4-fold CV differ from those of the single training and validation split. The R-squared varies a lot from fold to fold, especially for Extreme Gradient Boosting and Multiple Linear Regression. This also shows why it is so important to use Cross-Validation, especially for small data sets. If you only rely on one simple training and validation set, your results may be vastly different depending on which split of the data you end up with.


Now that we know what Cross-Validation is and why it is important let’s see if we can get more out of our models by tuning the hyperparameters.


Hyperparameter Tuning

Unlike model parameters, which are learned during model training and can not be set arbitrarily, hyperparameters are parameters that can be set by the user before training a Machine Learning model. Examples of hyperparameters in a Random Forest are the number of decision trees to have in the forest, the maximum number of features to consider at each split or the maximum depth of the tree.


As I mentioned previously, there is no one-size-fits-all solution to finding optimum hyperparameters. A set of hyperparameters that performs well for one Machine Learning problem may perform poorly on another one. So how do we figure out what the optimal hyperparameters are?


One possible way is to manually tune the hyperparameters using educated guesses as starting points, changing some hyperparameters, training the model, evaluating its performance and repeating these steps until we are happy with the performance. That sounds like an unnecessarily tedious approach and it is.


Compare hyperparameter tuning to tuning a guitar. You could choose to tune a guitar by ear, which requires a lot of practice and patience and may never lead to an optimal result, especially if you are a beginner. Luckily, there are electric guitar tuners which help you find the correct tones by interpreting the sound waves of your guitar strings and displaying what it reads. You still have to tune the strings using the machine head but the process will be much quicker and the electric tuner ensures your tuning is close to optimal. So what’s the Machine Learning equivalent of an electric guitar tuner?


Randomised Grid Search Cross-Validation

One of the most popular approaches to tuning Machine Learning hyperparameters is randomised grid search, implemented in scikit-learn as RandomizedSearchCV(). Let's dissect what this means.


In Randomised Grid Search Cross-Validation we start by creating a grid of hyperparameters we want to optimise with values that we want to try out for those hyperparameters. Let’s look at an example of a hyperparameter grid for our Random Forest Regressor and how we can set it up:

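The grid code is missing from this copy; a sketch of such a grid follows. The candidate values are illustrative, not the article's exact grid (the original grid yielded 27,216 combinations). Note that recent scikit-learn versions no longer accept max_features='auto' for regressors, so "sqrt", "log2" and None are used here:

```python
import numpy as np

# Candidate values for each hyperparameter we want to tune (illustrative)
n_estimators = [int(x) for x in np.linspace(200, 2000, 10)]
max_features = ["sqrt", "log2", None]
max_depth = [int(x) for x in np.linspace(10, 110, 11)] + [None]
min_samples_split = [2, 5, 10]
min_samples_leaf = [1, 2, 4]
bootstrap = [True, False]

random_grid = {
    "n_estimators": n_estimators,
    "max_features": max_features,
    "max_depth": max_depth,
    "min_samples_split": min_samples_split,
    "min_samples_leaf": min_samples_leaf,
    "bootstrap": bootstrap,
}
```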

First, we create a list of possible values for each hyperparameter we want to tune and then we set up the grid using a dictionary with the key-value pairs as shown above. In order to find and understand the hyperparameters of a Machine Learning model you can check out the model’s official documentation, see the one for Random Forest Regressor here.


The resulting grid looks like this:


As the name suggests, Randomised Grid Search Cross-Validation uses Cross-Validation to evaluate model performance. Random Search means that instead of trying out all possible combinations of hyperparameters (which would be 27,216 combinations in our example) the algorithm randomly chooses a value for each hyperparameter from the grid and evaluates the model using that random combination of hyperparameters.


Trying out all possible combinations would be really computationally expensive and take a long time. Choosing hyperparameters at random speeds up the process significantly and often provides a similarly good solution to trying out all possible combinations. Let’s see how the Randomised Grid Search Cross-Validation is used.


Hyperparameter Tuning for Random Forest

Using the previously created grid, we can find the best hyperparameters for our Random Forest Regressor. I will use a 3-fold CV because the data set is relatively small, and run 200 random combinations. In total, the Randomised Grid Search CV will therefore train and evaluate 600 models (3 folds for each of 200 combinations). Because Random Forests tend to be slower to train than other Machine Learning models such as Extreme Gradient Boosting, running this many models takes a few minutes. Once the process is completed we can obtain the best hyperparameters.


Here is how to use RandomizedSearchCV():

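The original listing is missing from this copy. Below is a self-contained sketch: synthetic data and a reduced grid stand in for the article's training data and full hyperparameter grid, and n_iter is reduced from 200 to keep the example fast:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-ins for the article's training data
rng = np.random.default_rng(42)
X_train = rng.normal(size=(120, 5))
y_train = X_train @ rng.normal(size=5) + rng.normal(scale=0.1, size=120)

# Reduced stand-in for the full hyperparameter grid
random_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

rf_random = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_distributions=random_grid,
    n_iter=20,   # the article uses 200 combinations; reduced here for speed
    cv=3,        # 3-fold CV -> 20 * 3 = 60 model fits (600 in the article)
    random_state=42,
    n_jobs=-1,   # use all available cores
)
rf_random.fit(X_train, y_train)
print(rf_random.best_params_)
```

After fitting, rf_random.best_params_ holds the best random combination found, and rf_random.best_estimator_ is a Random Forest already refitted with those hyperparameters.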

We will use these hyperparameters in our final model, which we test on the test set.


Hyperparameter Tuning for Extreme Gradient Boosting

For our Extreme Gradient Boosting Regressor the process is essentially the same as for the Random Forest. Some of the hyperparameters that we try to optimise are the same and some are different, due to the nature of the model. You can find the full list and explanations of the hyperparameters for XGBRegressor here. Once again, we create the grid:

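The grid listing is missing from this copy; an illustrative grid of common XGBRegressor hyperparameters (the values are assumptions, not the article's exact grid):

```python
# Illustrative XGBoost hyperparameter grid (values are assumptions)
xgb_grid = {
    "n_estimators": [100, 300, 500, 700, 900],
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "min_child_weight": [1, 3, 5],
}
```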

The resulting grid looks like this:


In order to make the performance evaluations comparable I will use a 3-fold CV with 200 combinations for Extreme Gradient Boosting as well:


The optimal hyperparameters are the following:


Again, these will be used in the final model.


Although it might appear obvious to some people, I just want to mention it here: the reason we do not do hyperparameter optimisation for Multiple Linear Regression is that there are no hyperparameters to tweak in the model; it is simply a Multiple Linear Regression.


Now that we have obtained the optimal hyperparameters (at least in terms of our Cross-Validation) we can finally evaluate our models on the test data that we have been holding out since the very beginning of this analysis!


Final model evaluation

After evaluating the performance of our Machine Learning models and finding optimal hyperparameters it is time to put the models to their final test — the all-mighty hold-out set.


In order to do so, we train the models on the entire 80% of the data that we used for all of our evaluations so far, i.e. everything apart from the test set. We use the hyperparameters that we found in the previous part and then compare how our models perform on the test set.


Let’s create and train our models:

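A sketch of this step (synthetic stand-ins for the data; the hyperparameter values are placeholders for the best_params_ found by the searches above, and the tuned XGBRegressor would be constructed analogously):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for the full (non-test) training data
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = X_train @ np.array([2.0, -1.0, 0.5, 3.0]) + 75

# The Random Forest gets the tuned hyperparameters from the randomised
# search (placeholder values here); an XGBRegressor is set up analogously.
mlr_final = LinearRegression().fit(X_train, y_train)
rf_final = RandomForestRegressor(
    n_estimators=200,        # placeholders for rf_random.best_params_
    max_depth=10,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=42,
).fit(X_train, y_train)
```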

I defined a function that scores all of the final models and creates a DataFrame that makes the comparison easy:

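The function listing is missing from this copy. A sketch of such a scoring function, using the same metrics as the cross-validation comparison (the name final_comparison() and the details are assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def final_comparison(models, X_test, y_test):
    """Score each fitted model on the hold-out set and tabulate the results."""
    scores = pd.DataFrame()
    for model in models:
        pred = model.predict(X_test)
        mape = np.mean(np.abs((y_test - pred) / y_test)) * 100
        scores[type(model).__name__] = [
            r2_score(y_test, pred),
            mean_absolute_error(y_test, pred),
            np.sqrt(mean_squared_error(y_test, pred)),
            100 - mape,  # "accuracy" as used in the article
        ]
    scores.index = ["R^2", "MAE", "RMSE", "Accuracy"]
    return scores
```

Calling it with the three fitted final models and the hold-out data then produces the comparison table described next.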

Calling that function with our three final models and adjusting the column headers results in the following final evaluation:


And the winner is: Random Forest Regressor!


The Random Forest achieves an R-squared of 80% and an accuracy of 97.6% on the test set, meaning its predictions were only off by about 2.4% on average. Not bad!


The performance of the Multiple Linear Regression is not far behind but the Extreme Gradient Boosting failed to live up to its hype in this analysis.


Concluding comments

The process of coming up with this whole analysis and actually conducting it was a lot of fun. I have been trying to figure out how Fitbit computes Sleep Scores for a while now and am glad to understand it a bit better. On top of that, I managed to build a Machine Learning model that can predict Sleep Scores with great accuracy. That being said, there are a few things I want to highlight:


  • As I mentioned in part 2, the interpretation of the coefficients of the Multiple Linear Regression may not be accurate because there are high levels of multicollinearity between features.


  • The data set that I used for this analysis is rather small as it relies on 286 data points obtained from Fitbit. This limits the generalisability of the results and a much bigger data set would be needed to be able to train more robust models.

  • This analysis uses Fitbit sleep data of only one person and therefore may not generalise well to other people with different sleep patterns, heart rates, etc.

I hope you enjoyed this thorough analysis of how to use Machine Learning to predict Fitbit Sleep Scores and learned something about the importance of different sleep stages as well as the time spent asleep along the way.


I highly appreciate constructive feedback and you can reach out to me on LinkedIn any time.


Thanks for reading!


Translated from: https://towardsdatascience.com/cross-validation-and-hyperparameter-tuning-how-to-optimise-your-machine-learning-model-13f005af9d7d
