Cross-Validation and Hyperparameter Tuning: How to Optimise Your Machine Learning Model
In the first two parts of this article I obtained and preprocessed Fitbit sleep data, split the data into training, validation and test sets, trained three different Machine Learning models and compared their performance.
In part 2, we saw that using the default hyperparameters for Random Forest and Extreme Gradient Boosting and evaluating model performance on the validation set led to Multiple Linear Regression performing best, with Random Forest and the Gradient Boosting Regressor performing slightly worse.
In this part of the article I will discuss the shortcomings of using only one validation set, how we can address those shortcomings and how we can tune model hyperparameters to boost performance. Let’s dive in.
Cross-Validation
Shortcomings of a simple training, validation and test split
In part 2 of this article we split the data into training, validation and test sets, trained our models on the training set and evaluated them on the validation set. We have not touched the test set yet: it is intended as a hold-out set of never-before-seen data that will be used to evaluate how well the Machine Learning models generalise once we feel they are ready for that final test.
Because we only split the data into one set of training data and one set of validation data, the performance metrics of our models are highly reliant on those two sets. The models are trained and evaluated only once, so the measured performance hinges on that single evaluation and may turn out very differently when the models are trained and evaluated on different subsets of the same data, simply because of how the subsets happen to be picked.
What if we could do this split into training and validation sets multiple times, each time on different subsets of the data, train and evaluate our models each time and look at the average performance of the models across multiple evaluations? Exactly that is the idea behind K-fold Cross-Validation.
K-fold Cross-Validation
In K-fold Cross-Validation (CV) we still start off by separating a test/hold-out set from the remaining data in the data set to use for the final evaluation of our models. The data that is remaining, i.e. everything apart from the test set, is split into K folds (subsets). The Cross-Validation then iterates through the folds and at each iteration uses one of the K folds as the validation set while using all remaining folds as the training set. This process is repeated until every fold has been used as a validation set once. Here is what this process looks like for a 5-fold Cross-Validation:
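To make the mechanics concrete, here is a minimal sketch using scikit-learn’s KFold with placeholder data (not the Fitbit data itself); it simply shows how each fold takes one turn as the validation set:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # placeholder feature matrix
y = np.arange(10)                 # placeholder target

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for i, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    # each fold serves as the validation set exactly once
    print(f"Fold {i}: train={train_idx}, validation={val_idx}")
```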
By training and testing the model K times on different subsets of the same training data we get a more accurate representation of how well our model might perform on data it has not seen before. In a K-fold CV we score the model after every iteration and compute the average of all scores, which gives a better representation of how the model performs than only using one training and validation set.
K-fold Cross-Validation in Python
Because the Fitbit sleep data set is relatively small, I am going to use 4-fold Cross-Validation and compare the three models used so far: Multiple Linear Regression, Random Forest and Extreme Gradient Boosting Regressor.
Note that a 4-fold CV also compares nicely to the training and validation split from part 2, because we split the data into 75% training and 25% validation data. A 4-fold CV essentially does the same, just four times and using a different subset each time. I have created a function that takes as inputs a list of models that we would like to compare, the feature data, the target variable data and the number of folds we would like to create. The function computes the performance measures we used previously and returns a table with the averages for all models, as well as the scores for each type of measure per fold in case we would like to investigate further. Here is the function:
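The original gist is not embedded here, so below is a sketch of what cv_comparison() could look like. The metrics (R-squared, Mean Absolute Error and a MAPE-based accuracy) follow the measures used in part 2; the implementation details are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import KFold

def cv_comparison(models, X, y, cv):
    """Train and score every model on each of the cv folds and return a
    table of averages plus the per-fold scores for each metric."""
    comparison = pd.DataFrame()
    r2s, maes, accs = [], [], []
    kf = KFold(n_splits=cv, shuffle=True, random_state=42)
    for model in models:
        r2, mae, acc = [], [], []
        for train_idx, val_idx in kf.split(X):
            X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
            y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
            model.fit(X_train, y_train)
            preds = model.predict(X_val)
            r2.append(r2_score(y_val, preds))
            mae.append(mean_absolute_error(y_val, preds))
            # accuracy here means 100% minus the mean absolute percentage error
            acc.append(100 - np.mean(100 * np.abs((y_val - preds) / y_val)))
        comparison[type(model).__name__] = [np.mean(r2), np.mean(mae), np.mean(acc)]
        r2s.append(r2)
        maes.append(mae)
        accs.append(acc)
    comparison.index = ['R-squared', 'Mean Absolute Error', 'Accuracy']
    return comparison, r2s, maes, accs
```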
Now we can create a list of models to be used and call the above function with a 4-fold Cross-Validation:
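A sketch of that call, assuming the non-test feature data and target are held in X_train and y_train as in the earlier parts:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor

mlr_reg = LinearRegression()
rf_reg = RandomForestRegressor(random_state=42)
xgb_reg = XGBRegressor(random_state=42)

models = [mlr_reg, rf_reg, xgb_reg]
comparison, r2_scores, mae_scores, acc_scores = cv_comparison(models, X_train, y_train, 4)
```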
The resulting comparison table looks like this:
Using a 4-fold CV, the Random Forest Regressor outperforms the other two models on all performance measures. But in part 2 we saw that the Multiple Linear Regression had the best performance metrics, so why has that changed?
In order to understand why the Cross-Validation results in different scores than the simple training and validation split from part 2, we need to take a closer look at how the models perform on each fold. The cv_comparison() function from above also returns a list of all the scores of each model for every fold. Let’s look at how the R-squared of the three models compares for each fold. In order to have the results in table format, let’s quickly transform them into a DataFrame as well:
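A sketch of that transformation, assuming r2_scores is the list of per-fold R-squared scores returned by cv_comparison() above:

```python
import pandas as pd

r2_df = pd.DataFrame(
    r2_scores,
    index=['Multiple Linear Regression', 'Random Forest', 'Extreme Gradient Boosting'],
    columns=[f'Fold {i}' for i in range(1, 5)],
)
r2_df['Mean'] = r2_df.mean(axis=1)
```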
The above table makes it clear why the scores obtained from the 4-fold CV differ from those of the single training and validation set. The R-squared varies a lot from fold to fold, especially for Extreme Gradient Boosting and Multiple Linear Regression. This also shows why it is so important to use Cross-Validation, especially for small data sets. If you rely on just one training and validation set, your results may be vastly different depending on which split of the data you end up with.
Now that we know what Cross-Validation is and why it is important, let’s see if we can get more out of our models by tuning the hyperparameters.
Hyperparameter Tuning
Unlike model parameters, which are learned during model training and cannot be set arbitrarily, hyperparameters are parameters that the user can set before training a Machine Learning model. Examples of hyperparameters in a Random Forest are the number of decision trees in the forest, the maximum number of features to consider at each split or the maximum depth of the trees.
As I mentioned previously, there is no one-size-fits-all solution to finding optimum hyperparameters. A set of hyperparameters that performs well for one Machine Learning problem may perform poorly on another one. So how do we figure out what the optimal hyperparameters are?
One possible way is to manually tune the hyperparameters, using educated guesses as starting points: change some hyperparameters, train the model, evaluate its performance and repeat these steps until we are happy with the performance. That sounds like an unnecessarily tedious approach, and it is.
Compare hyperparameter tuning to tuning a guitar. You could choose to tune a guitar by ear, which requires a lot of practice and patience and may never lead to an optimal result, especially if you are a beginner. Luckily, there are electric guitar tuners that help you find the correct tones by interpreting the sound waves of your guitar strings and displaying what they read. You still have to tune the strings using the machine heads, but the process will be much quicker and the electric tuner ensures your tuning is close to optimal. So what’s the Machine Learning equivalent of an electric guitar tuner?
Randomised Grid Search Cross-Validation
One of the most popular approaches to tuning Machine Learning hyperparameters is scikit-learn’s RandomizedSearchCV(). Let’s dissect what this means.
In Randomised Grid Search Cross-Validation we start by creating a grid of the hyperparameters we want to optimise, containing the values we want to try out for each of them. Let’s look at an example of a hyperparameter grid for our Random Forest Regressor and how we can set it up:
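The original grid is not embedded here; below is a sketch with illustrative value ranges (the 27,216 combinations quoted further down refer to the article’s original grid, not necessarily this sketch):

```python
import numpy as np

# Candidate values for each hyperparameter (illustrative ranges)
n_estimators = [int(x) for x in np.linspace(200, 2000, 10)]
max_features = ['sqrt', 'log2', None]   # how many features to consider per split
max_depth = [int(x) for x in np.linspace(10, 110, 11)] + [None]
min_samples_split = [2, 5, 10]
min_samples_leaf = [1, 2, 4]
bootstrap = [True, False]

random_grid = {
    'n_estimators': n_estimators,
    'max_features': max_features,
    'max_depth': max_depth,
    'min_samples_split': min_samples_split,
    'min_samples_leaf': min_samples_leaf,
    'bootstrap': bootstrap,
}
```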
First, we create a list of possible values for each hyperparameter we want to tune and then we set up the grid using a dictionary with the key-value pairs as shown above. In order to find and understand the hyperparameters of a Machine Learning model you can check out the model’s official documentation; see the one for the Random Forest Regressor here.
The resulting grid looks like this:
As the name suggests, Randomised Grid Search Cross-Validation uses Cross-Validation to evaluate model performance. Random Search means that instead of trying out all possible combinations of hyperparameters (which would be 27,216 combinations in our example) the algorithm randomly chooses a value for each hyperparameter from the grid and evaluates the model using that random combination of hyperparameters.
Trying out all possible combinations would be computationally expensive and take a long time. Choosing hyperparameters at random speeds up the process significantly and often yields a solution almost as good as trying out all possible combinations. Let’s see how the Randomised Grid Search Cross-Validation is used.
Hyperparameter Tuning for Random Forest
Using the previously created grid, we can find the best hyperparameters for our Random Forest Regressor. I will use a 3-fold CV because the data set is relatively small, and run 200 random combinations. In total, the Randomised Grid Search CV will therefore train and evaluate 600 models (3 folds for each of 200 combinations). Because Random Forests tend to be slow to train compared to other Machine Learning models such as Extreme Gradient Boosting, running this many models takes a few minutes. Once the process is completed we can obtain the best hyperparameters.
Here is how to use RandomizedSearchCV():
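A sketch of that call; the estimator and variable names follow the earlier snippets, and the verbosity and parallelism settings are assumptions:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rf_reg = RandomForestRegressor(random_state=42)
rf_random = RandomizedSearchCV(
    estimator=rf_reg,
    param_distributions=random_grid,
    n_iter=200,        # number of random hyperparameter combinations to try
    cv=3,              # 3-fold Cross-Validation
    verbose=2,
    random_state=42,
    n_jobs=-1,         # use all available CPU cores
)
rf_random.fit(X_train, y_train)
print(rf_random.best_params_)
```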
We will use these hyperparameters in our final model, which we test on the test set.
Hyperparameter Tuning for Extreme Gradient Boosting
For our Extreme Gradient Boosting Regressor the process is essentially the same as for the Random Forest. Some of the hyperparameters that we try to optimise are the same and some are different, owing to the nature of the model. You can find the full list and explanations of the hyperparameters for XGBRegressor here. Once again, we create the grid:
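Again a sketch: these are genuine XGBRegressor hyperparameters, but the specific candidate values are illustrative assumptions:

```python
import numpy as np

xgb_grid = {
    'n_estimators': [int(x) for x in np.linspace(200, 2000, 10)],
    'max_depth': [int(x) for x in np.linspace(2, 20, 10)],
    'learning_rate': [0.01, 0.05, 0.1, 0.2, 0.3],
    'min_child_weight': [1, 3, 5, 7],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
}
```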
The resulting grid looks like this:
In order to make the performance evaluations comparable I will use a 3-fold CV with 200 combinations for Extreme Gradient Boosting as well:
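A sketch mirroring the Random Forest search above:

```python
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

xgb_reg = XGBRegressor(random_state=42)
xgb_random = RandomizedSearchCV(
    estimator=xgb_reg,
    param_distributions=xgb_grid,
    n_iter=200,
    cv=3,
    verbose=2,
    random_state=42,
    n_jobs=-1,
)
xgb_random.fit(X_train, y_train)
print(xgb_random.best_params_)
```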
The optimal hyperparameters are the following:
Again, these will be used in the final model.
Although it might appear obvious to some people, I just want to mention it here: the reason we do not do hyperparameter optimisation for Multiple Linear Regression is that there are no hyperparameters to be tweaked in the model; it is simply a Multiple Linear Regression.
Now that we have obtained the optimal hyperparameters (at least in terms of our Cross-Validation) we can finally evaluate our models on the test data that we have been holding out since the very beginning of this analysis!
Final model evaluation
After evaluating the performance of our Machine Learning models and finding optimal hyperparameters, it is time to put the models to their final test: the almighty hold-out set.
In order to do so, we train the models on the entire 80% of the data that we used for all of our evaluations so far, i.e. everything apart from the test set. We use the hyperparameters that we found in the previous part and then compare how our models perform on the test set.
Let’s create and train our models:
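A sketch of this step; the concrete hyperparameter values come from the best_params_ attributes of the two searches above:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor

# Final models, configured with the best hyperparameters found above
mlr_final = LinearRegression()
rf_final = RandomForestRegressor(**rf_random.best_params_, random_state=42)
xgb_final = XGBRegressor(**xgb_random.best_params_, random_state=42)

for model in (mlr_final, rf_final, xgb_final):
    model.fit(X_train, y_train)  # X_train/y_train: everything except the test set
```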
I defined a function that scores all of the final models and creates a DataFrame that makes the comparison easy:
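The original function is not embedded here; a sketch using the same metrics as cv_comparison() above:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

def final_comparison(models, X_test, y_test):
    """Score each trained model on the hold-out set and tabulate the results."""
    scores = pd.DataFrame()
    for model in models:
        preds = model.predict(X_test)
        r2 = r2_score(y_test, preds)
        mae = mean_absolute_error(y_test, preds)
        # accuracy as 100% minus the mean absolute percentage error
        accuracy = 100 - np.mean(100 * np.abs((y_test - preds) / y_test))
        scores[type(model).__name__] = [r2, mae, accuracy]
    scores.index = ['R-squared', 'Mean Absolute Error', 'Accuracy']
    return scores
```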
Calling that function with our three final models and adjusting the column headers results in the following final evaluation:
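For illustration, that call could look like this (the column labels are assumptions):

```python
final_scores = final_comparison([mlr_final, rf_final, xgb_final], X_test, y_test)
final_scores.columns = ['Linear Regression', 'Random Forest', 'Extreme Gradient Boosting']
```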
And the winner is: Random Forest Regressor!
The Random Forest achieves an R-squared of 80% and an accuracy of 97.6% on the test set, meaning its predictions were off by only about 2.4% on average. Not bad!
The performance of the Multiple Linear Regression is not far behind, but the Extreme Gradient Boosting failed to live up to its hype in this analysis.
Concluding comments
The process of coming up with this whole analysis and actually conducting it was a lot of fun. I have been trying to figure out how Fitbit computes Sleep Scores for a while now and am glad to understand it a bit better. On top of that, I managed to build a Machine Learning model that can predict Sleep Scores with great accuracy. That being said, there are a few things I want to highlight:
As I mentioned in part 2, the interpretation of the coefficients of the Multiple Linear Regression may not be accurate because there are high levels of multicollinearity between features.
I hope you enjoyed this thorough analysis of how to use Machine Learning to predict Fitbit Sleep Scores and learned something about the importance of different sleep stages as well as the time spent asleep along the way.
I highly appreciate constructive feedback and you can reach out to me on LinkedIn any time.
Thanks for reading!
Translated from: https://towardsdatascience.com/cross-validation-and-hyperparameter-tuning-how-to-optimise-your-machine-learning-model-13f005af9d7d