mlmachine: Hyperparameter Tuning with Bayesian Optimization
TL;DR
mlmachine is a Python library that organizes and accelerates notebook-based machine learning experiments.
In this article, we use mlmachine to accomplish actions that would otherwise take considerable coding and effort, including:
- Bayesian Optimization for Multiple Estimators in One Shot
- Results Analysis
- Model Reinstantiation
Check out the Jupyter Notebook for this article.
Check out the project on GitHub.
And check out past mlmachine articles:
Bayesian Optimization for Multiple Estimators in One Shot
Bayesian optimization is typically described as an advancement beyond exhaustive grid searches, and rightfully so. This hyperparameter tuning strategy succeeds by using prior information to inform future parameter selection for a given estimator. Check out Will Koehrsen’s article on Medium for an excellent overview of the technique.
mlmachine uses hyperopt as a foundation for performing Bayesian optimization, and takes hyperopt’s functionality a step further through a simplified workflow that allows multiple models to be optimized in a single process execution. In this article, we are going to optimize four classifiers:
LogisticRegression()
XGBClassifier()
RandomForestClassifier()
KNeighborsClassifier()
Prepare Data
First, we apply data preprocessing techniques to clean up our data. We’ll start by creating two Machine() objects, one for the training data and a second for the validation data:
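The notebook code for this step is not reproduced above, so here is a minimal sketch. A Titanic-style dataset is assumed for illustration, and the identify_as_* feature-type arguments and column names are assumptions that may differ from the released mlmachine API:

```python
# a minimal sketch, assuming a Titanic-style dataset; the identify_as_*
# arguments and column names are illustrative, not the definitive API
import pandas as pd
import mlmachine as mlm

df_train = pd.read_csv("train.csv")        # hypothetical file names
df_valid = pd.read_csv("validation.csv")

# one Machine() object for the training data...
mlmachine_titanic_train = mlm.Machine(
    data=df_train,
    target="Survived",
    identify_as_continuous=["Age", "Fare"],
    identify_as_nominal=["Embarked", "Sex", "Pclass"],
)

# ...and a second for the validation data
mlmachine_titanic_valid = mlm.Machine(
    data=df_valid,
    target="Survived",
    identify_as_continuous=["Age", "Fare"],
    identify_as_nominal=["Embarked", "Sex", "Pclass"],
)
```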
Now we process the data by imputing nulls and applying various binning, feature engineering and encoding techniques:
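The original notebook performs this step with mlmachine’s own DataFrame-preserving transformers, covered in earlier articles of this series. As a rough stand-in, an equivalent plain scikit-learn sketch of the null imputation, binning, and encoding might look like the following; the column names are hypothetical, and unlike mlmachine’s utilities, this version returns arrays rather than a DataFrame:

```python
# a plain scikit-learn stand-in for mlmachine's preprocessing step;
# column names are hypothetical
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

numeric_columns = ["Age", "Fare"]
nominal_columns = ["Embarked", "Sex", "Pclass"]

preprocess = ColumnTransformer([
    # impute nulls with the mean, then bin continuous features
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("bin", KBinsDiscretizer(n_bins=5, encode="ordinal")),
    ]), numeric_columns),
    # impute nulls with the mode, then one-hot encode categoricals
    ("nominal", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), nominal_columns),
])
```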
Here is the output, still in a DataFrame:
Feature Importance Summary
As a second preparatory step, we want to perform feature selection for each of our classifiers:
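The code for this step is not reproduced above; below is a hedged sketch. The FeatureSelector() class is referenced later in this article, but the import path, constructor arguments, and summary method name shown here are all assumptions:

```python
# a hedged sketch of the feature-importance summary step; FeatureSelector()
# is referenced later in this article, but the import path, constructor
# arguments, and method name below are assumptions
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

estimators = [
    LogisticRegression,
    XGBClassifier,
    RandomForestClassifier,
    KNeighborsClassifier,
]

fs = FeatureSelector(                              # hypothetical constructor
    data=mlmachine_titanic_train.data,
    target=mlmachine_titanic_train.target,
    estimators=estimators,
)
feature_summary = fs.feature_selector_summary()    # hypothetical method name
```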
Exhaustive Iterative Feature Selection
For our final preparatory step, we use this feature selection summary to perform iterative cross-validation on smaller and smaller subsets of features for each of our estimators:
From this result, we extract our dictionary of optimum feature sets for each estimator:
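A sketch of that extraction follows. cross_val_feature_dict() is named in the parameter discussion below as part of the FeatureSelector() workflow, though the arguments shown here are illustrative:

```python
# cross_val_feature_dict() is named later in this article; the argument
# names and values here are illustrative assumptions
estimator_feature_dict = fs.cross_val_feature_dict(
    scoring="accuracy",    # assumed: matches the tuning metric
    n_folds=5,             # assumed fold count
)

# per this article, estimator_feature_dict["XGBClassifier"] holds the 10
# column names (of 43 available) with the best mean CV accuracy
```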
The keys are estimator names, and the associated values are lists containing the column names of the best performing feature subset for each estimator. Here are the key/value pairs for XGBClassifier(), which used only 10 of the available 43 features to achieve the best average cross-validation accuracy on the validation dataset:
With our processed dataset and optimum feature subsets in hand, it’s time to use Bayesian optimization to tune the hyperparameters of our four estimators.
Outline Our Feature Space
First, we need to establish our feature space, which specifies the candidate values for each parameter of each estimator:
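The space definition itself is not shown above; here is a representative sketch built from hyperopt’s stock sampling distributions. The parameter ranges are illustrative, and each hyperopt label is prefixed with the classifier name because labels must be unique within a single search space:

```python
# illustrative ranges only; hyperopt labels are prefixed with the
# classifier name to keep them unique within the search space
import numpy as np
from hyperopt import hp

estimator_parameter_space = {
    "LogisticRegression": {
        "C": hp.loguniform("LogisticRegression__C", np.log(1e-3), np.log(1e2)),
        "penalty": hp.choice("LogisticRegression__penalty", ["l1", "l2"]),
    },
    "XGBClassifier": {
        "n_estimators": hp.quniform("XGBClassifier__n_estimators", 100, 1000, 25),
        "learning_rate": hp.uniform("XGBClassifier__learning_rate", 0.01, 0.3),
        "max_depth": hp.quniform("XGBClassifier__max_depth", 2, 10, 1),
    },
    "RandomForestClassifier": {
        "n_estimators": hp.quniform("RandomForestClassifier__n_estimators", 100, 1000, 25),
        "max_features": hp.choice("RandomForestClassifier__max_features", ["sqrt", "log2"]),
    },
    "KNeighborsClassifier": {
        "n_neighbors": hp.quniform("KNeighborsClassifier__n_neighbors", 3, 30, 1),
        "algorithm": hp.choice(
            "KNeighborsClassifier__algorithm",
            ["ball_tree", "kd_tree", "brute", "auto"],
        ),
    },
}
```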
The outermost keys of the dictionary are names of classifiers, represented by strings. The associated values are also dictionaries, where the keys are parameter names, represented as strings, and the values are hyperopt sampling distributions from which parameter values will be chosen.
Run the Bayesian Optimization Job
Now we’re ready to run our Bayesian optimization hyperparameter tuning job. We will use our Machine() object’s built-in method exec_bayes_optim_search(). Let’s see mlmachine in action:
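The method name and its parameter names below come directly from this article; only the literal values for the scoring metric and fold count are assumptions:

```python
# exec_bayes_optim_search() and its parameter names are taken from this
# article; the scoring and n_folds values are assumptions
mlmachine_titanic_train.exec_bayes_optim_search(
    estimator_parameter_space=estimator_parameter_space,
    data=mlmachine_titanic_train.data,
    target=mlmachine_titanic_train.target,
    columns=estimator_feature_dict,   # per-estimator optimum feature subsets
    scoring="accuracy",
    n_folds=5,
    iters=200,
    show_progressbar=True,
)
```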
Let’s review the parameters:
estimator_parameter_space: The dictionary-based feature space we set up above.
data: Our observations.
target: Our target data.
columns: An optional parameter that allows us to subset the input dataset features. Accepts a list of feature names, which will apply equally to all estimators. Also accepts a dictionary, where the keys represent estimator class names and the values are lists of feature names to be used with the associated estimator. In this example, we use the latter by passing in the dictionary returned by cross_val_feature_dict() in the FeatureSelector() workflow above.
scoring: The scoring metric to be evaluated.
n_folds: The number of folds to use in the cross-validation procedure.
iters: The total number of iterations to run the hyperparameter tuning process. In this example, we run the experiment for 200 iterations.
show_progressbar: Controls whether a progress bar displays and actively updates during the course of the process.
Anyone familiar with hyperopt will be wondering where the objective function is. mlmachine abstracts away this complexity.
The process runtime depends on several factors, including hardware, the number and type of estimators used, the number of folds, feature selection, and the number of sampling iterations. Runtimes can be quite lengthy. For this reason, exec_bayes_optim_search() automatically saves the result of each iteration to a CSV.
Results Analysis
Results Summary
Let’s start by loading and reviewing the results:
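Since exec_bayes_optim_search() saves each iteration to a CSV, the log can be pulled straight into pandas; the file name here is an assumption:

```python
import pandas as pd

# the CSV file name is an assumption; exec_bayes_optim_search() writes
# one row per tuning iteration
bayes_optim_log = pd.read_csv("bayes_optimization_summary.csv")
bayes_optim_log.head()
```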
Our Bayesian optimization log maintains key information about each iteration:
- Iteration number, estimator and scoring metric
- Cross-validation summary statistics
- Iteration training time
- Dictionary of parameters used
This log provides an immense amount of data for us to analyze and evaluate the effectiveness of the Bayesian optimization process.
Model Optimization Assessment
First and foremost, we want to see whether performance improved over the iterations.
Let’s visualize the XGBClassifier() loss by iteration:
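mlmachine draws this panel with a built-in visualization method; as a stand-in, the same picture can be produced directly from the log with pandas, numpy, and matplotlib. The log column names used here are assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

# column names ("estimator", "iteration", "loss") are assumptions about
# the log schema
xgb_log = bayes_optim_log[bayes_optim_log["estimator"] == "XGBClassifier"]
x = xgb_log["iteration"].to_numpy(dtype=float)
y = xgb_log["loss"].to_numpy(dtype=float)

plt.scatter(x, y, alpha=0.5)                 # one dot per iteration
slope, intercept = np.polyfit(x, y, deg=1)   # line of best fit
plt.plot(x, slope * x + intercept, color="purple")
plt.xlabel("iteration")
plt.ylabel("loss")
plt.show()
```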
Each dot represents the performance of one of our 200 experiments. The key detail to notice is that the line of best fit has a clear downward slope - exactly what we want. This means that with each iteration, model performance tends to improve compared to the previous iterations.
Parameter Selection Assessment
One of the coolest parts of Bayesian optimization is seeing how parameter selection is optimized.
For each model and for each model’s parameters, we can generate a two-panel visual.
For numeric parameters, such as n_estimators or learning_rate, the two-panel visual includes:
- Parameter selection KDE, overlaid on a theoretical distribution KDE
- Parameter selection by iteration scatter plot, with line of best fit
For categorical parameters, such as a loss function, the two-panel visual includes:
- Parameter selection and theoretical distribution bar chart
- Parameter selection by iteration scatter plot, faceted by parameter category
Let’s review the parameter selection panels for KNeighborsClassifier():
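The plotting call itself is not reproduced above; a sketch follows. model_param_plot() is named just below, but the argument names shown are assumptions:

```python
# model_param_plot() is named in this article; the argument names are
# assumptions
mlmachine_titanic_train.model_param_plot(
    bayes_optim_summary=bayes_optim_log,
    estimator_class="KNeighborsClassifier",
)
```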
The built-in method model_param_plot() cycles through the estimator’s parameters and presents the appropriate panel given each parameter’s type. Let’s look at a numeric parameter and a categorical parameter separately.
First, we’ll review the panel for the numeric parameter n_neighbors:
On the left, we can see two overlapping kernel density plots summarizing the actual parameter selections and the theoretical parameter distribution. The purple line corresponds to the theoretical distribution, and, as expected, this curve is smooth and evenly distributed. The teal line corresponds to the actual parameter selections, and it’s clearly evident that hyperopt prefers values between 5 and 10.
On the right, the scatter plot visualizes the n_neighbors value selections over the iterations. There is a slight downward slope to the line of best fit, as the Bayesian optimization process homes in on values around 7.
Next, we’ll review the panel for the categorical parameter algorithm:
On the left, we see a bar chart displaying the counts of parameter selections, faceted by actual parameter selections and selections from the theoretical distribution. The purple bars, representing selections from the theoretical distribution, are more even than the teal bars, which represent the actual selections.
On the right, the scatter plot again visualizes the algorithm value selection over the iterations. There is a clear decrease in selection of “ball_tree” and “auto” in favor of “kd_tree” and “brute” over the iterations.
Model Reinstantiation
Top Model Identification
Our Machine() object has a built-in method called top_bayes_optim_models(), which identifies the best model for each estimator type based on the results in our Bayesian optimization log.
With this method, we can identify the top N models for each estimator based on mean cross-validation score. In this experiment, top_bayes_optim_models() returns the dictionary below, which tells us that LogisticRegression() identified its top model on iteration 30, XGBClassifier() on iteration 61, RandomForestClassifier() on iteration 46, and KNeighborsClassifier() on iteration 109.
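A sketch of the call; top_bayes_optim_models() is named in this article, while the argument names and the exact shape of the returned dictionary are assumptions (the iteration numbers shown are the ones reported above):

```python
# top_bayes_optim_models() is named in this article; argument names and
# the returned dict's exact shape are assumptions
top_models = mlmachine_titanic_train.top_bayes_optim_models(
    bayes_optim_summary=bayes_optim_log,
    num_models=1,
)
# reported result for this experiment:
# {"LogisticRegression": [30], "XGBClassifier": [61],
#  "RandomForestClassifier": [46], "KNeighborsClassifier": [109]}
```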
Putting the Models to Use
To reinstantiate a model, we leverage our Machine() object’s built-in method BayesOptimClassifierBuilder(). To use this method, we pass in our results log and specify an estimator class and iteration number. This will instantiate a model object with the parameters stored on that record of the log:
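A sketch of reinstantiating the top RandomForestClassifier() from iteration 46; the method and its inputs (results log, estimator class, iteration number) come from this article, while the keyword argument names are assumptions:

```python
# BayesOptimClassifierBuilder() and its inputs come from this article;
# the keyword argument names are assumptions
rf_model = mlmachine_titanic_train.BayesOptimClassifierBuilder(
    bayes_optim_summary=bayes_optim_log,
    estimator_class="RandomForestClassifier",
    model_iter=46,
)
```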
Here we see the model parameters:
The models instantiated with BayesOptimClassifierBuilder() use .fit() and .predict() in a way that should feel quite familiar.
Let’s finish this article with a very basic model performance evaluation. We will fit this RandomForestClassifier() on the training data and labels, generate predictions with the training data, and evaluate the model’s performance by comparing these predictions to the ground truth:
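A sketch of that evaluation; subsetting the observations to the RandomForestClassifier() optimum feature set mirrors the tuning setup (an assumption about this notebook), and accuracy_score() is an ordinary scikit-learn metric:

```python
from sklearn.metrics import accuracy_score

# train and predict on the training observations, restricted to the
# optimum feature subset found earlier (assumed to match the tuning setup)
rf_columns = estimator_feature_dict["RandomForestClassifier"]
X_train = mlmachine_titanic_train.data[rf_columns]
y_train = mlmachine_titanic_train.target

rf_model.fit(X_train, y_train)
predictions = rf_model.predict(X_train)

# compare predictions to the ground truth; in-sample accuracy only
print(accuracy_score(y_train, predictions))
```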
Anyone familiar with Scikit-learn should feel right at home.
In Closing
mlmachine makes it easy to efficiently optimize the hyperparameters for multiple estimators in one shot, and facilitates the visual inspection of model improvement and parameter selection.
Check out the GitHub repository, and stay tuned for additional column entries.
Translated from: https://towardsdatascience.com/mlmachine-hyperparameter-tuning-with-bayesian-optimization-2de81472e6d