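ML with XGBoost: translation of an excellent English article on XGBoost parameter tuning, "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 4)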

Published: 2025/3/21


Step 3: Tune gamma
Step 4: Tune subsample and colsample_bytree
Step 5: Tuning Regularization Parameters
Step 6: Reducing Learning Rate
End Notes

Original title: "Complete Guide to Parameter Tuning in XGBoost with codes in Python"
Original URL: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
All rights belong to the original author; this article is only a translation.

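Related articles
ML with XGBoost: a detailed guide to the XGBoost model (with figures; parallel processing), key ideas, code implementation (objective and evaluation functions), installation, usage, and case studies
ML with XGBoost: a detailed guide to XGBoost, the Kaggle workhorse: introduction (resources), installation, usage, and case studies
ML with XGBoost: translation of "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 1)
ML with XGBoost: translation of "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 2)
ML with XGBoost: translation of "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 3)
ML with XGBoost: translation of "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 4)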

Step 3: Tune gamma

Now let's tune the gamma value using the parameters already tuned above. Gamma can take various values, but I'll check 5 values here, from 0 to 0.4 in steps of 0.1. You can search a finer grid afterwards if needed.

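# Assumes the imports, train, predictors, target and modelfit from the earlier parts of this series.
param_test3 = {'gamma': [i / 10.0 for i in range(0, 5)]}
gsearch3 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=140, max_depth=4,
                            min_child_weight=6, gamma=0, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    # iid=False is dropped: the argument was removed in scikit-learn 0.24
    param_grid=param_test3, scoring='roc_auc', n_jobs=4, cv=5)
gsearch3.fit(train[predictors], train[target])
# cv_results_ replaces grid_scores_, which was removed in scikit-learn 0.20
gsearch3.cv_results_['mean_test_score'], gsearch3.best_params_, gsearch3.best_score_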

This shows that our original value of gamma, i.e. 0, is the optimum one. Before proceeding, it is a good idea to re-calibrate the number of boosting rounds for the updated parameters.

xgb2 = XGBClassifier(learning_rate=0.1, n_estimators=1000, max_depth=4,
                     min_child_weight=6, gamma=0, subsample=0.8,
                     colsample_bytree=0.8, objective='binary:logistic',
                     nthread=4, scale_pos_weight=1, seed=27)
modelfit(xgb2, train, predictors)

Here, we can see the improvement in score. So the parameters settled so far are:

max_depth: 4
min_child_weight: 6
gamma: 0
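The re-calibration step above relies on the modelfit helper from an earlier part of this series, which is not reproduced here. Assuming it chooses the number of boosting rounds by cross-validation with early stopping, the selection logic can be sketched in plain Python as follows. The function name best_round, the patience value, and the per-round scores are all hypothetical and made up for illustration; they are not results from the article.

```python
def best_round(cv_scores, patience):
    """Return (round, score) for the best cross-validated score.

    Scanning stops once the score has failed to improve for
    `patience` consecutive rounds. Rounds are counted from 1.
    """
    best_score = float("-inf")
    best_idx = 0
    for idx, score in enumerate(cv_scores, start=1):
        if score > best_score:
            best_score, best_idx = score, idx
        elif idx - best_idx >= patience:
            break  # no improvement for `patience` rounds: stop early
    return best_idx, best_score


# Made-up validation scores per boosting round (illustration only).
toy_scores = [0.80, 0.82, 0.83, 0.835, 0.834, 0.833,
              0.836, 0.835, 0.834, 0.833, 0.832]

n_rounds, score = best_round(toy_scores, patience=3)
print("rounds to keep:", n_rounds, "score:", score)
```

In a real run the round count found this way would become n_estimators for the following grid searches, which is presumably where values such as 140 and 177 in the code come from.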

Step 4: Tune subsample and colsample_bytree

The next step is to try different subsample and colsample_bytree values. Let's do this in 2 stages as well, taking the values 0.6, 0.7, 0.8, 0.9 for both to start with.

param_test4 = {'subsample': [i / 10.0 for i in range(6, 10)],
               'colsample_bytree': [i / 10.0 for i in range(6, 10)]}
gsearch4 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=4,
                            min_child_weight=6, gamma=0, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    # iid=False is dropped: the argument was removed in scikit-learn 0.24
    param_grid=param_test4, scoring='roc_auc', n_jobs=4, cv=5)
gsearch4.fit(train[predictors], train[target])
# cv_results_ replaces grid_scores_, which was removed in scikit-learn 0.20
gsearch4.cv_results_['mean_test_score'], gsearch4.best_params_, gsearch4.best_score_

Here, we found 0.8 to be the optimum value for both subsample and colsample_bytree. Now we should try values at 0.05 intervals around these.

param_test5 = {'subsample': [i / 100.0 for i in range(75, 90, 5)],
               'colsample_bytree': [i / 100.0 for i in range(75, 90, 5)]}
gsearch5 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=4,
                            min_child_weight=6, gamma=0, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    # iid=False is dropped: the argument was removed in scikit-learn 0.24
    param_grid=param_test5, scoring='roc_auc', n_jobs=4, cv=5)
gsearch5.fit(train[predictors], train[target])
# Inspect the winner, as in the previous searches
gsearch5.best_params_, gsearch5.best_score_

Again we got the same values as before. Thus the optimum values are:

subsample: 0.8
colsample_bytree: 0.8
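To get a feel for the cost of the two-stage search in Step 4, the grids can be expanded by hand. The sketch below uses only the standard library; expand_grid is a hypothetical helper written for this illustration, not part of scikit-learn, and the fit count assumes the 5-fold cross-validation used in the code above.

```python
from itertools import product


def expand_grid(param_grid):
    """List every parameter combination a grid search would evaluate."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Stage 1: coarse grid, 0.6 to 0.9 in steps of 0.1
coarse = {'subsample': [i / 10.0 for i in range(6, 10)],
          'colsample_bytree': [i / 10.0 for i in range(6, 10)]}
# Stage 2: fine grid, 0.05 steps around the stage 1 optimum of 0.8
fine = {'subsample': [i / 100.0 for i in range(75, 90, 5)],
        'colsample_bytree': [i / 100.0 for i in range(75, 90, 5)]}

cv_folds = 5
for name, grid in (('coarse', coarse), ('fine', fine)):
    combos = expand_grid(grid)
    print(name, grid['subsample'], '->', len(combos), 'combinations,',
          len(combos) * cv_folds, 'cross-validation fits')
```

Searching the whole 0.6 to 0.9 range directly at 0.05 resolution would need 7 x 7 = 49 combinations, so the coarse-then-fine approach (16 + 9 = 25 combinations) roughly halves the work, at the risk of missing an optimum far from the coarse winner.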

Step 5: Tuning Regularization Parameters

The next step is to apply regularization to reduce overfitting. Many people don't use these parameters much, since gamma already provides a substantial way of controlling complexity, but we should always try them. I'll tune the 'reg_alpha' value here and leave it up to you to try different values of 'reg_lambda'.

param_test6 = {'reg_alpha': [1e-5, 1e-2, 0.1, 1, 100]}
# gamma=0.1 is kept as in the original code, although Step 3 selected gamma=0
gsearch6 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=4,
                            min_child_weight=6, gamma=0.1, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    # iid=False is dropped: the argument was removed in scikit-learn 0.24
    param_grid=param_test6, scoring='roc_auc', n_jobs=4, cv=5)
gsearch6.fit(train[predictors], train[target])
# cv_results_ replaces grid_scores_, which was removed in scikit-learn 0.20
gsearch6.cv_results_['mean_test_score'], gsearch6.best_params_, gsearch6.best_score_

We can see that the CV score is lower than in the previous case. But the values tried are very widely spaced, so we should try values closer to the optimum found here (0.01) to see if we get something better.

param_test7 = {'reg_alpha': [0, 0.001, 0.005, 0.01, 0.05]}
# gamma=0.1 is kept as in the original code, although Step 3 selected gamma=0
gsearch7 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=4,
                            min_child_weight=6, gamma=0.1, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    # iid=False is dropped: the argument was removed in scikit-learn 0.24
    param_grid=param_test7, scoring='roc_auc', n_jobs=4, cv=5)
gsearch7.fit(train[predictors], train[target])
# cv_results_ replaces grid_scores_, which was removed in scikit-learn 0.20
gsearch7.cv_results_['mean_test_score'], gsearch7.best_params_, gsearch7.best_score_

You can see that we got a better CV score. Now we can apply this regularization in the model and look at the impact:

xgb3 = XGBClassifier(learning_rate=0.1, n_estimators=1000, max_depth=4,
                     min_child_weight=6, gamma=0, subsample=0.8,
                     colsample_bytree=0.8, reg_alpha=0.005,
                     objective='binary:logistic', nthread=4,
                     scale_pos_weight=1, seed=27)
modelfit(xgb3, train, predictors)

Again we can see a slight improvement in the score.
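To build some intuition for what reg_alpha and reg_lambda do in Step 5, here is a minimal sketch of the closed-form leaf weight used in XGBoost-style boosting, as I understand it: the sum of gradients in a leaf is soft-thresholded by the L1 term and divided by the sum of hessians plus the L2 term. The function names and the gradient and hessian sums are hypothetical values chosen for illustration, not taken from the article's data.

```python
def soft_threshold(grad_sum, alpha):
    """Shrink grad_sum towards zero by alpha (the L1 penalty)."""
    if grad_sum > alpha:
        return grad_sum - alpha
    if grad_sum < -alpha:
        return grad_sum + alpha
    return 0.0


def leaf_weight(grad_sum, hess_sum, reg_alpha=0.0, reg_lambda=1.0):
    """Leaf weight under L1 (reg_alpha) and L2 (reg_lambda) penalties."""
    shrunk = soft_threshold(grad_sum, reg_alpha)
    if shrunk == 0.0:
        return 0.0  # the L1 penalty has zeroed this leaf out
    return -shrunk / (hess_sum + reg_lambda)


# Made-up gradient and hessian sums for a single leaf (illustration only).
G, H = -4.0, 3.0
for alpha in [0, 0.005, 1, 5]:
    print("reg_alpha =", alpha, "-> leaf weight", leaf_weight(G, H, reg_alpha=alpha))
```

With these toy numbers a tiny reg_alpha such as 0.005 barely moves the weight, while a large one removes the leaf's contribution entirely. That is consistent with the widely spaced first grid being too blunt and the finer grid around 0.01 being more informative.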

Step 6: Reducing Learning Rate

Lastly, we should lower the learning rate and add more trees. Let's use the cv function of XGBoost to do the job again.

xgb4 = XGBClassifier(learning_rate=0.01, n_estimators=5000, max_depth=4,
                     min_child_weight=6, gamma=0, subsample=0.8,
                     colsample_bytree=0.8, reg_alpha=0.005,
                     objective='binary:logistic', nthread=4,
                     scale_pos_weight=1, seed=27)
modelfit(xgb4, train, predictors)

Now we can see a significant boost in performance, and the effect of parameter tuning is clearer.
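Why does a lower learning rate need more trees? A toy model helps: assume, unrealistically, that every tree fits the current residual perfectly and that its prediction is then scaled by the learning rate. Each round then removes a fixed fraction of the remaining error. The function rounds_to_fit and the 1% tolerance below are hypothetical, and real boosted trees do not behave this cleanly, so the numbers are only indicative.

```python
def rounds_to_fit(learning_rate, tolerance=0.01, max_rounds=100000):
    """Rounds needed until the residual falls below `tolerance`.

    Toy assumption: each round removes a `learning_rate` fraction
    of whatever residual is left, starting from a residual of 1.0.
    """
    residual = 1.0
    for n in range(1, max_rounds + 1):
        residual -= learning_rate * residual
        if residual < tolerance:
            return n
    return None  # did not converge within max_rounds


for lr in (0.1, 0.01):
    print("learning_rate =", lr, "->", rounds_to_fit(lr), "rounds")
```

In this toy setting, cutting the learning rate by a factor of 10 raises the number of rounds by roughly the same factor, which matches the spirit of raising the n_estimators ceiling from 1000 to 5000 when moving from 0.1 to 0.01 and letting cross-validation pick the actual count.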

As we come to the end, I would like to share 2 key thoughts:

It is difficult to get a very big leap in performance just by using parameter tuning or slightly better models. The max score for GBM was 0.8487 while XGBoost gave 0.8494. This is a decent improvement but not a very substantial one.

A significant jump can be obtained by other methods such as feature engineering, creating ensembles of models, stacking, etc.
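To put the two scores quoted above in perspective, here is a quick calculation of the absolute and relative gain. The variable names are my own; the two scores are the ones given in the text.

```python
gbm_score = 0.8487  # best GBM score quoted in the text
xgb_score = 0.8494  # best XGBoost score quoted in the text

absolute_gain = xgb_score - gbm_score
relative_gain_pct = 100 * absolute_gain / gbm_score

print(f"absolute gain: {absolute_gain:.4f}")
print(f"relative gain: {relative_gain_pct:.3f}%")
```

A gain in the fourth decimal place, well under a tenth of a percent in relative terms, is the kind of margin that matters on a competition leaderboard but rarely changes a practical decision, which supports the first point above.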

You can also download the iPython notebook with all these model codes from my GitHub account. For codes in R, you can refer to this article.

End Notes

This article was based on developing an XGBoost model end-to-end. We started by discussing why XGBoost has superior performance over GBM, followed by a detailed discussion of the various parameters involved. We also defined a generic function which you can re-use for making models.

Finally, we discussed the general approach towards tackling a problem with XGBoost and also worked through the AV Data Hackathon 3.x problem using that approach.

I hope you found this useful and now feel more confident applying XGBoost to a data science problem. You can try this out in our upcoming hackathons.

Did you like this article? Would you like to share some other hacks which you implement while making XGBoost models? Please feel free to drop a note in the comments below and I'll be glad to discuss.

Want to apply your analytical skills and test your potential? Then participate in our Hackathons and compete with top data scientists from all over the world.