
XGBoost parameters: XGBoost in practice and a detailed parameter guide

Published: 2025/3/15 · 豆豆


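Advantages of XGBoost

Regularization
Parallel processing (split finding within a tree is parallelized; the trees themselves are still built one after another)
Flexibility: supports custom objective (loss) functions, as long as they are twice differentiable
Built-in handling of missing values
Tree pruning, which makes overfitting less likely
Built-in cross-validation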
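The "twice differentiable" requirement exists because each boosting round works from the first and second derivative of the loss for every sample. As a minimal sketch, here are those two quantities for the logistic loss, written in plain Python on lists. In the XGBoost Python package a custom objective is a function of this kind passed to `xgb.train` through the `obj` argument; the helper names below are illustrative, not part of any library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_grad_hess(raw_scores, labels):
    """Per-sample first and second derivatives of the log loss
    with respect to the raw (pre-sigmoid) score."""
    grads, hessians = [], []
    for score, y in zip(raw_scores, labels):
        p = sigmoid(score)
        grads.append(p - y)            # first derivative
        hessians.append(p * (1.0 - p)) # second derivative, always positive
    return grads, hessians

# Made-up raw scores and labels, for illustration only.
grads, hessians = logistic_grad_hess([0.0, 2.0, -1.0], [1, 1, 0])
print(grads)
print(hessians)
```

A loss without a usable second derivative would leave the hessian list undefined, which is why such a loss cannot be plugged in directly.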

Parameter settings

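params = {
    'booster': 'gbtree',
    'objective': 'multi:softmax',  # multiclass classification
    'num_class': 10,               # number of classes; used together with multi:softmax
    'gamma': 0.1,                  # controls post-pruning; larger is more conservative, typically around 0.1 or 0.2
    'max_depth': 12,               # tree depth; deeper trees overfit more easily
    'lambda': 2,                   # L2 regularization on the weights; larger values make overfitting less likely
    'subsample': 0.7,              # random sampling of the training instances
    'colsample_bytree': 0.7,       # column sampling when building each tree
    'min_child_weight': 3,
    'silent': 1,                   # 1 suppresses runtime messages; 0 is the better choice
    'eta': 0.007,                  # analogous to a learning rate
    'seed': 1000,
    'nthread': 4,                  # number of CPU threads
}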
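A dictionary like this can hold only one value per key, which matters once several evaluation metrics are wanted (see the eval_metric notes further down). A minimal sketch of the usual workaround, converting the dictionary to a list of pairs and appending one pair per metric, might look like this. The shortened dictionary and the two chosen metrics are only examples.

```python
params = {
    'booster': 'gbtree',
    'objective': 'multi:softmax',
    'num_class': 10,
    'eta': 0.007,
    'max_depth': 12,
}

# Convert to (name, value) pairs so 'eval_metric' can appear more than once.
param_list = list(params.items())
param_list += [('eval_metric', 'merror'), ('eval_metric', 'mlogloss')]

# Both metrics survive, which a plain dict could not guarantee.
metrics = [value for name, value in param_list if name == 'eval_metric']
print(metrics)  # ['merror', 'mlogloss']
```

The resulting list is then passed to the training call in place of the dictionary.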

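booster: default gbtree; the alternative is gblinear
silent: 0 prints runtime messages, 1 runs in silent mode
nthread: number of threads to run with
num_pbuffer: size of the prediction buffer, normally the number of training instances; set automatically, no need to set it by hand
num_feature: number of features; set automatically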

##############################################################################

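eta: default 0.3; step-size shrinkage applied to each update to prevent overfitting
gamma: default 0; minimum loss reduction required to split a node further
max_depth: default 6; maximum depth of a tree
min_child_weight: default 1; minimum sum of instance weights in a child node. If a split would leave a child below this value, splitting stops
max_delta_step: default 0; maximum delta step allowed when estimating each tree's weights. 0 means no constraint and is the usual setting; a positive value makes the update more conservative. In logistic regression with imbalanced classes, a value greater than 0 can help
subsample: default 1; fraction of the whole training set sampled to train the model. Prevents overfitting
colsample_bytree: default 1; fraction of features sampled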
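To make subsample and colsample_bytree concrete, the sketch below draws the row and column indices one tree would see. It only illustrates what the two fractions mean; it is not XGBoost's internal sampler, and the function name and the data sizes are assumptions.

```python
import random

def sample_rows_and_columns(n_rows, n_cols, subsample, colsample_bytree, seed):
    """Pick the row and column indices a single tree would be trained on."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    n_row_keep = max(1, round(n_rows * subsample))
    n_col_keep = max(1, round(n_cols * colsample_bytree))
    rows = sorted(rng.sample(range(n_rows), n_row_keep))
    cols = sorted(rng.sample(range(n_cols), n_col_keep))
    return rows, cols

# Hypothetical data set: 1000 instances, 20 features.
rows, cols = sample_rows_and_columns(
    1000, 20, subsample=0.7, colsample_bytree=0.7, seed=1000)
print(len(rows), len(cols))
```

Each tree seeing a different subset is what reduces the correlation between trees and, with it, the tendency to overfit.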

#################################################################################

lambda: L2 regularization penalty coefficient
alpha: L1 regularization penalty coefficient
lambda_bias: L2 regularization on the bias term
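How gamma, lambda and min_child_weight interact is easier to see in the split-gain formula from the XGBoost paper. The following is a rough sketch under simplifying assumptions: alpha is taken as 0, the inputs are the summed gradients and hessians of the two candidate children, and the function names are made up for illustration.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda):
    """Optimal leaf weight for summed gradient G and hessian H: -G / (H + lambda)."""
    return -grad_sum / (hess_sum + reg_lambda)

def split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma):
    """Loss reduction from splitting a node into two children, minus the gamma penalty."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

def split_allowed(g_left, h_left, g_right, h_right,
                  reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Reject the split if a child is too light or the gain is not positive."""
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0

# Made-up gradient and hessian sums for two candidate children.
print(split_gain(-4.0, 2.0, 3.0, 2.0, reg_lambda=1.0, gamma=0.0))
print(split_allowed(-4.0, 2.0, 3.0, 2.0, gamma=5.0))             # gamma too large
print(split_allowed(-4.0, 2.0, 3.0, 2.0, min_child_weight=3.0))  # children too light
```

A larger lambda shrinks every score term and a larger gamma raises the bar a split must clear, so both push toward smaller trees.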

#################################################################################

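objective [ default=reg:linear ]
Defines the learning task and the corresponding learning objective. The available objectives are:
- “reg:linear”: linear regression (named reg:squarederror in newer releases).
- “reg:logistic”: logistic regression.
- “binary:logistic”: logistic regression for binary classification; the output is a probability.
- “binary:logitraw”: logistic regression for binary classification; the output is the raw score w^T x before the logistic transformation.
- “count:poisson”: Poisson regression for count data; the output is the mean of the Poisson distribution. In Poisson regression, max_delta_step defaults to 0.7 (used to safeguard optimization).
- “multi:softmax”: makes XGBoost handle multiclass classification with the softmax objective; the parameter num_class (number of classes) must also be set.
- “multi:softprob”: same as softmax, but the output is a vector of ndata * nclass values, which can be reshaped into a matrix with ndata rows and nclass columns. Each row holds the probability that the sample belongs to each class.
- “rank:pairwise”: sets XGBoost to do a ranking task by minimizing the pairwise loss.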
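The relationship between multi:softprob and multi:softmax described above can be sketched in a few lines: reshape the flat probability vector into one row per sample, then take the most probable class of each row. This assumes the flat vector is ordered sample by sample, as the description implies; the probabilities are invented for illustration.

```python
def reshape_softprob(flat_probs, n_class):
    """Turn a flat ndata * nclass vector into ndata rows of nclass probabilities."""
    if len(flat_probs) % n_class != 0:
        raise ValueError("length must be a multiple of n_class")
    return [flat_probs[i:i + n_class] for i in range(0, len(flat_probs), n_class)]

def to_labels(prob_rows):
    """Pick the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]

# Two samples, three classes (made-up values).
flat = [0.1, 0.7, 0.2, 0.6, 0.3, 0.1]
rows = reshape_softprob(flat, n_class=3)
labels = to_labels(rows)
print(rows)    # [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(labels)  # [1, 0]
```

Keeping the probabilities is the better choice when a threshold or a ranking over classes is needed later; the labels alone discard that information.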

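base_score [ default=0.5 ]
- The initial prediction score of all instances, i.e. the global bias.
- With a sufficient number of iterations, changing this value has little effect.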
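Why base_score stops mattering after enough rounds can be shown with a deliberately simplified model: squared-error boosting in which every "tree" is a single leaf that predicts the mean residual, scaled by eta. This is a toy under stated assumptions (no regularization, no splits), not what XGBoost actually builds.

```python
def boost_constant(targets, base_score, eta, n_rounds):
    """Boost a constant prediction toward the targets, one shrunken step per round."""
    prediction = base_score
    for _ in range(n_rounds):
        mean_residual = sum(y - prediction for y in targets) / len(targets)
        prediction += eta * mean_residual  # eta shrinks each update
    return prediction

targets = [3.0, 5.0, 7.0]  # made-up regression targets, mean 5.0
for base in (0.5, 10.0):
    early = boost_constant(targets, base, eta=0.3, n_rounds=5)
    late = boost_constant(targets, base, eta=0.3, n_rounds=50)
    print(base, early, late)
```

In this toy the gap to the target mean shrinks by a factor of (1 - eta) per round, so the two starting points differ noticeably after 5 rounds and are practically indistinguishable after 50. A smaller eta slows that convergence, which is why lowering it usually goes together with more rounds.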

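eval_metric [ default according to objective ]
- The evaluation metric for validation data. A default metric is assigned according to the objective (rmse for regression, error for classification, mean average precision for ranking).
- Multiple evaluation metrics can be added. Python users should pass the parameters as a list of pairs rather than a map, so that a later ‘eval_metric’ does not overwrite an earlier one.
- The available choices are: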
  - “rmse”: root mean square error
  - “logloss”: negative log-likelihood
  - “error”: Binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
  - “merror”: Multiclass classification error rate. It is calculated as #(wrong cases)/#(all cases).
  - “mlogloss”: Multiclass logloss
  - “auc”: Area under the curve for ranking evaluation.
  - “ndcg”: Normalized Discounted Cumulative Gain
  - “map”: Mean average precision
  - “ndcg@n”, “map@n”: n can be assigned as an integer to cut off the top positions in the lists for evaluation.
  - “ndcg-”, “map-”, “ndcg@n-”, “map@n-”: In XGBoost, NDCG and MAP will evaluate the score of a list without any positive samples as 1. By adding “-” to the evaluation metric, XGBoost will evaluate these scores as 0, to be consistent under some conditions.

seed [ default=0 ]
- Random number seed. The default is 0.
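For reference, the simpler metrics listed under eval_metric above can be written out directly from their definitions. The sketch below is plain Python rather than XGBoost's own implementation; the clipping constant `eps` in `logloss` is an assumption added to avoid log(0), and the sample predictions are invented.

```python
import math

def rmse(preds, labels):
    """Root mean square error."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels))

def logloss(probs, labels, eps=1e-15):
    """Negative log-likelihood for binary labels, averaged over samples."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def error(probs, labels, threshold=0.5):
    """Binary error rate: predictions above the threshold count as positive."""
    wrong = sum((p > threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)

def merror(pred_classes, labels):
    """Multiclass error rate: #(wrong cases) / #(all cases)."""
    wrong = sum(p != y for p, y in zip(pred_classes, labels))
    return wrong / len(labels)

probs, labels = [0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]
print(error(probs, labels))                   # 0.5
print(merror([0, 2, 1, 1], [0, 1, 1, 1]))     # 0.25
print(rmse([1.0, 2.0], [1.0, 4.0]))
print(logloss(probs, labels))
```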

Reference: 章華燕, “The Most Detailed XGBoost Hands-On Guide Ever” (史上最詳細的XGBoost實戰), zhuanlan.zhihu.com

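Parameter tuning

Fix the boosting parameters first, giving the other parameters preset initial values

max_depth = 5
min_child_weight = 1
gamma = 0
subsample, colsample_bytree = 0.8
scale_pos_weight = 1
Use cv to determine n_estimators

Grid search to determine max_depth and min_child_weight

Tune gamma

Tune subsample and colsample_bytree

Tune the regularization parameters

Lower the learning rate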
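The staged procedure above can be sketched as a small loop that grid-searches one group of parameters at a time and carries the winners into the next stage. Everything specific here is an assumption made for illustration: `toy_score` is a hypothetical stand-in for a real cross-validated score, the candidate values are invented, and alpha is added to the starting values (at 0) only so the regularization stage has something to tune.

```python
from itertools import product

def grid_search(base_params, grid, score_fn):
    """Try every combination in `grid` on top of `base_params`; keep the best score."""
    names = list(grid)
    best_params, best_score = None, float('-inf')
    for values in product(*(grid[name] for name in names)):
        candidate = {**base_params, **dict(zip(names, values))}
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

# Hypothetical optimum, used only so the toy score has a peak to find.
TOY_OPTIMUM = {'max_depth': 7, 'min_child_weight': 3, 'gamma': 0.1,
               'subsample': 0.8, 'colsample_bytree': 0.7, 'alpha': 0.01}

def toy_score(params):
    """Stand-in for a cross-validated score (higher is better)."""
    return -sum((params[name] - best) ** 2 for name, best in TOY_OPTIMUM.items())

# Initial values from the text, plus alpha for the regularization stage.
params = {'max_depth': 5, 'min_child_weight': 1, 'gamma': 0,
          'subsample': 0.8, 'colsample_bytree': 0.8,
          'scale_pos_weight': 1, 'alpha': 0}

stages = [
    {'max_depth': [3, 5, 7, 9], 'min_child_weight': [1, 3, 5]},
    {'gamma': [0.0, 0.1, 0.2, 0.3, 0.4]},
    {'subsample': [0.6, 0.7, 0.8, 0.9], 'colsample_bytree': [0.6, 0.7, 0.8, 0.9]},
    {'alpha': [0, 0.01, 0.1, 1]},
]

for grid in stages:
    params, score = grid_search(params, grid, toy_score)
    print(sorted(grid), '->', {name: params[name] for name in grid})
```

Tuning in stages keeps the number of evaluated combinations small (12 + 5 + 16 + 4 here, against 3840 for the full grid), at the cost of assuming the groups do not interact strongly. The last step, lowering the learning rate, is then done with the tuned values fixed and the number of trees re-estimated.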

Reference: Dukey, “[Repost] Complete Guide to XGBoost Parameter Tuning (with Python Code)” (XGBoost參數調優完全指南（附Python代碼）), zhuanlan.zhihu.com
