
ML | XGBoost: An Excellent Translation on XGBoost Parameter Tuning: "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 3)


Contents

3. Parameter Tuning with Example

General Approach for Parameter Tuning

Step 1: Fix learning rate and number of estimators for tuning tree-based parameters

Step 2: Tune max_depth and min_child_weight



Original title: "Complete Guide to Parameter Tuning in XGBoost with codes in Python"
Original URL: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
All rights belong to the original author; this post is a translation only.

Related articles
ML | XGBoost: Introduction to the XGBoost algorithm model (with figures and XGBoost parallel processing), key ideas, code implementation (objective/evaluation functions), installation, usage, and worked examples
ML | XGBoost: Introduction to XGBoost, the Kaggle favourite: resources, installation, usage, and worked examples
ML | XGBoost: An Excellent Translation on XGBoost Parameter Tuning: "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 1)
ML | XGBoost: An Excellent Translation on XGBoost Parameter Tuning: "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 2)
ML | XGBoost: An Excellent Translation on XGBoost Parameter Tuning: "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 3)
ML | XGBoost: An Excellent Translation on XGBoost Parameter Tuning: "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 4)

3. Parameter Tuning with Example

We will take the data set from the Data Hackathon 3.x AV hackathon, the same one used in the GBM article. The details of the problem can be found on the competition page. You can download the data set from here. I have performed the following steps (a short pandas sketch of a few of them follows the list):

  • City variable dropped because of too many categories
  • DOB converted to Age | DOB dropped
  • EMI_Loan_Submitted_Missing created, which is 1 if EMI_Loan_Submitted was missing else 0 | original variable EMI_Loan_Submitted dropped
  • EmployerName dropped because of too many categories
  • Existing_EMI imputed with 0 (median) since only 111 values were missing
  • Interest_Rate_Missing created, which is 1 if Interest_Rate was missing else 0 | original variable Interest_Rate dropped
  • Lead_Creation_Date dropped because it made little intuitive impact on the outcome
  • Loan_Amount_Applied, Loan_Tenure_Applied imputed with median values
  • Loan_Amount_Submitted_Missing created, which is 1 if Loan_Amount_Submitted was missing else 0 | original variable Loan_Amount_Submitted dropped
  • Loan_Tenure_Submitted_Missing created, which is 1 if Loan_Tenure_Submitted was missing else 0 | original variable Loan_Tenure_Submitted dropped
  • LoggedIn, Salary_Account dropped
  • Processing_Fee_Missing created, which is 1 if Processing_Fee was missing else 0 | original variable Processing_Fee dropped
  • Source – top 2 kept as is and all others combined into a different category
  • Numerical and One-Hot-Coding performed
  • For those who have the original data from the competition, you can check out these steps from the data_preparation iPython notebook in the repository.
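The exact code for these steps lives in the data_preparation notebook mentioned in the last item. Purely as an illustrative sketch of the pattern (column names are taken from the list above, while the raw file name and the label used for the lumped "others" category are assumptions), a few of the steps could look roughly like this in pandas:

    import pandas as pd

    df = pd.read_csv('Train.csv')   #hypothetical file name for the raw competition data

    #Missing-value indicator plus drop of the original column (same pattern for EMI_Loan_Submitted,
    #Interest_Rate, Loan_Amount_Submitted, Loan_Tenure_Submitted and Processing_Fee)
    df['EMI_Loan_Submitted_Missing'] = df['EMI_Loan_Submitted'].isnull().astype(int)
    df = df.drop('EMI_Loan_Submitted', axis=1)

    #Median imputation for the applied loan amount and tenure
    for col in ['Loan_Amount_Applied', 'Loan_Tenure_Applied']:
        df[col] = df[col].fillna(df[col].median())

    #Keep the top 2 Source categories, lump all others together, then one-hot encode
    top2 = df['Source'].value_counts().index[:2]
    df['Source'] = df['Source'].where(df['Source'].isin(top2), 'Other')   #'Other' is an assumed label
    df = pd.get_dummies(df, columns=['Source'])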

Let's start by importing the required libraries and loading the data:

    #Import libraries:
    import pandas as pd
    import numpy as np
    import xgboost as xgb
    from xgboost.sklearn import XGBClassifier
    from sklearn import cross_validation, metrics   #Additional sklearn functions
    from sklearn.grid_search import GridSearchCV    #Performing grid search

    import matplotlib.pylab as plt
    %matplotlib inline
    from matplotlib.pylab import rcParams
    rcParams['figure.figsize'] = 12, 4

    train = pd.read_csv('train_modified.csv')
    target = 'Disbursed'
    IDcol = 'ID'

Note that I have imported 2 forms of XGBoost:

  • xgb – this is the direct xgboost library. I will use a specific function "cv" from this library
  • XGBClassifier – this is the sklearn wrapper for XGBoost. This allows us to use sklearn's Grid Search with parallel processing in the same way we did for GBM

Before proceeding further, let's define a function which will help us create XGBoost models and perform cross-validation. The best part is that you can take this function as it is and use it later for your own models.

    def modelfit(alg, dtrain, predictors, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):
        if useTrainCV:
            xgb_param = alg.get_xgb_params()
            xgtrain = xgb.DMatrix(dtrain[predictors].values, label=dtrain[target].values)
            cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'],
                nfold=cv_folds, metrics='auc', early_stopping_rounds=early_stopping_rounds, show_progress=False)
            alg.set_params(n_estimators=cvresult.shape[0])

        #Fit the algorithm on the data
        alg.fit(dtrain[predictors], dtrain['Disbursed'], eval_metric='auc')

        #Predict training set:
        dtrain_predictions = alg.predict(dtrain[predictors])
        dtrain_predprob = alg.predict_proba(dtrain[predictors])[:,1]

        #Print model report:
        print "\nModel Report"
        print "Accuracy : %.4g" % metrics.accuracy_score(dtrain['Disbursed'].values, dtrain_predictions)
        print "AUC Score (Train): %f" % metrics.roc_auc_score(dtrain['Disbursed'], dtrain_predprob)

        feat_imp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)
        feat_imp.plot(kind='bar', title='Feature Importances')
        plt.ylabel('Feature Importance Score')

This code is slightly different from what I used for GBM. The focus of this article is to cover the concepts and not coding. Please feel free to drop a note in the comments if you find any challenges in understanding any part of it. Note that xgboost's sklearn wrapper doesn't have a "feature_importances" metric but a get_fscore() function which does the same job.

General Approach for Parameter Tuning

We will use an approach similar to that of GBM here. The various steps to be performed are:

  • Choose a relatively high learning rate. Generally a learning rate of 0.1 works, but somewhere between 0.05 and 0.3 should work for different problems. Determine the optimum number of trees for this learning rate. XGBoost has a very useful function called "cv" which performs cross-validation at each boosting iteration and thus returns the optimum number of trees required.
  • Tune tree-specific parameters (max_depth, min_child_weight, gamma, subsample, colsample_bytree) for the decided learning rate and number of trees. Note that we can choose different parameters to define a tree and I'll take up an example here.
  • Tune regularization parameters (lambda, alpha) for xgboost, which can help reduce model complexity and enhance performance.
  • Lower the learning rate and decide the optimal parameters.

Let us look at a more detailed step-by-step approach.


Step 1: Fix learning rate and number of estimators for tuning tree-based parameters

In order to decide on boosting parameters, we need to set some initial values of other parameters. Let's take the following values:

  • max_depth = 5: This should be between 3-10. I've started with 5 but you can choose a different number as well. 4-6 can be good starting points.
  • min_child_weight = 1: A smaller value is chosen because it is a highly imbalanced class problem and leaf nodes can have smaller size groups.
  • gamma = 0: A smaller value like 0.1-0.2 can also be chosen for starting. This will anyway be tuned later.
  • subsample, colsample_bytree = 0.8: This is a commonly used start value. Typical values range between 0.5-0.9.
  • scale_pos_weight = 1: Because of high class imbalance.

Please note that all the above are just initial estimates and will be tuned later. Let's take the default learning rate of 0.1 here and check the optimum number of trees using the cv function of xgboost. The function defined above will do it for us.

    #Choose all predictors except target & IDcols
    predictors = [x for x in train.columns if x not in [target, IDcol]]
    xgb1 = XGBClassifier(
        learning_rate=0.1,
        n_estimators=1000,
        max_depth=5,
        min_child_weight=1,
        gamma=0,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='binary:logistic',
        nthread=4,
        scale_pos_weight=1,
        seed=27)
    modelfit(xgb1, train, predictors)

As you can see, here we got 140 as the optimal number of estimators for a 0.1 learning rate. Note that this value might be too high for you depending on the power of your system. In that case you can increase the learning rate and re-run the command to get the reduced number of estimators.
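As a quick sanity check (not part of the original article, just a convenience), the cross-validated round count can be read back from the classifier afterwards, since modelfit() overwrites n_estimators with the value chosen by xgb.cv:

    #Illustrative check, not from the original article: modelfit() has already called
    #set_params(n_estimators=cvresult.shape[0]), so the chosen number of trees can be read back
    print "Optimal n_estimators at learning_rate=0.1: %d" % xgb1.get_params()['n_estimators']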

Note: You will see the test AUC as "AUC Score (Test)" in the outputs here. But this would not appear if you try to run the command on your system as the data is not made public. It's provided here just for reference. The part of the code which generates this output has been removed here.


Step 2: Tune max_depth and min_child_weight

We tune these first as they will have the highest impact on the model outcome. To start with, let's set wider ranges and then we will perform another iteration for smaller ranges.

Important Note: I'll be doing some heavy-duty grid searches in this section, which can take 15-30 minutes or even more to run depending on your system. You can vary the number of values you are testing based on what your system can handle.

    param_test1 = {
        'max_depth': range(3,10,2),
        'min_child_weight': range(1,6,2)
    }
    gsearch1 = GridSearchCV(estimator = XGBClassifier(learning_rate=0.1, n_estimators=140, max_depth=5,
        min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8, objective='binary:logistic',
        nthread=4, scale_pos_weight=1, seed=27),
        param_grid = param_test1, scoring='roc_auc', n_jobs=4, iid=False, cv=5)
    gsearch1.fit(train[predictors], train[target])
    gsearch1.grid_scores_, gsearch1.best_params_, gsearch1.best_score_

Here, we have run 12 combinations with wider intervals between values. The ideal values are 5 for max_depth and 5 for min_child_weight. Let's go one step deeper and look for optimum values. We'll search for values 1 above and below the optimum values because we took an interval of two.

    param_test2 = {
        'max_depth': [4,5,6],
        'min_child_weight': [4,5,6]
    }
    gsearch2 = GridSearchCV(estimator = XGBClassifier(learning_rate=0.1, n_estimators=140, max_depth=5,
        min_child_weight=2, gamma=0, subsample=0.8, colsample_bytree=0.8, objective='binary:logistic',
        nthread=4, scale_pos_weight=1, seed=27),
        param_grid = param_test2, scoring='roc_auc', n_jobs=4, iid=False, cv=5)
    gsearch2.fit(train[predictors], train[target])
    gsearch2.grid_scores_, gsearch2.best_params_, gsearch2.best_score_

Here, we get the optimum values as 4 for max_depth and 6 for min_child_weight. Also, we can see the CV score increasing slightly. Note that as the model performance increases, it becomes exponentially difficult to achieve even marginal gains in performance. You would have noticed that here we got 6 as the optimum value for min_child_weight but we haven't tried values higher than 6. We can do that as follows:

    param_test2b = {
        'min_child_weight': [6,8,10,12]
    }
    gsearch2b = GridSearchCV(estimator = XGBClassifier(learning_rate=0.1, n_estimators=140, max_depth=4,
        min_child_weight=2, gamma=0, subsample=0.8, colsample_bytree=0.8, objective='binary:logistic',
        nthread=4, scale_pos_weight=1, seed=27),
        param_grid = param_test2b, scoring='roc_auc', n_jobs=4, iid=False, cv=5)
    gsearch2b.fit(train[predictors], train[target])
    modelfit(gsearch2b.best_estimator_, train, predictors)   #refit and report using the best estimator found
    gsearch2b.grid_scores_, gsearch2b.best_params_, gsearch2b.best_score_

We see 6 as the optimal value.
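Before moving on, it can help to lock in the values found so far (max_depth = 4 and min_child_weight = 6 from the searches above). A minimal sketch of that, not taken from the original article and keeping every other setting at its Step 1 value, might look like:

    #Hypothetical follow-up (the name xgb2 is illustrative): carry the tuned tree
    #parameters forward; all other settings stay at their Step 1 values
    xgb2 = XGBClassifier(
        learning_rate=0.1,
        n_estimators=140,
        max_depth=4,            #tuned in Step 2
        min_child_weight=6,     #tuned in Step 2
        gamma=0,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='binary:logistic',
        nthread=4,
        scale_pos_weight=1,
        seed=27)
    modelfit(xgb2, train, predictors)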

