

[Machine Learning Basics] The LightGBM Operations You Should Know!

Published: 2025/3/8

LightGBM is a fast, parallelizable tree-model framework that builds on the ideas of XGBoost. It integrates several ensemble-learning techniques and improves on XGBoost's node-splitting implementation, resulting in lower memory usage and faster training.

LightGBM documentation: https://lightgbm.readthedocs.io/en/latest/

Parameter reference: https://lightgbm.readthedocs.io/en/latest/Parameters.html

This article covers the topics below; see the end of the article for how to get the original code.

  • 1 Installation

  • 2 Usage

    • 2.1 Defining datasets

    • 2.2 Model training

    • 2.3 Saving and loading models

    • 2.4 Inspecting feature importance

    • 2.5 Continuing training

    • 2.6 Adjusting hyperparameters during training

    • 2.7 Custom loss functions

    • 2.8 Tuning methods

      • Manual tuning

      • Grid search

      • Bayesian optimization

1 Installation

Installing LightGBM is straightforward, and on Linux it is easy to enable GPU training. Prefer installing via pip; fall back to building from source if that fails.

  • Installation: from source

git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build
cd build
cmake ..

# Enable the MPI communication mechanism for faster training
# cmake -DUSE_MPI=ON ..

# GPU version, faster training
# cmake -DUSE_GPU=1 ..

make -j4
  • Installation: via pip

# default version
pip install lightgbm

# MPI version
pip install lightgbm --install-option=--mpi

# GPU version
pip install lightgbm --install-option=--gpu

2 Usage

In Python, LightGBM offers two calling conventions: the native API and the scikit-learn API. Both support training and validation; the native API is more flexible, so choose whichever fits your habits.

2.1 Defining datasets

import json
import numpy as np
import pandas as pd
import lightgbm as lgb

df_train = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.train', header=None, sep='\t')
df_test = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.test', header=None, sep='\t')
W_train = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.train.weight', header=None)[0]
W_test = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.test.weight', header=None)[0]

y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)
num_train, num_feature = X_train.shape

# create dataset for lightgbm
# if you want to re-use data, remember to set free_raw_data=False
lgb_train = lgb.Dataset(X_train, y_train,
                        weight=W_train, free_raw_data=False)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train,
                       weight=W_test, free_raw_data=False)

2.2 Model training

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

# generate feature names
feature_name = ['feature_' + str(col) for col in range(num_feature)]

gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                valid_sets=lgb_train,  # eval training data
                feature_name=feature_name,
                categorical_feature=[21])

2.3 Saving and loading models

# save model to file
gbm.save_model('model.txt')

print('Dumping model to JSON...')
model_json = gbm.dump_model()
with open('model.json', 'w+') as f:
    json.dump(model_json, f, indent=4)

2.4 Inspecting feature importance

# feature names
print('Feature names:', gbm.feature_name())

# feature importances
print('Feature importances:', list(gbm.feature_importance()))

2.5 Continuing training

# continue training
# init_model accepts:
# 1. model file name
# 2. Booster()
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model='model.txt',
                valid_sets=lgb_eval)
print('Finished 10 - 20 rounds with model file...')

2.6 Adjusting hyperparameters during training

# decay learning rates
# learning_rates accepts:
# 1. list/tuple with length = num_boost_round
# 2. function(curr_iter)
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,
                learning_rates=lambda iter: 0.05 * (0.99 ** iter),
                valid_sets=lgb_eval)
print('Finished 20 - 30 rounds with decay learning rates...')

# change other parameters during training
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,
                valid_sets=lgb_eval,
                callbacks=[lgb.reset_parameter(bagging_fraction=[0.7] * 5 + [0.6] * 5)])
print('Finished 30 - 40 rounds with changing bagging_fraction...')

2.7 Custom loss functions

# self-defined objective function
# f(preds: array, train_data: Dataset) -> grad: array, hess: array
# log likelihood loss
def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess

# self-defined eval metric
# f(preds: array, train_data: Dataset) -> name: string, eval_result: float, is_higher_better: bool
# binary error
# NOTE: with a customized loss function, the default prediction value is the raw margin.
# This may make built-in evaluation metrics calculate wrong results.
# For example, with log likelihood loss the prediction is the score before the logistic transformation.
# Keep this in mind when you use the customization.
def binary_error(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    return 'error', np.mean(labels != (preds > 0.5)), False

gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,
                fobj=loglikelihood,
                feval=binary_error,
                valid_sets=lgb_eval)
print('Finished 40 - 50 rounds with self-defined objective function and eval metric...')

2.8 Tuning methods

Manual tuning

For Faster Speed

  • Use bagging by setting bagging_fraction and bagging_freq

  • Use feature sub-sampling by setting feature_fraction

  • Use small max_bin

  • Use save_binary to speed up data loading in future learning

  • Use parallel learning; refer to the Parallel Learning Guide in the LightGBM documentation

For Better Accuracy

  • Use large max_bin (may be slower)

  • Use small learning_rate with large num_iterations

  • Use large num_leaves (may cause over-fitting)

  • Use bigger training data

  • Try dart

Deal with Over-fitting

  • Use small max_bin

  • Use small num_leaves

  • Use min_data_in_leaf and min_sum_hessian_in_leaf

  • Use bagging by setting bagging_fraction and bagging_freq

  • Use feature sub-sampling by setting feature_fraction

  • Use bigger training data

  • Try lambda_l1, lambda_l2 and min_gain_to_split for regularization

  • Try max_depth to avoid growing deep tree

  • Try extra_trees

  • Try increasing path_smooth

Grid search

from sklearn.model_selection import GridSearchCV

lg = lgb.LGBMClassifier(silent=False)
param_dist = {"max_depth": [4, 5, 7],
              "learning_rate": [0.01, 0.05, 0.1],
              "num_leaves": [300, 900, 1200],
              "n_estimators": [50, 100, 150]}

grid_search = GridSearchCV(lg, n_jobs=-1, param_grid=param_dist,
                           cv=5, scoring="roc_auc", verbose=5)
grid_search.fit(X_train, y_train)
grid_search.best_estimator_, grid_search.best_score_

Bayesian optimization

import time
import warnings
warnings.filterwarnings("ignore")
from bayes_opt import BayesianOptimization

def lgb_eval(max_depth, learning_rate, num_leaves, n_estimators):
    params = {"metric": 'auc'}
    params['max_depth'] = int(max(max_depth, 1))
    params['learning_rate'] = np.clip(learning_rate, 0, 1)
    params['num_leaves'] = int(max(num_leaves, 1))
    params['n_estimators'] = int(max(n_estimators, 1))
    cv_result = lgb.cv(params, lgb_train, nfold=5, seed=0,
                       verbose_eval=200, stratified=False)
    return 1.0 * np.array(cv_result['auc-mean']).max()

lgbBO = BayesianOptimization(lgb_eval, {'max_depth': (4, 8),
                                        'learning_rate': (0.05, 0.2),
                                        'num_leaves': (20, 1500),
                                        'n_estimators': (5, 200)},
                             random_state=0)

lgbBO.maximize(init_points=5, n_iter=50, acq='ei')
print(lgbBO.max)

To get the code for this article, reply [lgb] to the author's WeChat public account "datawhale" to receive the notebook.

