[Task 5 (2 days)] Model Tuning
- Use grid search to tune the 5 models (with 5-fold cross-validation during tuning), evaluate them, and remember to show the code's output. Time: 2 days
1. Tuning with GridSearchCV
1.1 Choosing the parameters
First, pick the parameters to tune for each of the 5 models. The grids here are based on a chart I once saw on Zhihu (thanks to its author!):
```python
parameters_log = {'C': [0.001, 0.01, 0.1, 1, 10]}
parameters_svc = {'C': [0.001, 0.01, 0.1, 1, 10]}
# These two models don't score well anyway, so only a few parameters are searched
parameters_tree = {'max_depth': [5, 8, 15, 25, 30, None],
                   'min_samples_leaf': [1, 2, 5, 10],
                   'min_samples_split': [2, 5, 10, 15]}
parameters_forest = {'max_depth': [5, 8, 15, 25, 30, None],
                     'min_samples_leaf': [1, 2, 5, 10],
                     'min_samples_split': [2, 5, 10, 15],
                     'n_estimators': [7, 8, 9, 10]}
# These two models overfit badly, so more parameters are searched
parameters_xgb = {'gamma': [0, 0.05, 0.1, 0.3, 0.5],
                  'learning_rate': [0.01, 0.015, 0.025, 0.05, 0.1],
                  'max_depth': [3, 5, 7, 9],
                  'reg_alpha': [0, 0.1, 0.5, 1.0]}
# This model already performs quite well, so it gets a bit more tuning
parameters_total = {'log_clf': parameters_log, 'svc_clf': parameters_svc,
                    'tree_clf': parameters_tree, 'forest_clf': parameters_forest,
                    'xgb_clf': parameters_xgb}
```
1.2 Splitting off a validation set
I originally wanted to split with one of sklearn's modules, but it didn't seem to accept my arrays, so I just took the first 1,000 samples manually:
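For reference, sklearn's `train_test_split` does accept plain NumPy arrays, and passing `test_size` as an integer gives an exact sample count for the validation split. A minimal sketch (the data shapes here are toy stand-ins, not the post's actual dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical stand-ins for the post's X_train_scaled / y_train
X_train_scaled = np.random.rand(5000, 10)
y_train = np.random.randint(0, 2, size=5000)

# an integer test_size yields exactly that many validation samples
X_rest, X_val, y_rest, y_val = train_test_split(
    X_train_scaled, y_train, test_size=1000, random_state=42)
print(X_val.shape)  # (1000, 10)
```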
```python
X_val = X_train_scaled[:1000]
y_val = y_train[:1000]
```
1.3 Collecting the models in a dictionary
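The `models` dictionary itself isn't shown in the post; it presumably looks something like the sketch below. The key names match those used later, but the constructor settings are assumptions:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
# from xgboost import XGBClassifier  # xgboost is a separate package

# keys match the parameter grids in parameters_total
models = {'log_clf': LogisticRegression(),
          'svc_clf': SVC(),
          'tree_clf': DecisionTreeClassifier(),
          'forest_clf': RandomForestClassifier()}
# models['xgb_clf'] = XGBClassifier()  # added the same way when xgboost is installed
```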
```python
from sklearn.model_selection import GridSearchCV

def gridsearch(X_val, y_val, models, parameters_total):
    models_grid = {}
    for model in models:
        grid_search = GridSearchCV(models[model],
                                   param_grid=parameters_total[model],
                                   n_jobs=-1, cv=5, verbose=10)
        grid_search.fit(X_val, y_val)
        models_grid[model] = grid_search.best_estimator_
    return models_grid
```
1.4 Inspecting the tuned parameters
```python
models_grid
{'log_clf': LogisticRegression(C=0.1, class_weight=None, dual=False,
     fit_intercept=True, intercept_scaling=1, max_iter=100,
     multi_class='warn', n_jobs=None, penalty='l2', random_state=None,
     solver='warn', tol=0.0001, verbose=0, warm_start=False),
 'svc_clf': SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
     decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
     kernel='rbf', max_iter=-1, probability=False, random_state=None,
     shrinking=True, tol=0.001, verbose=False),
 'tree_clf': DecisionTreeClassifier(class_weight=None, criterion='gini',
     max_depth=5, max_features=None, max_leaf_nodes=None,
     min_impurity_decrease=0.0, min_impurity_split=None,
     min_samples_leaf=5, min_samples_split=2,
     min_weight_fraction_leaf=0.0, presort=False, random_state=None,
     splitter='best'),
 'forest_clf': RandomForestClassifier(bootstrap=True, class_weight=None,
     criterion='gini', max_depth=15, max_features='auto',
     max_leaf_nodes=None, min_impurity_decrease=0.0,
     min_impurity_split=None, min_samples_leaf=10, min_samples_split=2,
     min_weight_fraction_leaf=0.0, n_estimators=7, n_jobs=None,
     oob_score=False, random_state=None, verbose=0, warm_start=False),
 'xgb_clf': XGBClassifier(base_score=0.5, booster='gbtree',
     colsample_bylevel=1, colsample_bytree=1, gamma=0.5,
     learning_rate=0.05, max_delta_step=0, max_depth=5,
     min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
     nthread=None, objective='binary:logistic', random_state=0,
     reg_alpha=1.0, reg_lambda=1, scale_pos_weight=1, seed=None,
     silent=True, subsample=1)}
```
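Since the `gridsearch` function above keeps only each `best_estimator_`, the full reprs are the only record of the winning settings. Keeping the fitted `GridSearchCV` object around instead gives a more compact view via `best_params_` and `best_score_`. A self-contained toy sketch (the data here is synthetic, not the post's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# synthetic binary-classification data standing in for the real features
X, y = make_classification(n_samples=300, random_state=0)

gs = GridSearchCV(LogisticRegression(max_iter=1000),
                  param_grid={'C': [0.01, 0.1, 1, 10]}, cv=5)
gs.fit(X, y)

# best_params_ shows only the tuned values; best_score_ is the mean CV score
print(gs.best_params_, round(gs.best_score_, 3))
```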
2. Comparison before and after tuning
```python
models_grid = gridsearch(X_val, y_val, models, parameters_total)
results_test_grid, results_train_grid = metrics(models_grid, X_train_scaled,
                                                X_test_scaled, y_train, y_test)
```
Left: before tuning; right: after tuning.
On the training set:
On the test set:
Tuning clearly reduces the tree models' overfitting, but the other evaluation metrics improve only slightly!
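The `metrics` helper used above comes from an earlier task and isn't shown in this post. A minimal sketch of what such a train/test comparison might look like (the function name, signature, and return format are assumptions; toy data stands in for the real splits):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

def metrics(models, X_train, X_test, y_train, y_test):
    """Score each fitted model on the train and test splits."""
    rows_test, rows_train = {}, {}
    for name, clf in models.items():
        clf.fit(X_train, y_train)
        for rows, X, y in ((rows_train, X_train, y_train),
                           (rows_test, X_test, y_test)):
            pred = clf.predict(X)
            rows[name] = {'accuracy': accuracy_score(y, pred),
                          'f1': f1_score(y, pred)}
    # one column per model, one row per metric
    return pd.DataFrame(rows_test), pd.DataFrame(rows_train)

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
res_test, res_train = metrics({'log_clf': LogisticRegression(max_iter=1000)},
                              X_tr, X_te, y_tr, y_te)
```

Comparing `res_train` against `res_test` side by side is what reveals the overfitting gap the post describes.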
Compare the ROC curves (left: before tuning; right: after tuning):
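The ROC figures didn't survive the repost. A minimal sketch of how such a before/after comparison can be computed with sklearn's `roc_curve` and `auc` (synthetic data and a depth-limited tree stand in for the post's actual models):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

aucs = {}
# 'default' stands in for the pre-tuning model, 'tuned' for the depth-limited one
for label, clf in [('default', DecisionTreeClassifier(random_state=0)),
                   ('tuned', DecisionTreeClassifier(max_depth=5, random_state=0))]:
    scores = clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, scores)
    aucs[label] = auc(fpr, tpr)
print(aucs)
```

Plotting each `(fpr, tpr)` pair side by side would reproduce the kind of comparison the original figures showed.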
Reposted from: https://www.cnblogs.com/Hero1best/p/10891398.html