Model Fusion (Stacking & Blending)
1. Blending
Blending first obtains a weight for each base model's predictions, then combines the predictions linearly.
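As a concrete illustration, here is a minimal blending sketch: hold out part of the training data, fit base models on the rest, and grid-search the combination weight on the hold-out predictions. The two models, the 70/30 split, and the 21-point weight grid are assumptions for demonstration, not choices from the original post.

# Minimal blending sketch: fit base models on a training split, then
# grid-search the linear-combination weight on a hold-out split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

models = [RandomForestClassifier(n_estimators=100, random_state=1),
          LogisticRegression(max_iter=1000)]
val_probs = []
for m in models:
    m.fit(X_tr, y_tr)
    val_probs.append(m.predict_proba(X_val)[:, 1])

# Search a single weight w for the two-model combination w*p0 + (1-w)*p1.
best_w, best_acc = 0.0, 0.0
for w in np.linspace(0, 1, 21):
    blended = w * val_probs[0] + (1 - w) * val_probs[1]
    acc = accuracy_score(y_val, (blended > 0.5).astype(int))
    if acc > best_acc:
        best_w, best_acc = w, acc
print("best weight %.2f, hold-out accuracy %.3f" % (best_w, best_acc))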
2. Stacking
The core of stacking: predictions made on the training set become the features used to build a higher-level learner.
The stacking training procedure:
1) Split the training set. Randomly split the training data into m roughly equal folds.
2) Train models on the split training set and predict on the test set at the same time. Train on m-1 folds and predict the remaining fold; in the same pass, use the model trained on those m-1 folds to predict the real test set. Repeat this m times, stacking the m out-of-fold results into one training-set column and averaging the m test-set results into one test-set column.
3) Repeat step 2 with each of k classifiers. This yields k columns of training-set predictions and k columns of test-set predictions.
4) Train on the data from step 3. Fit a second-level learner on the k columns of training-set predictions against the true training labels, and use the k columns of test-set predictions as its test set. The implementation below follows these four steps.
# -*- coding: utf-8 -*-
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
import xgboost as xgb


def load_data():
    pass  # placeholder in the original post


def stacking(train_x, train_y, test):
    """Stacking.
    input:  train_x, train_y, test
    output: predictions for test
    clfs:   five first-level classifiers
    dataset_blend_train: first-level predictions, the second-level train_x
    dataset_blend_test:  the second-level test set
    """
    # Five first-level classifiers.
    clfs = [SVC(C=3, kernel="rbf"),
            RandomForestClassifier(n_estimators=100, max_features="log2", max_depth=10,
                                   min_samples_leaf=1, bootstrap=True, n_jobs=-1, random_state=1),
            KNeighborsClassifier(n_neighbors=15, n_jobs=-1),
            xgb.XGBClassifier(n_estimators=100, objective="binary:logistic", gamma=1,
                              max_depth=10, subsample=0.8, n_jobs=-1, random_state=1),
            ExtraTreesClassifier(n_estimators=100, criterion="gini", max_features="log2",
                                 max_depth=10, min_samples_split=2, min_samples_leaf=1,
                                 bootstrap=True, n_jobs=-1, random_state=1)]

    # train_x and test for the second-level classifier (np.int was removed
    # in recent NumPy; plain int behaves the same here).
    dataset_blend_train = np.zeros((train_x.shape[0], len(clfs)), dtype=int)
    dataset_blend_test = np.zeros((test.shape[0], len(clfs)), dtype=int)

    # Out-of-fold predictions over 8 folds for each of the five classifiers.
    n_folds = 8
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=1)
    for i, clf in enumerate(clfs):
        # Per-fold predictions of this classifier on the test set.
        dataset_blend_test_j = np.zeros((test.shape[0], n_folds))
        for j, (train_index, test_index) in enumerate(skf.split(train_x, train_y)):
            tr_x = train_x[train_index]
            tr_y = train_y[train_index]
            clf.fit(tr_x, tr_y)
            dataset_blend_train[test_index, i] = clf.predict(train_x[test_index])
            dataset_blend_test_j[:, j] = clf.predict(test)
        # Majority vote over the fold predictions: for 0/1 labels the integer
        # division yields 1 exactly when at least n_folds//2 + 1 folds vote 1.
        dataset_blend_test[:, i] = dataset_blend_test_j.sum(axis=1) // (n_folds // 2 + 1)

    # Second-level classifier (the l1 penalty requires the liblinear solver).
    clf = LogisticRegression(penalty="l1", tol=1e-6, C=1.0, solver="liblinear", random_state=1)
    clf.fit(dataset_blend_train, train_y)
    prediction = clf.predict(dataset_blend_test)
    return prediction


def main():
    train_x, train_y, test = load_data()
    prediction = stacking(train_x, train_y, test)
    return prediction


if __name__ == "__main__":
    prediction = main()
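The load_data stub is left empty in the original. As a hedged way to run the script end to end, one could fill it with synthetic data; make_classification and the 75/25 split below are assumptions for demonstration only.

# Hypothetical load_data: synthetic binary-classification data so the
# stacking script above can be executed end to end.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def load_data():
    X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
    train_x, test, train_y, _ = train_test_split(X, y, test_size=0.25, random_state=1)
    return train_x, train_y, test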