Characteristics of the AdaBoost Algorithm

Published: 2025/3/21

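Boosting Algorithms

Boosting is a common family of statistical learning methods whose purpose is to turn a weak learning algorithm into a strong one. Its theoretical basis is that strong learnability and weak learnability are equivalent: once a "weak learning algorithm" has been found for a problem, it can be boosted into a strong learner, and this can be proven mathematically. In classification, a boosting algorithm repeatedly modifies the weight distribution over the training data, builds a sequence of base (weak) classifiers, and combines them linearly into a strong learner. The representative algorithm is AdaBoost, where "Ada" is short for "Adaptive".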
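
The linear combination mentioned above can be written compactly. The symbols are introduced here for illustration and do not appear in the original post: G_m is the m-th base classifier with outputs in {-1, +1}, alpha_m is its vote weight, and M is the number of boosting rounds.

```latex
f(x) = \sum_{m=1}^{M} \alpha_m \, G_m(x),
\qquad
G(x) = \operatorname{sign}\bigl(f(x)\bigr)
```

The final classifier G takes the sign of the weighted sum, so a base classifier with a larger alpha_m has more say in the outcome.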

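AdaBoost

How it works

The core idea of AdaBoost is to repeatedly modify the sample weights so that a sequence of weak learners together forms a strong learner. Its core steps are:
- Weight adjustment: increase the weights of misclassified samples and decrease the weights of correctly classified ones.
- Combining base classifiers: use a weighted majority vote, giving larger weights to weak classifiers with a lower classification error and smaller weights to those with a higher error.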

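Characteristics

AdaBoost focuses mainly on bias: it reduces the bias of the base learner and thereby improves on it.
The default base learner of AdaBoost in sklearn is a decision tree. Later we also use other base learners to show how boosting affects their bias.
Training-error analysis of AdaBoost shows that boosting keeps driving down the classification error on the training set (more precisely, an upper bound on the training error shrinks at every iteration), which is what makes it effective as a boosting method. In practice, however, keep the bias-variance dilemma in mind so that generalization does not suffer.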
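
To make the training-error behaviour concrete, here is a small from-scratch sketch in pure Python. The ten-point dataset and the threshold "stumps" are assumptions chosen for illustration, and the helper names are hypothetical; this is not how sklearn implements AdaBoost.

```python
import math

# Toy 1-D dataset (illustrative): no single threshold separates the classes.
X = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, sign):
    """Threshold classifier: `sign` left of the threshold, `-sign` otherwise."""
    return sign if x < threshold else -sign


def best_stump(weights):
    """Return (error, threshold, sign) of the stump with the lowest weighted error."""
    best = None
    for i in range(-1, 10):
        threshold = i + 0.5
        for sign in (1, -1):
            err = sum(w for w, xi, yi in zip(weights, X, y)
                      if stump_predict(xi, threshold, sign) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def ensemble_predict(model, x):
    """Weighted majority vote of all stumps fitted so far."""
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in model)
    return 1 if score > 0 else -1


weights = [1.0 / len(X)] * len(X)
model, train_errors = [], []
for _ in range(3):
    err, threshold, sign = best_stump(weights)
    alpha = 0.5 * math.log((1.0 - err) / err)
    model.append((alpha, threshold, sign))
    # Re-weight: misclassified samples gain weight, the rest lose weight.
    raw = [w * math.exp(-alpha * yi * stump_predict(xi, threshold, sign))
           for w, xi, yi in zip(weights, X, y)]
    total = sum(raw)
    weights = [w / total for w in raw]
    wrong = sum(ensemble_predict(model, xi) != yi for xi, yi in zip(X, y))
    train_errors.append(wrong / len(X))

print(train_errors)
```

On this data the ensemble's training error does not fall in every single round, but it reaches zero after three stumps even though each stump alone misclassifies several points.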

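sklearn implementation

sklearn provides two variants of the algorithm, SAMME and SAMME.R:

- SAMME adjusts the weights using the misclassified labels, whereas SAMME.R uses the predicted class probabilities.
- SAMME.R converges faster and needs fewer boosting iterations to reach a low test error.
- SAMME.R requires the base learner to return class probability estimates; SAMME does not.
- SAMME.R fits the training data worse at larger learning rates, whereas SAMME does not.

Both variants include early stopping: once an internal stopping condition is met, boosting ends automatically, so the number of estimators AdaBoost actually fits can be smaller than the n_estimators we set. This helps prevent overfitting.

The experiments below demonstrate these properties.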
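
A quick way to observe early stopping is to compare the requested n_estimators with the length of the fitted `estimators_` list. The sketch below assumes toy data that a single decision stump separates perfectly, in which case boosting should stop after the first stump. The base learner is passed positionally because its keyword name differs between sklearn releases.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data (illustrative): one threshold at 9.5 separates the two classes.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = (X.ravel() >= 10).astype(int)

clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)
clf.fit(X, y)

requested = clf.n_estimators
actual = len(clf.estimators_)  # estimators that were actually fitted
print(f"requested={requested}, actually fitted={actual}")
```

The same check, `len(model.estimators_)`, is what the experiments below use to report the actual number of boosting stages.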

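Experiments with regression learners to verify the theoretical properties

AdaBoost can further improve a base learner

Overview

We adapt the earlier regression decision tree example: we again use numpy to generate data from a sine function and then add noise.
The noise lets us see more directly both the fitting ability of an algorithm (bias) and its overfitting (variance).

In the example below we define four estimators and draw three plots.
The first two estimators are decision trees with max_depth set to 2 and 5. The other two are boosted ensembles that use a decision tree with max_depth=5 as the base estimator and differ only in n_estimators: one uses 10 (boosted 9 times) and the other 500 (boosted 499 times).

The code and plots are as follows:

We start with regression learners. AdaBoostRegressor in sklearn has no SAMME / SAMME.R distinction.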

    # %matplotlib inline  (uncomment when running in Jupyter)
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import AdaBoostRegressor

    # Create a random dataset
    rng = np.random.RandomState(1)
    X = np.sort(10 * rng.rand(160, 1), axis=0)
    y = np.sin(X).ravel()
    y[::5] += 2 * (0.5 - rng.rand(len(X) // 5))  # add noise to every fifth point

    # Fit regression models
    estimators_num = 500
    regr_1 = DecisionTreeRegressor(max_depth=2)
    regr_2 = DecisionTreeRegressor(max_depth=5)
    regr_3 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=5),
                               n_estimators=10, random_state=rng)
    regr_4 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=5),
                               n_estimators=estimators_num, random_state=rng)
    regr_1.fit(X, y)
    regr_2.fit(X, y)
    regr_3.fit(X, y)
    regr_4.fit(X, y)

    # Predict
    X_test = np.arange(0.0, 10.0, 0.01)[:, np.newaxis]
    y_test = np.sin(X_test).ravel()
    y_test[::5] += 2 * (0.5 - rng.rand(len(X_test) // 5))  # add noise to every fifth point
    y_1 = regr_1.predict(X_test)
    y_2 = regr_2.predict(X_test)
    y_3 = regr_3.predict(X_test)
    y_4 = regr_4.predict(X_test)

    fig = plt.figure()
    fig.set_size_inches(18.5, 10.5)

    # Plot 1: the two single trees and the 10-estimator AdaBoost ensemble
    ax = fig.add_subplot(3, 1, 1)
    ax.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
    ax.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
    ax.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
    ax.plot(X_test, y_3, color="r", label="n_estimators=10", linewidth=2)
    ax.set_xlabel("data")
    ax.set_ylabel("target")
    ax.set_ylim(-1.5, 1.5)
    ax.set_xlim(-0.5, 10.5)
    ax.legend()

    # Plot 2: AdaBoost with 10 estimators versus 500 estimators
    ax = fig.add_subplot(3, 1, 2)
    ax.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
    ax.plot(X_test, y_3, color="r", label="n_estimators=10", linewidth=2)
    ax.plot(X_test, y_4, color="blue", label="n_estimators=500", linewidth=2)
    ax.set_xlabel("data")
    ax.set_ylabel("target")
    ax.set_ylim(-1.5, 1.5)
    ax.set_xlim(-0.5, 10.5)
    ax.legend()

    # Plot 3: training and testing scores of regr_4 over the first 100 boosting stages
    train_scores = list(regr_4.staged_score(X, y))[:100]
    test_scores = list(regr_4.staged_score(X_test, y_test))[:100]
    stages = range(1, len(train_scores) + 1)
    ax = fig.add_subplot(3, 1, 3)
    ax.plot(stages, train_scores, label="Training score")
    ax.plot(stages, test_scores, label="Testing score")
    ax.set_xlabel("estimator num")
    ax.set_ylabel("score")
    ax.legend(loc="lower right")
    ax.set_ylim(0, 1)
    ax.set_xlim(1, 100)
    plt.show()

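Walkthrough of the code and plots

The first plot contains three curves. The depth-5 decision tree clearly fits better than the depth-2 tree, and boosting greatly strengthens the fitting ability of the original decision tree.

The second plot shows AdaBoost with n_estimators=10 and with n_estimators=500 (same base estimator). The n_estimators=500 ensemble clearly fits the data more closely than the n_estimators=10 ensemble, but it also visibly fits more of the noise.

The third plot shows the training and testing scores of the n_estimators=500 ensemble over the course of boosting (the code plots the first 100 stages). As n_estimators grows, AdaBoost does fit more strongly, but because of overfitting the testing score first rises (slightly) and then falls.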
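
Because the score on held-out data rises and then falls, `staged_score` can also be used to pick the number of boosting stages. The sketch below is an illustration under assumptions: it generates its own noisy sine data with a separate validation set, and the names `demo_rng`, `X_val` and `best_n` are not from the original code.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Illustrative data: a noisy sine curve, split into training and validation sets.
demo_rng = np.random.RandomState(0)
X_fit = np.sort(10 * demo_rng.rand(200, 1), axis=0)
y_fit = np.sin(X_fit).ravel() + 0.3 * demo_rng.randn(200)
X_val = np.sort(10 * demo_rng.rand(200, 1), axis=0)
y_val = np.sin(X_val).ravel() + 0.3 * demo_rng.randn(200)

model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=5),
                          n_estimators=100, random_state=0)
model.fit(X_fit, y_fit)

# staged_score yields the R^2 score of the ensemble after each boosting stage.
val_scores = list(model.staged_score(X_val, y_val))
best_n = int(np.argmax(val_scores)) + 1
print(f"fitted stages: {len(val_scores)}, best stage on validation data: {best_n}")
```

Refitting with n_estimators=best_n is one simple option; cross-validation gives a less noisy choice.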

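Limitations

Although AdaBoost can improve the fitting ability of a base learner, how much it helps also depends on the base learner itself; AdaBoost cannot improve a base learner without limit. Taking the code above as an example, if the max_depth of the base learner is changed to 2, AdaBoost stops early very quickly. So in practice the choice of base learner matters just as much as AdaBoost's own hyperparameters.

Code and results: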

    regr_5 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=2),
                               n_estimators=estimators_num, random_state=rng)
    regr_6 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=3),
                               n_estimators=estimators_num, random_state=rng)
    regr_5.fit(X, y)
    regr_6.fit(X, y)
    # Predict with the two newly fitted ensembles
    y_5 = regr_5.predict(X_test)
    y_6 = regr_6.predict(X_test)

    print('Actual number of boosting stages for regr_5: %d' % len(regr_5.estimators_))
    print('Actual number of boosting stages for regr_6: %d' % len(regr_6.estimators_))

    # Plot the results
    plt.figure(figsize=(18.5, 10.5))
    plt.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
    plt.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
    plt.plot(X_test, y_5, color="yellowgreen", label="Ada_max_depth=2", linewidth=2)
    plt.plot(X_test, y_6, color="r", label="Ada_max_depth=3", linewidth=2)
    # plt.plot(X_test, y_4, color="black", label="n_estimators=500", linewidth=2)
    plt.xlabel("data")
    plt.ylabel("target")
    plt.title("Decision Tree Regression")
    plt.legend()
    plt.show()

Output recorded in the original post (there, both lines were printed from len(regr_5.estimators_)):

    Actual number of boosting stages for regr_5: 10
    Actual number of boosting stages for regr_6: 10

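From the plot, regr_5 (yellow-green) appears to fit better than all of the previous learners: it fits the data well while overfitting very little. In practice, a learner that combines the best fit with the best generalization is exactly what we want:

    regr_5 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=2),
                               n_estimators=estimators_num, random_state=rng)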

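Learning rate and loss function

Two other important hyperparameters of AdaBoostRegressor are the learning rate and the loss function. The learning rate is usually small and should be chosen by cross-validation (CV). The choice of loss function makes less of a difference, but since it still affects the final result, do not ignore it in real experiments.
The code below plots the training and testing score curves for each loss function: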

    from sklearn import datasets, model_selection

    # The diabetes dataset bundled with scikit-learn
    diabetes = datasets.load_diabetes()
    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        diabetes.data, diabetes.target, test_size=0.25, random_state=0)

    losses = ['linear', 'square', 'exponential']
    fig = plt.figure()
    fig.set_size_inches(18.5, 10.5)
    ax = fig.add_subplot(1, 1, 1)
    for loss in losses:
        regr = AdaBoostRegressor(loss=loss, n_estimators=30)
        regr.fit(X_train, y_train)
        # Plot the score after each boosting stage
        stages = range(1, len(regr.estimators_) + 1)
        ax.plot(stages, list(regr.staged_score(X_train, y_train)),
                label="Training score: loss=%s" % loss)
        ax.plot(stages, list(regr.staged_score(X_test, y_test)),
                label="Testing score: loss=%s" % loss)
    ax.set_xlabel("estimator num")
    ax.set_ylabel("score")
    ax.legend(loc="lower right")
    ax.set_ylim(-1, 1)
    plt.suptitle("AdaBoostRegressor")
    plt.show()
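
Choosing the learning rate and loss by cross-validation, as recommended above, could look like the following sketch. The grid values are arbitrary assumptions, and the names `X_diab`, `y_diab` and `search` are introduced only for this example.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

X_diab, y_diab = load_diabetes(return_X_y=True)

# Candidate values are illustrative; widen or refine the grid for real work.
param_grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "loss": ["linear", "square", "exponential"],
}
search = GridSearchCV(
    AdaBoostRegressor(n_estimators=30, random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation, scored with the regressor's default R^2
)
search.fit(X_diab, y_diab)
print(search.best_params_, round(search.best_score_, 3))
```

`search.best_estimator_` is then refit on the full data with the selected combination.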

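Experiments with classification decision trees to verify the theoretical properties

We compare the SAMME and SAMME.R algorithms at four learning rates.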

    from sklearn.ensemble import AdaBoostClassifier

    # Note: newer scikit-learn releases deprecate and later drop 'SAMME.R';
    # this comparison needs a release that still accepts it.
    digits = datasets.load_digits()
    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        digits.data, digits.target, test_size=0.25, random_state=0,
        stratify=digits.target)

    algorithms = ['SAMME.R', 'SAMME']
    fig = plt.figure()
    fig.set_size_inches(18.5, 10.5)
    learning_rates = [0.05, 0.1, 0.5, 0.9]
    for i, learning_rate in enumerate(learning_rates):
        ax = fig.add_subplot(2, 2, i + 1)  # one subplot per learning rate
        for algorithm in algorithms:
            clf = AdaBoostClassifier(learning_rate=learning_rate, algorithm=algorithm)
            clf.fit(X_train, y_train)
            # Plot the score after each boosting stage
            stages = range(1, len(clf.estimators_) + 1)
            ax.plot(stages, list(clf.staged_score(X_train, y_train)),
                    label="%s: Training score" % algorithm)
            ax.plot(stages, list(clf.staged_score(X_test, y_test)),
                    label="%s: Testing score" % algorithm)
        ax.set_xlabel("estimator num")
        ax.set_ylabel("score")
        ax.legend(loc="lower right")
        ax.set_title("learning rate: %.2f" % learning_rate)
    fig.suptitle("AdaBoostClassifier : learning_rates = [0.05, 0.1, 0.5, 0.9]")
    plt.show()

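First, compare by learning rate. Across the four plots it is obvious that the score of SAMME.R drops sharply as the learning rate increases.
SAMME, by contrast, only shows faster improvement at larger learning rates (although for the final result a smaller learning rate is still somewhat better).

The following code shows the effect of the learning rate on the two algorithms more directly: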

    learning_rates = np.linspace(0.01, 1)  # 50 evenly spaced learning rates
    fig = plt.figure(figsize=(18.5, 10.5))
    algorithms = ['SAMME.R', 'SAMME']
    for i, algorithm in enumerate(algorithms):
        ax = fig.add_subplot(2, 1, i + 1)  # one subplot per algorithm
        training_scores = []
        testing_scores = []
        for learning_rate in learning_rates:
            clf = AdaBoostClassifier(learning_rate=learning_rate, n_estimators=100,
                                     algorithm=algorithm)
            clf.fit(X_train, y_train)
            training_scores.append(clf.score(X_train, y_train))
            testing_scores.append(clf.score(X_test, y_test))
        ax.plot(learning_rates, training_scores, label="Training score")
        ax.plot(learning_rates, testing_scores, label="Testing score")
        ax.set_xlabel("learning rate")
        ax.set_ylabel("score")
        ax.legend(loc="best")
        ax.set_title("AdaBoostClassifier (%s)" % algorithm)
    plt.show()

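Next, compare the two algorithms by convergence speed. Compared with SAMME, SAMME.R clearly converges faster and needs fewer boosting rounds to reach a low error rate; see the official sklearn documentation. The official documentation also gives another example, shown below: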

    from sklearn.datasets import make_gaussian_quantiles
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_gaussian_quantiles(n_samples=13000, n_features=10,
                                   n_classes=3, random_state=1)

    n_split = 3000
    X_train, X_test = X[:n_split], X[n_split:]
    y_train, y_test = y[:n_split], y[n_split:]

    # The algorithm is named explicitly because the default depends on the sklearn version
    bdt_real = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                                  n_estimators=600, learning_rate=1,
                                  algorithm="SAMME.R")
    bdt_discrete = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                                      n_estimators=600, learning_rate=1.5,
                                      algorithm="SAMME")
    bdt_real.fit(X_train, y_train)
    bdt_discrete.fit(X_train, y_train)

    # Test error of both ensembles after each boosting stage
    real_test_errors = []
    discrete_test_errors = []
    for real_test_predict, discrete_test_predict in zip(
            bdt_real.staged_predict(X_test), bdt_discrete.staged_predict(X_test)):
        real_test_errors.append(1. - accuracy_score(y_test, real_test_predict))
        discrete_test_errors.append(1. - accuracy_score(y_test, discrete_test_predict))

    n_trees_discrete = len(bdt_discrete)
    n_trees_real = len(bdt_real)

    # Boosting might terminate early, but the following arrays are always
    # n_estimators long. We crop them to the actual number of trees here:
    discrete_estimator_errors = bdt_discrete.estimator_errors_[:n_trees_discrete]
    real_estimator_errors = bdt_real.estimator_errors_[:n_trees_real]
    discrete_estimator_weights = bdt_discrete.estimator_weights_[:n_trees_discrete]

    plt.figure(figsize=(15, 5))

    # Left: test error after each boosting iteration
    plt.subplot(131)
    plt.plot(range(1, n_trees_discrete + 1), discrete_test_errors,
             c='black', label='SAMME')
    plt.plot(range(1, n_trees_real + 1), real_test_errors,
             c='black', linestyle='dashed', label='SAMME.R')
    plt.legend()
    plt.ylim(0.18, 0.62)
    plt.ylabel('Test Error')
    plt.xlabel('Number of Trees')

    # Middle: classification error of each tree
    plt.subplot(132)
    plt.plot(range(1, n_trees_discrete + 1), discrete_estimator_errors,
             "b", label='SAMME', alpha=.5)
    plt.plot(range(1, n_trees_real + 1), real_estimator_errors,
             "r", label='SAMME.R', alpha=.5)
    plt.legend()
    plt.ylabel('Error')
    plt.xlabel('Number of Trees')
    plt.ylim((.2, max(real_estimator_errors.max(),
                      discrete_estimator_errors.max()) * 1.2))
    plt.xlim((-20, len(bdt_discrete) + 20))

    # Right: boost weight of each SAMME tree
    plt.subplot(133)
    plt.plot(range(1, n_trees_discrete + 1), discrete_estimator_weights,
             "b", label='SAMME')
    plt.legend()
    plt.ylabel('Weight')
    plt.xlabel('Number of Trees')
    plt.ylim((0, discrete_estimator_weights.max() * 1.2))
    plt.xlim((-20, n_trees_discrete + 20))

    # prevent overlapping y-axis labels
    plt.subplots_adjust(wspace=0.25)
    plt.show()

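This example compares the performance of the SAMME and SAMME.R [1] algorithms. SAMME.R uses the probability estimates to update the additive model, while SAMME uses the classifications only. As the example shows, SAMME.R typically converges faster than SAMME, achieving a lower test error with fewer boosting iterations. The left panel shows the error of each algorithm on the test set after each boosting iteration, the middle panel shows the classification error of each tree, and the right panel shows the boost weight of each tree. All trees have a weight of one in the SAMME.R algorithm, so its weights are not shown.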
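
The boost weights in the right panel follow from each tree's weighted error. As a sketch of the multi-class SAMME formula (the function name is hypothetical, and the learning_rate argument mirrors how sklearn scales the weight, to the best of my knowledge):

```python
import math


def samme_estimator_weight(error, n_classes, learning_rate=1.0):
    """Vote weight SAMME assigns to a base classifier with weighted error `error`."""
    return learning_rate * (
        math.log((1.0 - error) / error) + math.log(n_classes - 1.0)
    )


# Two classes: the extra log(K - 1) term vanishes.
w_binary = samme_estimator_weight(0.25, n_classes=2)
# Three classes: an error of 0.5 still beats random guessing (error 2/3),
# so the tree keeps a positive vote.
w_three = samme_estimator_weight(0.5, n_classes=3)
# At chance level the vote weight drops to (numerically) zero.
w_chance = samme_estimator_weight(2.0 / 3.0, n_classes=3)
print(w_binary, w_three, w_chance)
```

The log(K - 1) term is why a multi-class base learner only needs to beat an error of 1 - 1/K, not 0.5, to receive a positive weight.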

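Another example shows clearly how much AdaBoost improves different base learners: a strong learner is improved less and more slowly, while a weak learner is improved much more noticeably. In the example below, the first base learner, a decision tree, is the "weak learner", and AdaBoost improves its performance quickly. The second, a Gaussian naive Bayes classifier, behaves as a "strong learner" on this dataset, so AdaBoost improves it much more slowly.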

    from sklearn.naive_bayes import GaussianNB

    digits = datasets.load_digits()
    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        digits.data, digits.target, test_size=0.25, random_state=0,
        stratify=digits.target)

    fig = plt.figure(figsize=(18.5, 10.5))

    # Default base classifier (decision tree)
    ax = fig.add_subplot(2, 1, 1)
    clf = AdaBoostClassifier(learning_rate=0.1)
    clf.fit(X_train, y_train)
    stages = range(1, len(clf.estimators_) + 1)
    ax.plot(stages, list(clf.staged_score(X_train, y_train)), label="Training score")
    ax.plot(stages, list(clf.staged_score(X_test, y_test)), label="Testing score")
    ax.set_xlabel("estimator num")
    ax.set_ylabel("score")
    ax.legend(loc="lower right")
    ax.set_ylim(0, 1)
    ax.set_title("AdaBoostClassifier with Decision Tree")

    # Gaussian Naive Bayes base classifier, passed positionally because the
    # keyword name (base_estimator / estimator) differs between sklearn versions
    ax = fig.add_subplot(2, 1, 2)
    clf = AdaBoostClassifier(GaussianNB(), learning_rate=0.1)
    clf.fit(X_train, y_train)
    stages = range(1, len(clf.estimators_) + 1)
    ax.plot(stages, list(clf.staged_score(X_train, y_train)), label="Training score")
    ax.plot(stages, list(clf.staged_score(X_test, y_test)), label="Testing score")
    ax.set_xlabel("estimator num")
    ax.set_ylabel("score")
    ax.legend(loc="lower right")
    ax.set_ylim(0, 1)
    ax.set_title("AdaBoostClassifier with Gaussian Naive Bayes")
    plt.show()

Discrete versus Real AdaBoost

The example below describes the two algorithms of AdaBoostClassifier and their boosting behavior in more detail.

    from sklearn.metrics import zero_one_loss
    from sklearn.tree import DecisionTreeClassifier

    n_estimators = 400
    # A learning rate of 1. may not be optimal for both SAMME and SAMME.R
    learning_rate = 1.

    X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)

    X_test, y_test = X[2000:], y[2000:]
    X_train, y_train = X[:2000], y[:2000]

    # Baselines: a single decision stump and a single deeper tree
    dt_stump = DecisionTreeClassifier(max_depth=1, min_samples_leaf=1)
    dt_stump.fit(X_train, y_train)
    dt_stump_err = 1.0 - dt_stump.score(X_test, y_test)

    dt = DecisionTreeClassifier(max_depth=9, min_samples_leaf=1)
    dt.fit(X_train, y_train)
    dt_err = 1.0 - dt.score(X_test, y_test)

    # The stump is passed positionally because the keyword name
    # (base_estimator / estimator) differs between sklearn versions
    ada_discrete = AdaBoostClassifier(dt_stump, learning_rate=learning_rate,
                                      n_estimators=n_estimators, algorithm="SAMME")
    ada_discrete.fit(X_train, y_train)

    ada_real = AdaBoostClassifier(dt_stump, learning_rate=learning_rate,
                                  n_estimators=n_estimators, algorithm="SAMME.R")
    ada_real.fit(X_train, y_train)

    fig = plt.figure(figsize=(16, 8))
    ax = fig.add_subplot(111)

    ax.plot([1, n_estimators], [dt_stump_err] * 2, 'k-', label='Decision Stump Error')
    ax.plot([1, n_estimators], [dt_err] * 2, 'k--', label='Decision Tree Error')

    # Error after each boosting stage, on the test and training sets
    ada_discrete_err = np.zeros((n_estimators,))
    for i, y_pred in enumerate(ada_discrete.staged_predict(X_test)):
        ada_discrete_err[i] = zero_one_loss(y_test, y_pred)

    ada_discrete_err_train = np.zeros((n_estimators,))
    for i, y_pred in enumerate(ada_discrete.staged_predict(X_train)):
        ada_discrete_err_train[i] = zero_one_loss(y_train, y_pred)

    ada_real_err = np.zeros((n_estimators,))
    for i, y_pred in enumerate(ada_real.staged_predict(X_test)):
        ada_real_err[i] = zero_one_loss(y_test, y_pred)

    ada_real_err_train = np.zeros((n_estimators,))
    for i, y_pred in enumerate(ada_real.staged_predict(X_train)):
        ada_real_err_train[i] = zero_one_loss(y_train, y_pred)

    ax.plot(np.arange(n_estimators) + 1, ada_discrete_err,
            label='Discrete AdaBoost Test Error', color='red')
    ax.plot(np.arange(n_estimators) + 1, ada_discrete_err_train,
            label='Discrete AdaBoost Train Error', color='blue')
    ax.plot(np.arange(n_estimators) + 1, ada_real_err,
            label='Real AdaBoost Test Error', color='orange')
    ax.plot(np.arange(n_estimators) + 1, ada_real_err_train,
            label='Real AdaBoost Train Error', color='green')

    ax.set_ylim((0.0, 0.5))
    ax.set_xlabel('n_estimators')
    ax.set_ylabel('error rate')

    leg = ax.legend(loc='upper right', fancybox=True)
    leg.get_frame().set_alpha(0.7)

    plt.show()

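References

sklearn official documentation: AdaBoostClassifier
sklearn official documentation: Multi-class AdaBoosted Decision Trees
sklearn official documentation: Discrete versus Real AdaBoost
sklearn official documentation: ensemble (ensemble methods)
《python大戰機器學習數據科學家的一個小目標》, by 華校專 and 王正林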
