RandomForest: Random Forest
Random Forest (RF)
Random forest is a Bagging algorithm that uses decision trees as base learners; the difference is that RF also injects random attribute selection (sub-sampling over the features) into the training of each tree:
- A traditional decision tree considers all attributes and picks the optimal one when choosing a split.
- An RF tree instead first draws a random subset of the attributes at each node and then picks the best split only within that subset, which further decorrelates the trees (a minimal sketch of this difference follows below).
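To make the difference concrete, here is a minimal, self-contained sketch of how the candidate features at a node differ between an ordinary tree and an RF tree. This is plain NumPy, not sklearn's actual implementation, and the feature counts are arbitrary assumptions:

```python
import numpy as np

rng = np.random.RandomState(0)
n_features = 16
max_features = int(np.sqrt(n_features))  # common rule of thumb for classification

def candidate_features(randomized: bool) -> np.ndarray:
    """Return the feature indices a node is allowed to split on."""
    if randomized:
        # RF-style: a fresh random subset is drawn at every node
        return rng.choice(n_features, size=max_features, replace=False)
    # plain decision tree: every feature is a candidate at every node
    return np.arange(n_features)

print("plain tree :", candidate_features(randomized=False))
print("RF node #1 :", candidate_features(randomized=True))
print("RF node #2 :", candidate_features(randomized=True))
```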
Parameters
The main parameters to tune with these methods are n_estimators and max_features. The former is the number of trees in the forest: the larger it is, the better the results usually are, but computation time grows as well, and beyond a critical number of trees the improvement becomes insignificant. The latter is the size of the random subset of features considered when splitting a node: the lower it is, the greater the reduction in variance, but also the greater the increase in bias. Empirically good defaults are max_features = n_features for regression problems and max_features = sqrt(n_features) for classification (where n_features is the number of features). Combining max_depth = None with min_samples_split = 2 (i.e. growing fully developed trees) usually works well. Keep in mind that these default values are generally not optimal and may consume a lot of memory; the best parameter values should be obtained by cross-validation. Also note that random forests use bootstrap sampling by default (bootstrap = True), whereas the default strategy of extra-trees is to use the whole dataset (bootstrap = False). When bootstrap sampling is used, the generalization accuracy can be estimated on the left-out, or "out-of-bag", samples by setting oob_score = True.
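As a rough illustration of these recommendations (a sketch, not a tuned model: the toy dataset and the specific values are assumptions, and in practice the best values should come from cross-validation):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=25, random_state=0)

clf = RandomForestClassifier(
    n_estimators=200,      # more trees: usually better but slower, with diminishing returns
    max_features="sqrt",   # rule of thumb for classification problems
    max_depth=None,        # grow fully developed trees ...
    min_samples_split=2,   # ... these two together usually work well
    bootstrap=True,        # RF default (extra-trees defaults to False)
    oob_score=True,        # estimate generalization accuracy on out-of-bag samples
    n_jobs=-1,
    random_state=0,
)
clf.fit(X, y)
print("OOB score:", clf.oob_score_)
```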
Tip:
With the default parameters the model complexity is O(M·N·log(N)), where M is the number of trees and N is the number of samples. The complexity can be reduced by setting the following parameters: min_samples_split, min_samples_leaf, max_leaf_nodes and max_depth.
Bias and Variance
Theory
Experiment
Here we slightly modify the sklearn example "Single estimator versus bagging: bias-variance decomposition" to show the bias-variance decomposition of extremely randomized trees (Extra-Trees), a random forest, and a plain decision tree on the same dataset.
```python
import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

plt.figure(figsize=(20, 10))

# Settings
n_repeat = 50   # Number of iterations for computing expectations
n_train = 50    # Size of the training set
n_test = 1000   # Size of the test set
noise = 0.1     # Standard deviation of the noise
np.random.seed(0)

estimators = [
    ("Tree", DecisionTreeRegressor()),
    ("RandomForestRegressor", RandomForestRegressor(random_state=100, bootstrap=True)),
    ("ExtraTreesRegressor", ExtraTreesRegressor(random_state=100, bootstrap=True)),
]

n_estimators = len(estimators)

# Generate data
def f(x):
    x = x.ravel()
    return np.exp(-x ** 2) + 1.5 * np.exp(-(x - 2) ** 2)

def generate(n_samples, noise, n_repeat=1):
    X = np.random.rand(n_samples) * 10 - 5
    X = np.sort(X)

    if n_repeat == 1:
        y = f(X) + np.random.normal(0.0, noise, n_samples)
    else:
        y = np.zeros((n_samples, n_repeat))
        for i in range(n_repeat):
            y[:, i] = f(X) + np.random.normal(0.0, noise, n_samples)

    X = X.reshape((n_samples, 1))
    return X, y

X_train = []
y_train = []

for i in range(n_repeat):
    X, y = generate(n_samples=n_train, noise=noise)
    X_train.append(X)
    y_train.append(y)

X_test, y_test = generate(n_samples=n_test, noise=noise, n_repeat=n_repeat)

# Loop over estimators to compare
for n, (name, estimator) in enumerate(estimators):
    # Compute predictions
    y_predict = np.zeros((n_test, n_repeat))

    for i in range(n_repeat):
        estimator.fit(X_train[i], y_train[i])
        y_predict[:, i] = estimator.predict(X_test)

    # Bias^2 + Variance + Noise decomposition of the mean squared error
    y_error = np.zeros(n_test)

    for i in range(n_repeat):
        for j in range(n_repeat):
            y_error += (y_test[:, j] - y_predict[:, i]) ** 2

    y_error /= (n_repeat * n_repeat)

    y_noise = np.var(y_test, axis=1)
    y_bias = (f(X_test) - np.mean(y_predict, axis=1)) ** 2
    y_var = np.var(y_predict, axis=1)

    print("{0}: {1:.4f} (error) = {2:.4f} (bias^2) "
          " + {3:.4f} (var) + {4:.4f} (noise)".format(name,
                                                      np.mean(y_error),
                                                      np.mean(y_bias),
                                                      np.mean(y_var),
                                                      np.mean(y_noise)))

    # Plot figures
    plt.subplot(2, n_estimators, n + 1)
    plt.plot(X_test, f(X_test), "b", label="$f(x)$")
    plt.plot(X_train[0], y_train[0], ".b", label="LS ~ $y = f(x)+noise$")

    for i in range(n_repeat):
        if i == 0:
            plt.plot(X_test, y_predict[:, i], "r", label=r"$\^y(x)$")
        else:
            plt.plot(X_test, y_predict[:, i], "r", alpha=0.05)

    plt.plot(X_test, np.mean(y_predict, axis=1), "c",
             label=r"$\mathbb{E}_{LS} \^y(x)$")

    plt.xlim([-5, 5])
    plt.title(name)

    if n == 0:
        plt.legend(loc="upper left", prop={"size": 11})

    plt.subplot(2, n_estimators, n_estimators + n + 1)
    plt.plot(X_test, y_error, "r", label="$error(x)$")
    plt.plot(X_test, y_bias, "b", label="$bias^2(x)$")
    plt.plot(X_test, y_var, "g", label="$variance(x)$")
    plt.plot(X_test, y_noise, "c", label="$noise(x)$")

    plt.xlim([-5, 5])
    plt.ylim([0, 0.1])

    if n == 0:
        plt.legend(loc="upper left", prop={"size": 11})

plt.show()
```

Output:

```
Tree: 0.0255 (error) = 0.0003 (bias^2) + 0.0152 (var) + 0.0098 (noise)
RandomForestRegressor: 0.0202 (error) = 0.0004 (bias^2) + 0.0098 (var) + 0.0098 (noise)
ExtraTreesRegressor: 0.0175 (error) = 0.0011 (bias^2) + 0.0065 (var) + 0.0098 (noise)
```

The results show clearly that, compared with a single decision tree, the random forest increases the bias of the model somewhat but greatly reduces its variance, and therefore achieves a better overall result. In this particular experiment the variance of RF is still far larger than its bias; in that situation we can turn to extremely randomized trees, because Extra-Trees generally increase the bias a little further while reducing the variance further relative to a random forest. In this experiment Extra-Trees should therefore outperform the random forest, although this trend is not guaranteed in every case.
Feature importance evaluation
The relative importance of a feature with respect to the target variable can be assessed from the relative order (depth) at which it is used as a split. Features used near the top of a tree contribute to the final prediction of a larger fraction of the input samples, so the expected fraction of samples whose final prediction a feature contributes to can be used as an estimate of that feature's relative importance.
In RF, averaging these per-tree contribution estimates over many randomized trees reduces their variance, so they can be used for feature selection. Note, however, that a random forest and an Extra-Trees model do not necessarily derive the same importances from the same dataset, and even the same model can give different results under different hyperparameters.
Because of the particular way Extra-Trees choose their splits, it is better not to rely on Extra-Trees for ranking feature importances; using RF is recommended.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

plt.figure(figsize=(8, 4))

estimators = [
    ("RandomForest", RandomForestClassifier(random_state=100)),
    ("ExtraTrees", ExtraTreesClassifier(random_state=100)),
]
n_estimators = len(estimators)

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Fit each forest and compute its feature importances
for n, (name, estimator) in enumerate(estimators):
    estimator.fit(X, y)
    importances = estimator.feature_importances_
    # inter-tree standard deviation of the importances within this forest,
    # used as error bars on the plot
    std = np.std([tree.feature_importances_ for tree in estimator.estimators_],
                 axis=0)
    indices = np.argsort(importances)[::-1]

    # Print the feature ranking
    # print(name + " Feature ranking:")
    # for f in range(X.shape[1]):
    #     print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

    # Plot the feature importances of the forest
    plt.subplot(1, n_estimators, n + 1)
    plt.title(name + " Feature importances")
    plt.bar(range(X.shape[1]), importances[indices],
            color="r", yerr=std[indices], align="center")
    plt.xticks(range(X.shape[1]), indices)
    plt.xlim([-1, X.shape[1]])

plt.show()
```

Totally Random Trees Embedding
sklearn also implements a special use of random forests: totally random trees embedding (RandomTreesEmbedding). RandomTreesEmbedding implements an unsupervised transformation of the data. Using a forest of completely random trees, it encodes each sample by the indices of the leaves it ends up in. These indices are then encoded in a one-of-K manner, yielding a high-dimensional, sparse binary coding. This coding can be computed very efficiently and can then be used as a basis for other learning tasks. The size and sparsity of the code can be influenced by choosing the number of trees and the maximum depth per tree; for each tree in the ensemble the coding contains exactly one entry equal to one (the leaf that the sample falls into). The size of the coding is at most n_estimators * 2 ** max_depth, the maximum number of leaves in the forest.
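A quick sketch can check these properties on a toy dataset (the data and parameter values are only assumptions for illustration): each transformed row contains exactly n_estimators ones, and the number of columns is bounded by n_estimators * 2 ** max_depth.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding

X, _ = make_circles(n_samples=100, factor=0.5, noise=0.05, random_state=0)

embedder = RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0)
X_sparse = embedder.fit_transform(X)        # sparse binary matrix

# width is bounded by the maximum possible number of leaves in the forest
print(X_sparse.shape[1] <= 10 * 2 ** 3)

# every sample activates exactly one leaf per tree, i.e. n_estimators ones per row
row_sums = np.asarray(X_sparse.sum(axis=1)).ravel()
print(np.unique(row_sums))                  # -> [10.]
```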
It has two main uses: as an unsupervised hashing feature transformation that maps the data into a high-dimensional sparse space where it may become easier to separate, and as a way of generating new sparse features that are then fed into another model (such as a linear classifier).
Below are two examples illustrating these uses of totally random trees embedding:
Example 1: hashing feature transformation using totally random trees
RandomTreesEmbedding provides a way to map data to a very high-dimensional, sparse representation, which might be beneficial for classification. The mapping is completely unsupervised and very efficient.
This example visualizes the partitions given by several trees and shows how the transformation can also be used for non-linear dimensionality reduction or non-linear classification.
Neighboring points often share the same leaf of a tree and therefore share large parts of their hashed representation, which allows truncated singular value decomposition (truncated SVD) to separate the two concentric circles after the data has been transformed.
In high-dimensional spaces, linear classifiers often achieve excellent accuracy. For sparse binary data, BernoulliNB is particularly well-suited. The bottom row compares the decision boundary obtained by BernoulliNB in the transformed space with an ExtraTreesClassifier forest learned on the original data.
```python
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding, ExtraTreesClassifier
from sklearn.decomposition import TruncatedSVD
from sklearn.naive_bayes import BernoulliNB

# make a synthetic dataset
X, y = make_circles(factor=0.5, random_state=0, noise=0.05)

# use RandomTreesEmbedding to transform data
hasher = RandomTreesEmbedding(n_estimators=10, random_state=0, max_depth=3)
X_transformed = hasher.fit_transform(X)

# Visualize result after dimensionality reduction using truncated SVD
svd = TruncatedSVD(n_components=2)
X_reduced = svd.fit_transform(X_transformed)

# Learn a Naive Bayes classifier on the transformed data
nb = BernoulliNB()
nb.fit(X_transformed, y)

# Learn an ExtraTreesClassifier for comparison
trees = ExtraTreesClassifier(max_depth=3, n_estimators=10, random_state=0)
trees.fit(X, y)

# scatter plot of original and reduced data
fig = plt.figure(figsize=(9, 8))

ax = plt.subplot(221)
ax.scatter(X[:, 0], X[:, 1], c=y, s=50, edgecolor='k')
ax.set_title("Original Data (2d)")
ax.set_xticks(())
ax.set_yticks(())

ax = plt.subplot(222)
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, s=50, edgecolor='k')
ax.set_title("Truncated SVD reduction (2d) of transformed data (%dd)" %
             X_transformed.shape[1])
ax.set_xticks(())
ax.set_yticks(())

# Plot the decision in original space. For that, we will assign a color
# to each point in the mesh [x_min, x_max]x[y_min, y_max].
h = .01
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# transform grid using RandomTreesEmbedding
transformed_grid = hasher.transform(np.c_[xx.ravel(), yy.ravel()])
y_grid_pred = nb.predict_proba(transformed_grid)[:, 1]

ax = plt.subplot(223)
ax.set_title("Naive Bayes on Transformed data")
ax.pcolormesh(xx, yy, y_grid_pred.reshape(xx.shape))
ax.scatter(X[:, 0], X[:, 1], c=y, s=50, edgecolor='k')
ax.set_ylim(-1.4, 1.4)
ax.set_xlim(-1.4, 1.4)
ax.set_xticks(())
ax.set_yticks(())

# transform grid using ExtraTreesClassifier
y_grid_pred = trees.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

ax = plt.subplot(224)
ax.set_title("ExtraTrees predictions")
ax.pcolormesh(xx, yy, y_grid_pred.reshape(xx.shape))
ax.scatter(X[:, 0], X[:, 1], c=y, s=50, edgecolor='k')
ax.set_ylim(-1.4, 1.4)
ax.set_xlim(-1.4, 1.4)
ax.set_xticks(())
ax.set_yticks(())

plt.tight_layout()
plt.show()
```

[Experiment on the second use: generating new features](Feature transformations with ensembles of trees)
This tree-leaf embedding can transform the features into a higher-dimensional, sparser space. The procedure is: first train a tree ensemble on the dataset (Extra-Trees, a random forest, or any model from the GBT family will do); every leaf node is then assigned a fixed feature index in the new feature space, and the leaves are one-hot encoded — the leaves a sample falls into are set to 1 and all other entries to 0 — so that the sample is mapped into a sparse, high-dimensional space.
The code below compares the classification performance of logistic regression (LR) trained on the features produced by the different transformation models; the second figure zooms into the top-left corner of the first. On this dataset the GBT-based transformation still appears to work somewhat better (you can also use the sklearn interfaces of LightGBM or XGBoost to achieve an effect similar to the GradientBoostingClassifier in the code).
```python
import numpy as np
np.random.seed(10)

import matplotlib.pyplot as plt

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.pipeline import make_pipeline

n_estimator = 10
X, y = make_classification(n_samples=80000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# It is important to train the ensemble of trees on a different subset
# of the training data than the linear regression model to avoid
# overfitting, in particular if the total number of leaves is
# similar to the number of training samples
X_train, X_train_lr, y_train, y_train_lr = train_test_split(X_train,
                                                            y_train,
                                                            test_size=0.5)

# Unsupervised transformation based on totally random trees
rt = RandomTreesEmbedding(max_depth=3, n_estimators=n_estimator,
                          random_state=0)

rt_lm = LogisticRegression()
pipeline = make_pipeline(rt, rt_lm)
pipeline.fit(X_train, y_train)
y_pred_rt = pipeline.predict_proba(X_test)[:, 1]
fpr_rt_lm, tpr_rt_lm, _ = roc_curve(y_test, y_pred_rt)

# Supervised transformation based on random forests
rf = RandomForestClassifier(max_depth=3, n_estimators=n_estimator)
rf_enc = OneHotEncoder()
rf_lm = LogisticRegression()
rf.fit(X_train, y_train)
rf_enc.fit(rf.apply(X_train))
rf_lm.fit(rf_enc.transform(rf.apply(X_train_lr)), y_train_lr)

y_pred_rf_lm = rf_lm.predict_proba(rf_enc.transform(rf.apply(X_test)))[:, 1]
fpr_rf_lm, tpr_rf_lm, _ = roc_curve(y_test, y_pred_rf_lm)

# Supervised transformation based on gradient boosted trees
grd = GradientBoostingClassifier(n_estimators=n_estimator)
grd_enc = OneHotEncoder()
grd_lm = LogisticRegression()
grd.fit(X_train, y_train)
grd_enc.fit(grd.apply(X_train)[:, :, 0])
grd_lm.fit(grd_enc.transform(grd.apply(X_train_lr)[:, :, 0]), y_train_lr)

y_pred_grd_lm = grd_lm.predict_proba(
    grd_enc.transform(grd.apply(X_test)[:, :, 0]))[:, 1]
fpr_grd_lm, tpr_grd_lm, _ = roc_curve(y_test, y_pred_grd_lm)

# The gradient boosted model by itself
y_pred_grd = grd.predict_proba(X_test)[:, 1]
fpr_grd, tpr_grd, _ = roc_curve(y_test, y_pred_grd)

# The random forest model by itself
y_pred_rf = rf.predict_proba(X_test)[:, 1]
fpr_rf, tpr_rf, _ = roc_curve(y_test, y_pred_rf)

plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_rt_lm, tpr_rt_lm, label='RT + LR')
plt.plot(fpr_rf, tpr_rf, label='RF')
plt.plot(fpr_rf_lm, tpr_rf_lm, label='RF + LR')
plt.plot(fpr_grd, tpr_grd, label='GBT')
plt.plot(fpr_grd_lm, tpr_grd_lm, label='GBT + LR')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

plt.figure(2)
plt.xlim(0, 0.2)
plt.ylim(0.8, 1)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_rt_lm, tpr_rt_lm, label='RT + LR')
plt.plot(fpr_rf, tpr_rf, label='RF')
plt.plot(fpr_rf_lm, tpr_rf_lm, label='RF + LR')
plt.plot(fpr_grd, tpr_grd, label='GBT')
plt.plot(fpr_grd_lm, tpr_grd_lm, label='GBT + LR')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve (zoomed in at top left)')
plt.legend(loc='best')
plt.show()
```
In addition, random forests can also be used for anomaly detection; sklearn implements this as IsolationForest (a minimal sketch follows below). For more details, see my blog.
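As a pointer, here is a minimal IsolationForest sketch; the synthetic data and the contamination value are assumptions chosen only to make the example self-contained:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_inliers = 0.3 * rng.randn(200, 2)                      # cluster around the origin
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))   # scattered anomalies
X = np.vstack([X_inliers, X_outliers])

iso = IsolationForest(n_estimators=100, contamination=0.1, random_state=42)
labels = iso.fit_predict(X)                              # +1 = inlier, -1 = outlier
print("flagged outliers:", np.sum(labels == -1))
```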
References
- sklearn official documentation: Ensemble methods
- sklearn official documentation: Hashing feature transformation using Totally Random Trees
- sklearn ApacheCN Chinese documentation: ensemble (集成算法)
Summary
This post reviewed random forests in sklearn: the main hyperparameters (n_estimators, max_features and the complexity-related settings), their bias-variance behavior compared with single trees and Extra-Trees, feature importance estimation, and the RandomTreesEmbedding transformation for generating sparse, high-dimensional features.