Regression Trees and Rule-Based Models (Part 3): Regression Model Trees
Study notes, for reference only; corrections are welcome.
Regression Trees and Rule-Based Models
Regression Model Trees
One limitation of simple regression trees is that each terminal node uses the average of the training set outcomes in that node for prediction. As a consequence, these models may not do a good job predicting samples whose true outcomes are extremely high or low.
One approach to dealing with this issue is to use a different estimator in the terminal nodes.
Here we focus on the model tree approach described in Quinlan (1992) called M5, which is similar to regression trees except:
- The splitting criterion is different;
- The terminal nodes predict the outcome using a linear model (rather than the simple average);
- When a new sample is predicted, the prediction is usually a combination of the predictions from several different models along the same path through the tree.
Like simple regression trees, the initial split is found using an exhaustive search over the predictors and training set samples, but, unlike those models, the expected reduction in the node's error rate is used as the optimization criterion. Let $S$ denote the entire set of data and let $S_1, S_2, \dots, S_P$ represent the $P$ subsets of the data after splitting. The split criterion would be:

$$\text{reduction} = SD(S) - \sum_{i=1}^{P} \frac{n_i}{n} \times SD(S_i) \tag{1}$$
where $SD$ is the standard deviation and $n_i$ is the number of samples in partition $i$.
This metric determines if the total variation in the splits, weighted by sample size, is lower than in the presplit data. The split that is associated with the largest reduction in error is chosen, and a linear model is created within the partitions using the split variable in the model.
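To make Eq. (1) concrete, here is a minimal sketch in Python (NumPy only; the helper names and the toy data are illustrative, not from the original text) of the exhaustive search over one predictor for the split point that maximizes the SD reduction:

```python
import numpy as np

def sd_reduction(y, left_mask):
    """Error reduction of Eq. (1) for a binary split: SD of the parent node
    minus the sample-size-weighted SDs of the two resulting partitions."""
    n = len(y)
    y_left, y_right = y[left_mask], y[~left_mask]
    if len(y_left) == 0 or len(y_right) == 0:
        return -np.inf  # not a valid split
    return (np.std(y)
            - len(y_left) / n * np.std(y_left)
            - len(y_right) / n * np.std(y_right))

def best_split(x, y):
    """Exhaustive search over the observed values of a single predictor;
    returns the split value with the largest SD reduction."""
    best_value, best_red = None, -np.inf
    for value in np.unique(x)[:-1]:          # candidate split points
        red = sd_reduction(y, x <= value)
        if red > best_red:
            best_value, best_red = value, red
    return best_value, best_red

# toy example: a step function, so the best split should land near x = 4
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.where(x < 4, 2.0, 8.0) + rng.normal(0, 0.5, 200)
print(best_split(x, y))
```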
For subsequent splitting iterations, this process is repeated:
an initial split is determined and a linear model is created for the partition using the current split variable and all others that preceded it.
The error associated with each linear model is used in place of $SD(S)$ in Eq. (1) to determine the expected reduction in the error rate for the next split.
The tree growing process continues along the branches of the tree until there are no further improvements in the error rate or there are not enough samples to continue the process. Once the tree is fully grown, there is a linear model for every node in the tree.
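A sketch of this step is shown below, assuming NumPy and scikit-learn's LinearRegression (neither is part of the original text); using the SD of the residuals as the linear model's error measure is an assumption made for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def model_error(X, y):
    """Error of a linear model fit within a partition. For later splits this
    quantity stands in for SD(S) in Eq. (1); the SD of the residuals is used
    here as one plausible error measure."""
    fit = LinearRegression().fit(X, y)
    return np.std(y - fit.predict(X))

def model_based_reduction(X, y, left_mask, split_cols):
    """Expected error reduction for a candidate split, where the models in the
    parent and child partitions use the current split variable and all of the
    split variables that preceded it (split_cols)."""
    n = len(y)
    parent_err = model_error(X[:, split_cols], y)
    child_err = 0.0
    for mask in (left_mask, ~left_mask):
        Xp, yp = X[mask][:, split_cols], y[mask]
        child_err += len(yp) / n * model_error(Xp, yp)
    return parent_err - child_err

# illustrative call: split on column 1, with columns 0 and 1 in the linear models
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 3 * X[:, 0] + np.where(X[:, 1] > 0, 5.0, 0.0) + rng.normal(0, 0.3, 100)
print(model_based_reduction(X, y, X[:, 1] > 0, split_cols=[0, 1]))
```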
Once the complete set of linear models has been created, each undergoes a simplification procedure to potentially drop some of its terms. For a given model, an adjusted error rate is computed. First, the absolute differences between the observed and predicted data are calculated, then multiplied by a term that penalizes models with large numbers of parameters:

$$\text{Adjusted Error Rate} = \frac{n^* + p}{n^* - p}\sum_{i=1}^{n^*} |y_i - \hat{y}_i|$$

where $n^*$ is the number of training set data points that were used to build the model and $p$ is the number of parameters.
Each model term is dropped in turn and the adjusted error rate is computed; a term is removed from the model if dropping it does not increase the adjusted error rate. In some cases, the linear model may be simplified to having only an intercept. This procedure is independently applied to each linear model.
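The following sketch (again assuming NumPy and scikit-learn; the greedy backward-elimination loop is a simplified illustration, not necessarily Quinlan's exact procedure) computes the adjusted error rate and drops terms as long as doing so does not increase it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_error_rate(X, y):
    """Sum of absolute residuals, inflated by (n* + p) / (n* - p) to penalize
    models with many parameters. Here p counts the predictor columns only;
    the intercept is ignored for simplicity."""
    n_star, p = X.shape
    fit = LinearRegression().fit(X, y)
    return (n_star + p) / (n_star - p) * np.sum(np.abs(y - fit.predict(X)))

def simplify(X, y):
    """Greedy backward elimination: drop a term whenever doing so does not
    increase the adjusted error rate."""
    cols = list(range(X.shape[1]))
    improved = True
    while improved and len(cols) > 1:
        improved = False
        current = adjusted_error_rate(X[:, cols], y)
        for c in cols:
            reduced = [k for k in cols if k != c]
            if adjusted_error_rate(X[:, reduced], y) <= current:
                cols, improved = reduced, True
                break
    return cols  # column indices of the retained terms

# columns 1 and 3 are pure noise and are typically dropped
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = 2 * X[:, 0] - X[:, 2] + rng.normal(0, 0.1, 50)
print(simplify(X, y))
```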
Model trees also incorporate a type of smoothing to decrease the potential for over-fitting. The technique is based on the "recursive shrinking" methodology of Hastie and Pregibon (1990).
When predicting, the new sample goes down the appropriate path of the tree and, moving from the bottom up, the linear models along that path are combined.
The predictions from a child node and its parent node are combined to produce the smoothed prediction for the parent node:

$$\hat{y}_{(p)}' = \frac{n_{(k)}\,\hat{y}_{(k)} + c\,\hat{y}_{(p)}}{n_{(k)} + c}$$

where $\hat{y}_{(k)}$ is the prediction from the child node, $n_{(k)}$ is the number of training set samples in the child node, $\hat{y}_{(p)}$ is the prediction from the parent node, and $c$ is a constant with a default value of 15.
Once this combined prediction is computed, $\hat{y}_{(p)}'$ is treated as the prediction of the child node at the next level up and is combined, in the same way, with the prediction of its parent node, and so on toward the root.
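A minimal sketch of this bottom-up combination (the path/node representation and function name are hypothetical; c = 15 is the default mentioned above):

```python
def smooth_prediction(path_predictions, path_sizes, c=15.0):
    """Combine the predictions along the path from the terminal node up to the
    root. Element 0 of each list belongs to the terminal node; the last element
    belongs to the root."""
    y_child = path_predictions[0]
    for level in range(1, len(path_predictions)):
        n_child = path_sizes[level - 1]   # training samples in the current child node
        y_parent = path_predictions[level]
        # y'_(p) = (n_(k) * y_(k) + c * y_(p)) / (n_(k) + c)
        y_child = (n_child * y_child + c * y_parent) / (n_child + c)
    return y_child

# terminal node (20 samples) predicts 10.0, its parent (35 samples) predicts 8.0,
# and the root predicts 6.0; the smoothed prediction is pulled toward the ancestors
print(smooth_prediction([10.0, 8.0, 6.0], [20, 35, 60]))   # about 8.2
```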
Smoothing the models has the effect of minimizing the collinearity issues. Removing the correlated predictors would produce a model that has fewer inconsistencies and is more interpretable. However, there is a measurable drop in performance from using that strategy.