

CVP(Critical Value Pruning)illustration with clear principle in details

Published: 2023/12/20

Note:
CVP (Critical Value Pruning) is also called Chi-Square Pruning (chi-square test pruning) in many materials.

The following is a contingency table [1]:

$$H_0:\ \frac{X_{ij}}{n}=\frac{N_i N_j}{n^2}$$
$$H_1:\ \frac{X_{ij}}{n}\neq\frac{N_i N_j}{n^2}$$
$$N_{ij}=X_{ij}$$
when
$$\sum_{i=1}^{r}\sum_{j=1}^{s}\frac{\left(N_{ij}-\frac{N_i N_j}{n}\right)^2}{\frac{N_i N_j}{n}}<\chi^2_{(r-1)(s-1),\,\alpha}=\text{critical value}$$
then $H_0$ is accepted and the decision tree is pruned,
where $\alpha$ can be set to, e.g., 0.05,
and $n$ is the total number of samples in your dataset.
The relationship between the contingency table and the decision tree is shown in the following table:

| Split node with attribute $f$ | class $1$ | class $2$ | $\dots$ | class $s$ | total |
|---|---|---|---|---|---|
| branch $1$ (left) | $n_{L1}$ | $n_{L2}$ | $\dots$ | $n_{Ls}$ | $n_L$ |
| branch $2$ (right) | $n_{R1}$ | $n_{R2}$ | $\dots$ | $n_{Rs}$ | $n_R$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\vdots$ |
| total (over the $r$ branches) | $n_1$ | $n_2$ | $\dots$ | $n_s$ | $n$ |
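The criterion above is straightforward to compute from such a table. Here is a minimal sketch in Python (assuming NumPy and SciPy are available; the function names `chi_square_statistic` and `should_prune` are my own, not from the lecture):

```python
import numpy as np
from scipy.stats import chi2

def chi_square_statistic(table):
    """Chi-square statistic of an r x s contingency table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under H0: N_i * N_j / n for each cell
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    return float(((table - expected) ** 2 / expected).sum())

def should_prune(table, alpha=0.05):
    """Accept H0 (and prune) when the statistic stays below the critical value."""
    r, s = np.asarray(table).shape
    critical = chi2.ppf(1 - alpha, df=(r - 1) * (s - 1))
    return chi_square_statistic(table) < critical
```

For example, a table whose branches all share one class distribution (statistic 0) is pruned, while a table where each branch isolates one class (large statistic) is kept.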

Now let's use the above table to understand the following lecture PPT [2].

In the above picture, some parameters are explained by the table before it.

Let’s go on…

In the above PPT, note that CVP can be used in both the pre-pruning and the post-pruning stage.
In the growth (pre-pruning) stage, "less than the critical value" is checked before the current sub-tree is pruned.
So we can infer from the above lecture PDF that post-pruning applies the same criterion.

How to understand the above pruning criterion?

------------------------------------------
In the above table,
different branches
= different value levels of the current decision node of the decision tree
(also called the split node; each split node corresponds to one attribute of the dataset).

When $H_0$ is accepted, then
$$\frac{X_{ij}}{N_{i\cdot}}\approx\frac{N_{\cdot j}}{n}$$
which means:

the probability of "item belongs to class $j$" in each $i$-th branch ($i\in[1,r]$)
= the probability of "item belongs to class $j$" in the whole dataset
⇒ merging (pruning) these branches into one leaf will not change the probability of "item belongs to class $j$" much, which means the accuracy will not vary too much after pruning.
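A tiny numeric check of this argument (the counts below are made up for illustration): when every branch has the same class distribution, the branch-level probabilities $X_{ij}/N_{i\cdot}$ coincide with the overall probabilities $N_{\cdot j}/n$, so a merged leaf predicts with the same class proportions as the separate branches.

```python
import numpy as np

# Hypothetical table in which H0 holds exactly:
table = np.array([[30.0, 10.0],   # branch 1: P(class 1) = 0.75
                  [60.0, 20.0]])  # branch 2: P(class 1) = 0.75
branch_probs = table / table.sum(axis=1, keepdims=True)  # X_ij / N_i per branch
overall = table.sum(axis=0) / table.sum()                # N_j / n over all data
# branch_probs rows and overall are all [0.75, 0.25]:
# merging both branches into one leaf leaves the class probabilities unchanged.
```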

In conclusion, when the chi-square statistic does not reach the critical value, the branches (the different value levels of the split attribute of the decision tree)
do not contribute much to increasing the accuracy, so these branches can be pruned.

The above conclusion can be used directly when we implement the CVP (Critical Value Pruning) algorithm in Python.
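As a sketch of such an implementation (the `Node` class and `cvp_post_prune` below are hypothetical illustrations, not from the lecture; SciPy's `chi2_contingency` with `correction=False` computes exactly the statistic above, and `p > alpha` is equivalent to "statistic < critical value"):

```python
from collections import Counter
from scipy.stats import chi2_contingency

class Node:
    """Minimal hypothetical tree node; an empty children list means a leaf."""
    def __init__(self, labels, children=None):
        self.labels = list(labels)            # class labels of samples reaching this node
        self.children = list(children or [])
        self.prediction = Counter(self.labels).most_common(1)[0][0]

def cvp_post_prune(node, classes, alpha=0.05):
    """Bottom-up CVP: collapse a split whose statistic misses the critical value."""
    for child in node.children:
        cvp_post_prune(child, classes, alpha)
    if node.children:
        # Rows = branches, columns = class counts, as in the contingency table above
        table = [[Counter(c.labels)[cl] for cl in classes] for c in node.children]
        _, p_value, _, _ = chi2_contingency(table, correction=False)
        if p_value > alpha:        # H0 accepted: branches share one class distribution
            node.children = []     # merge branches into a single leaf
```

A split whose branches cleanly separate the classes survives, while a split whose branches all look like the parent is collapsed into a leaf.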

We can also see from the above analysis that CVP aims to simplify your decision tree while not losing too much accuracy.

Reference:
[1]http://www.maths.manchester.ac.uk/~saralees/pslect8.pdf
[2]https://www.docin.com/p1-2336928230.html
