
Machine Learning Algorithms: Random Forest, Decision Trees Implemented from Scratch by Brute Force in R (3)

Published: 2025/3/15


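The first post, Machine Learning Algorithms: Random Forest, A First Look at Decision Trees (1), covered the basic concepts of decision trees and the criteria for evaluating a split, and computed the Gini impurity of a single variable and a single group by hand. It introduces the basic concepts, so if you are not familiar with them, read it before continuing.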

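The second post, Machine Learning Algorithms: Random Forest, Decision Trees Implemented from Scratch by Brute Force in R (2), trained a decision tree with hand-written R functions in a brute-force way and decided the first node. Next...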

Deciding the second and third nodes

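With the first decision node found, we go on to find the remaining ones. If the points in a branch belong to more than one class, that branch is split again recursively.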

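The recursion stops when either of the following holds:

Adding another branch does not lower the Gini impurity

All data points in a branch belong to the same class (Gini impurity = 0)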
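The two stopping conditions can be checked on toy vectors. This is a minimal sketch, assuming the usual definition of Gini impurity (one minus the sum of squared class proportions); `gini_impurity_sketch` and `gini_gain_sketch` are hypothetical helpers written for this illustration, not the author's `Gini_impurity` from part (2).

```r
# Hypothetical helper (not the author's Gini_impurity from part 2):
# Gini impurity of a vector of class labels.
gini_impurity_sketch <- function(classes) {
  p <- table(classes) / length(classes)  # class proportions
  1 - sum(p^2)
}

# Hypothetical helper: Gini gain of splitting `parent` into `left` and `right`,
# i.e. parent impurity minus the size-weighted impurity of the two branches.
gini_gain_sketch <- function(parent, left, right) {
  n <- length(parent)
  weighted <- length(left) / n * gini_impurity_sketch(left) +
    length(right) / n * gini_impurity_sketch(right)
  gini_impurity_sketch(parent) - weighted
}

# A branch whose points all share one class is pure, so recursion stops there.
pure_branch <- rep("green", 5)
pure_gini <- gini_impurity_sketch(pure_branch)   # 0

# A split that leaves the class proportions unchanged gains nothing,
# so the `gini_gain > 0` check would reject it.
parent <- c("blue", "red", "blue", "red")
no_gain <- gini_gain_sketch(parent, left = parent[1:2], right = parent[3:4])  # 0
```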

The function is defined as follows:

brute_descition_tree_result <- list()
brute_descition_tree_result_index <- 0

# Recursive branch decision.
# Relies on Gini_impurity() and
# Gini_impurity_for_all_possible_branches_of_all_variables() from part (2).
brute_descition_tree <- function(data, measure_variable, class_variable, type="Root"){
  # Gini impurity of the data before splitting
  Init_gini_impurity <- Gini_impurity(data[[class_variable]])

  # Find the best variable and threshold for the current node
  # (uses the measure_variable argument rather than the global `variables`)
  brute_force_result <- Gini_impurity_for_all_possible_branches_of_all_variables(
    data, measure_variable, class=class_variable,
    Init_gini_impurity=Init_gini_impurity)

  # Print the intermediate result
  print(brute_force_result)

  # Best split variable, threshold and Gini gain (first row of the result)
  split_variable <- brute_force_result[1,1]
  split_threshold <- brute_force_result[1,2]
  gini_gain <- brute_force_result[1,5]

  # Keep this split only if it lowers the Gini impurity
  if(gini_gain > 0){
    brute_descition_tree_result_index <<- brute_descition_tree_result_index + 1
    # Note: `type` is evaluated lazily, i.e. after the increment above,
    # so a child's label carries its own index (e.g. "2 left").
    brute_descition_tree_result[[brute_descition_tree_result_index]] <<-
      c(type=type, split_variable=split_variable, split_threshold=split_threshold)

    # Split the data into the left and right branches
    left <- data[data[[split_variable]] < split_threshold,]
    right <- data[data[[split_variable]] >= split_threshold,]

    # Recurse into each branch that still contains more than one class
    if(length(unique(left[[class_variable]])) > 1){
      brute_descition_tree(data=left, measure_variable, class_variable,
                           type=paste(brute_descition_tree_result_index, "left"))
    }
    if(length(unique(right[[class_variable]])) > 1){
      brute_descition_tree(data=right, measure_variable, class_variable,
                           type=paste(brute_descition_tree_result_index, "right"))
    }
  }
  # return(brute_descition_tree_result)
}

Call the function and print the intermediate results:

brute_descition_tree(data, variables, "color")

Evaluation record for the root node

##    Variable Threshold Left_branch                  Right_branch                 Gini_gain
##  5 x        1.95      blue x 3; red x 2            green x 5                    0.38
##  3 x        1.45      blue x 3                     green x 5; red x 2           0.334285714285714
## 31 y        1.75      blue x 3                     green x 5; red x 2           0.334285714285714
##  4 x        1.85      blue x 3; red x 1            green x 5; red x 1           0.303333333333333
##  6 x        2.25      blue x 3; green x 1; red x 2 green x 4                    0.253333333333333
## 41 y        2.05      blue x 3; green x 1          green x 4; red x 2           0.203333333333333
##  2 x        0.8       blue x 2                     blue x 1; green x 5; red x 2 0.195
## 21 y        1.25      blue x 2                     blue x 1; green x 5; red x 2 0.195
## 51 y        2.15      blue x 3; green x 1; red x 1 green x 4; red x 1           0.18
##  7 x        2.75      blue x 3; green x 2; red x 2 green x 3                    0.162857142857143
## 71 y        2.9       blue x 3; green x 2; red x 2 green x 3                    0.162857142857143
## 61 y        2.5       blue x 3; green x 2; red x 1 green x 3; red x 1           0.103333333333333
##  8 x        3.3       blue x 3; green x 3; red x 2 green x 2                    0.095
## 81 y        3.15      blue x 3; green x 3; red x 2 green x 2                    0.095
##  1 x        0.25      blue x 1                     blue x 2; green x 5; red x 2 0.0866666666666667
## 11 y        0.75      blue x 1                     blue x 2; green x 5; red x 2 0.0866666666666667
##  9 x        3.65      blue x 3; green x 4; red x 2 green x 1                    0.0422222222222223
## 91 y        3.4       blue x 3; green x 4; red x 2 green x 1                    0.0422222222222223
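As a sanity check on the top row of the root table, the gain of 0.38 for the split at x = 1.95 can be reproduced by hand. The derivation below assumes the usual definitions: Gini impurity is one minus the sum of squared class proportions, and Gini gain is the parent impurity minus the size-weighted impurity of the two branches. The root holds 10 points (blue x 3, red x 2, green x 5), the left branch 5 (blue x 3, red x 2) and the right branch 5 (green x 5).

```latex
\begin{aligned}
G_{\text{root}}  &= 1 - (0.3^2 + 0.2^2 + 0.5^2) = 0.62 \\
G_{\text{left}}  &= 1 - (0.6^2 + 0.4^2) = 0.48 \\
G_{\text{right}} &= 1 - 1^2 = 0 \\
\text{Gini gain} &= 0.62 - \left(\tfrac{5}{10}\cdot 0.48 + \tfrac{5}{10}\cdot 0\right) = 0.38
\end{aligned}
```

The left branch's impurity of 0.48 is also the most the second-level split can gain, which matches the best rows of the second-level table.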

Evaluation record for the second-level node

##    Variable Threshold Left_branch       Right_branch      Gini_gain
##  3 x        1.45      blue x 3          red x 2           0.48
## 31 y        1.8       blue x 3          red x 2           0.48
##  2 x        0.8       blue x 2          blue x 1; red x 2 0.213333333333333
## 21 y        1.25      blue x 2          blue x 1; red x 2 0.213333333333333
##  4 x        1.85      blue x 3; red x 1 red x 1           0.18
## 41 y        2.45      blue x 3; red x 1 red x 1           0.18
##  1 x        0.25      blue x 1          blue x 2; red x 2 0.08
## 11 y        0.75      blue x 1          blue x 2; red x 2 0.08

The finally selected split variables and thresholds

as.data.frame(do.call(rbind, brute_descition_tree_result))


##     type split_variable split_threshold
## 1   Root              x            1.95
## 2 2 left              x            1.45
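The second row is labelled "2 left" even though it is the left branch of node 1. This looks like a consequence of R's lazy argument evaluation: the `paste(...)` passed as `type` is only evaluated when the callee first uses it, by which time the global index has already been incremented. The sketch below reproduces the effect in isolation; `record_lazy`, `record_forced` and `counter` are hypothetical names for this illustration, and `force()` is shown as one possible way to get the parent's index instead.

```r
counter <- 0

# Mimics the recursive function: bump the global counter, then read `label`.
record_lazy <- function(label) {
  counter <<- counter + 1
  label                      # the argument is evaluated here, after the increment
}

# Same, but the argument is evaluated before the counter changes.
record_forced <- function(label) {
  force(label)
  counter <<- counter + 1
  label
}

counter <- 1
lazy_label <- record_lazy(paste(counter, "left"))      # "2 left"

counter <- 1
forced_label <- record_forced(paste(counter, "left"))  # "1 left"
```

Whether the label should name the parent or the node itself is a design choice; the table above reflects the lazy behaviour.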

Running the function yields two decision nodes, and the decision tree is drawn as follows:

The returned Gini gain table shows that the second node has two equally good splits: x at 1.45 and y at 1.8 both give a Gini gain of 0.48.

This completes the brute-force construction of the decision tree.
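The two selected nodes can be read as nested rules. Below is a minimal sketch that applies them to new x values; `predict_color_sketch` is a hypothetical helper, and the class on each leaf is taken from the branch contents listed in the evaluation tables above.

```r
# Hypothetical helper: classify a point from its x value using the two
# splits in the result table (Root: x at 1.95; second node: x at 1.45).
predict_color_sketch <- function(x) {
  if (x >= 1.95) {
    "green"   # right branch of the root: green x 5
  } else if (x < 1.45) {
    "blue"    # left branch of the second node: blue x 3
  } else {
    "red"     # right branch of the second node: red x 2
  }
}

predicted <- sapply(c(0.5, 1.7, 3.0), predict_color_sketch)
# predicted is c("blue", "red", "green")
```

Only x is needed because both selected splits use x; the tied split on y at 1.8 would separate the training points of the second node equally well.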

https://victorzhou.com/blog/intro-to-random-forests/
https://victorzhou.com/blog/gini-impurity/
https://stats.stackexchange.com/questions/192310/is-random-forest-suitable-for-very-small-data-sets
https://towardsdatascience.com/understanding-random-forest-58381e0602d2
https://www.stat.berkeley.edu/~breiman/RandomForests/reg_philosophy.html
https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d
