Decision Tree Classification Experiment

Published: 2025/3/21 · 豆豆

Contents

Decision Tree Classification Experiment
- Experiment overview
- Experiment steps
- Parameter tuning
- Visualization

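Experiment overview

We do not need to implement a decision tree ourselves. The sklearn package already provides a decision tree classifier, so we only have to import and use it.

For the dataset we use the wine dataset that ships with sklearn.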

Environment: Anaconda3 + VScode
Python version: 3.7
Third-party libraries required: sklearn, matplotlib, numpy, pydotplus
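Before training anything, it can help to look at what the bundled wine dataset contains. The following is a small sketch that only uses `load_wine` and prints the shape of the feature matrix, the class names, and the first two feature names (the two that the visualization step relies on).

```python
from sklearn.datasets import load_wine

# Load the wine dataset bundled with scikit-learn
wdata = load_wine()

# Feature matrix: one row per sample, one column per feature
print(wdata.data.shape)          # (178, 13)

# Names of the three wine classes
print(wdata.target_names)

# First two feature names
print(wdata.feature_names[:2])   # ['alcohol', 'malic_acid']
```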

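Experiment steps

A simple decision tree classification experiment consists of six steps:

1. Load the dataset
2. Split the dataset
3. Create the model
4. Fit the model on the training set
5. Predict with the model
6. Evaluate the model

We split the data into a training set and a test set with the hold-out method, and we evaluate the final result with accuracy.

The only third-party library this step needs is sklearn.

The code is as follows: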

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Load the dataset
wdata = load_wine()
# print(wdata)

# 2. Split the dataset (hold-out: 80% training, 20% test)
x = wdata.data
y = wdata.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

# 3. Create the model
dtc = DecisionTreeClassifier(random_state=100, max_depth=5)

# 4. Fit the model on the training set
dtc.fit(x_train, y_train)

# 5. Predict on the test set
dtc_predict = dtc.predict(x_test)

# 6. Evaluate the model
result = accuracy_score(y_test, dtc_predict)
print("Accuracy: {}".format(result))

Running the code prints the accuracy, which was 0.77 in the original run.
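To make the hold-out method and the accuracy metric concrete, here is a minimal hand-written sketch. It is an illustration only, not how `train_test_split` is implemented internally: the index shuffling is a simplified stand-in, and the labels `y_true` and `y_pred` are made-up toy values.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hold-out split by hand: shuffle the sample indices, then cut off 20% for testing
rng = np.random.RandomState(100)
n_samples = 10                                  # toy size, for illustration
indices = rng.permutation(n_samples)
n_test = int(np.ceil(0.2 * n_samples))
test_idx, train_idx = indices[:n_test], indices[n_test:]
print(len(train_idx), len(test_idx))            # 8 2

# Accuracy by hand: the fraction of predictions that match the true labels
y_true = np.array([0, 1, 2, 2, 1])              # hypothetical true labels
y_pred = np.array([0, 1, 1, 2, 1])              # hypothetical predictions
accuracy = np.mean(y_true == y_pred)
print(accuracy)                                 # 0.8 (4 of 5 correct)

# accuracy_score computes the same quantity
print(accuracy_score(y_true, y_pred))           # 0.8
```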

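Parameter tuning

We can try to raise the model's accuracy by changing two parameters: the random seed (random_state) and the maximum depth (max_depth).

In the previous step, a random seed of 100 and a maximum depth of 5 gave an accuracy of 0.77.

Changing the random seed to 10 and the maximum depth to 2 raises the accuracy to 0.91.

Continuing to try values in the same way, a random seed of 20 and a maximum depth of 4 gives an accuracy of 0.97, which is already quite high.

Keep in mind that random_state is passed to both train_test_split and DecisionTreeClassifier in the code. Changing it in train_test_split also changes which samples end up in the test set, so part of the difference between these runs comes from the split itself rather than from a better tree.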
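Trying values by hand on a single split works, but the result depends on which samples happen to fall into the test set. As a hedged alternative (a different technique from the one used in this article), the sketch below scores each `max_depth` with 5-fold cross-validation via `cross_val_score` and picks the depth with the highest mean accuracy. The depth range 1 to 6 and `random_state=0` are arbitrary choices made for this illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

wdata = load_wine()

# Mean cross-validated accuracy for each candidate depth
scores_by_depth = {}
for depth in range(1, 7):                       # candidate depths (assumed range)
    dtc = DecisionTreeClassifier(random_state=0, max_depth=depth)
    # For classifiers, an integer cv uses stratified folds
    scores = cross_val_score(dtc, wdata.data, wdata.target, cv=5)
    scores_by_depth[depth] = scores.mean()
    print("max_depth={}: mean accuracy {:.3f}".format(depth, scores_by_depth[depth]))

# Depth with the highest mean accuracy
best_depth = max(scores_by_depth, key=scores_by_depth.get)
print("best max_depth:", best_depth)
```

Because every sample is used for testing exactly once, the mean score is less sensitive to one lucky or unlucky split than a single hold-out accuracy.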

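Visualization

We retrain the decision tree using only the first two features, so that the classification result can be drawn as a scatter plot. We also visualize the structure of the fitted tree.

This step needs the third-party libraries sklearn, matplotlib, numpy, and pydotplus.

The code is as follows: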

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
from sklearn.tree import export_graphviz
import pydotplus

# 1. Load the dataset
wdata = load_wine()
# print(wdata)

# 2. Split the dataset, keeping only the first two features
x = wdata.data[:, :2]
y = wdata.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=20)

# 3. Create the model
dtc = DecisionTreeClassifier(random_state=20, max_depth=4)

# 4. Fit the model on the training set
dtc.fit(x_train, y_train)

# 5. Predict on the test set
dtc_predict = dtc.predict(x_test)

# 6. Evaluate the model
result = accuracy_score(y_test, dtc_predict)
# print("Accuracy: {}".format(result))

# Scatter plot of the decision tree's classification result
# Background colors for the predicted regions
bk_color = ListedColormap(['#FFBBBB', '#BBFFBB', '#BBBBFF'])
# Colors for the sample points
point_color = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

# Axis ranges, padded by 1 on each side
x1_min, x1_max = x_train[:, 0].min() - 1, x_train[:, 0].max() + 1
x2_min, x2_max = x_train[:, 1].min() - 1, x_train[:, 1].max() + 1

# Predict the class of every point on a grid with step 0.2
xx, yy = np.meshgrid(np.arange(x1_min, x1_max, 0.2), np.arange(x2_min, x2_max, 0.2))
z = dtc.predict(np.c_[xx.ravel(), yy.ravel()])
# print(xx.shape)
# print(z.shape)
z = z.reshape(xx.shape)
# print(z)

# Create the figure: colored regions plus all samples
plt.figure()
plt.pcolormesh(xx, yy, z, cmap=bk_color)
plt.scatter(x[:, 0], x[:, 1], c=y, cmap=point_color, edgecolors='black')
# Axis limits
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
# Title
plt.title("DecisionTreeClassifier")
# Show the figure
plt.show()

# Visualize the structure of the fitted tree
export_graphviz(dtc, out_file='dt_wine.dot', class_names=wdata.target_names,
                feature_names=wdata.feature_names[:2], filled=True)
graph = pydotplus.graph_from_dot_file('dt_wine.dot')
graph.write_png('dt_wine.png')

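If the error message "GraphViz's executables not found" appears, pydotplus could not locate the Graphviz binaries. Installing Graphviz and adding its bin directory to PATH usually resolves this; the original post links to a separate write-up titled "GraphViz's executables not found 解決方案" (solution).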
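If installing Graphviz is not convenient, the tree can also be inspected without it. The sketch below uses `export_text` from `sklearn.tree` (available in scikit-learn 0.21 and later, so this assumes a reasonably recent install) to print the split rules as plain text. For brevity it fits on the full two-feature dataset rather than on the training split, which is a simplification made for this example.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, export_text

wdata = load_wine()
feature_names = list(wdata.feature_names[:2])   # the first two feature names

# Same model settings as in the visualization code;
# fitted on all samples here to keep the sketch short
dtc = DecisionTreeClassifier(random_state=20, max_depth=4)
dtc.fit(wdata.data[:, :2], wdata.target)

# Text rendering of the tree: one line per split or leaf
report = export_text(dtc, feature_names=feature_names)
print(report)
```

`sklearn.tree.plot_tree` is another Graphviz-free option that draws the same tree with matplotlib.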

Scatter plot:

Decision tree structure:
