The iris dataset: decision trees

Published: 2023/12/20 | Programming Q&A | 豆豆

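This post covers classification with decision trees, using DecisionTreeClassifier.

1. Setting up the environment for decision trees

GraphViz is a tool for visualizing decision tree models. Anaconda does not ship with it, so to visualize a decision tree you need to install Graphviz with the following steps:

(1) Download and install Graphviz from https://graphviz.io/_pages/Download/Download_windows.html. On Linux, it can be installed with apt-get or yum. On Windows, download GraphViz-2.38.msi from the official site and install it. On either system, after installing you must add GraphViz's bin directory to the PATH environment variable. On Windows, that means adding C:/Program Files (x86)/Graphviz2.38/bin/ to PATH.

(2) Install the Python package graphviz by running the following command in the Anaconda Prompt window: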

pip install graphviz

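(3) Install the Python package pydotplus, using either conda or pip:

conda install -c conda-forge pydotplus
pip install pydotplus

The environment is now set up. If graphviz still cannot be found at run time, add these lines to your code:

import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'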
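Before rendering anything, it can help to confirm that the Graphviz `dot` executable is actually reachable. The following is a minimal sketch, not part of the original post: the helper name `graphviz_available` and its `extra_dir` parameter are made up for illustration, and it relies only on the standard library's `shutil.which`.

```python
import os
import shutil


def graphviz_available(extra_dir=None):
    """Return True if the Graphviz `dot` executable can be found on PATH.

    extra_dir is a hypothetical optional directory (for example the
    Graphviz bin folder) that is appended to PATH before searching.
    """
    current = os.environ.get("PATH", "")
    if extra_dir and extra_dir not in current.split(os.pathsep):
        os.environ["PATH"] = current + os.pathsep + extra_dir
    return shutil.which("dot") is not None


print("dot found:", graphviz_available())
```

If this prints `False`, pass the Graphviz bin directory as `extra_dir` or fix the PATH setting from step (1).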

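2. Analyzing the iris dataset with a decision tree

The workflow for building a decision tree from the dataset is as follows. First create an instance of the DecisionTreeClassifier class (for example, clf_tree). Then call the instance's fit() method to train it. Once the model is trained, call predict() on new samples; predict() returns the predicted class of each new sample. The sklearn.tree module also provides export_graphviz(), which outputs a text description of the trained tree so that its parameters can be inspected.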
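The fit-then-predict workflow described above can be sketched in a few lines. This is only an illustration: the variable `new_sample` is a hypothetical measurement that does not appear in the original post, and `random_state=0` is added here for reproducibility.

```python
from sklearn import datasets, tree

iris = datasets.load_iris()

# Create the classifier instance and train it on the full dataset.
clf = tree.DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm).
new_sample = [[5.1, 3.5, 1.4, 0.2]]
pred = clf.predict(new_sample)  # array holding one predicted class index

print("predicted class index:", pred)
print("predicted class name:", iris.target_names[pred])
```

predict() always expects a 2-D input (one row per sample), which is why the single sample is wrapped in an outer list.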

(1) Load the iris dataset and look at its basic information

from sklearn import datasets

iris = datasets.load_iris()
print('Shape of iris.data:', iris.data.shape)
print('Feature names of iris.data:', iris.feature_names)
print('Contents of iris.target:\n', iris.target)
print('Shape of iris.target:', iris.target.shape)
print('Iris species names in iris.target:', iris.target_names)

(2) Import the tree module, create a DecisionTreeClassifier() instance, train the model, and output the model description

from sklearn import tree  # scikit-learn's tree module

x = iris.data    # feature matrix
y = iris.target  # class labels
# Split criterion: criterion='entropy'; maximum tree depth: max_depth=2
clf_tree = tree.DecisionTreeClassifier(criterion='entropy', max_depth=2)
clf_tree.fit(x, y)
# Export the trained tree as a DOT-format string (out_file=None returns the string)
dot_data = tree.export_graphviz(clf_tree, out_file=None, feature_names=iris.feature_names, class_names=True, filled=True, rounded=True)
print('DOT description of the trained tree:\n', dot_data)
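The DOT output is meant for Graphviz and is not easy to read directly. As a complement that is not in the original post, scikit-learn's `export_text` prints the same tree as indented rules and needs no Graphviz installation. A minimal sketch, with `random_state=0` added here for reproducibility:

```python
from sklearn import datasets, tree

iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Plain-text rendering of the split rules; leaves are shown as "class: <index>".
rules = tree.export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indentation level corresponds to one level of depth, so a max_depth=2 tree shows at most two nested conditions.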

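(3) To inspect the trained tree visually, render it with pydotplus and GraphViz. There are two ways to do this, shown in turn below.

# Visualization method 1: render the figure directly in a Python notebook
from IPython.display import Image
import pydotplus
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())

# Visualization method 2: use pydotplus to write iris.pdf
import pydotplus
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("iris.pdf")

The visualization is saved to the file iris.pdf.

Of the two methods, I recommend the first, because it renders the figure directly in the IPython notebook, where the result can be inspected immediately.

(4) Use the trained model clf_tree to predict the dataset, then plot the predictions against the true class labels to compare them.

# Prediction
y_predict = clf_tree.predict(x)
# Visualization
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = 'SimHei'   # SimHei font; only needed if labels contain Chinese
plt.rcParams['axes.unicode_minus'] = False   # render minus signs on axis ticks correctly
plt.rc('font', size=14)
plt.scatter(range(len(y)), y, marker='o')
plt.scatter(range(len(y)), y_predict + 0.1, marker='*')  # offset by 0.1 so both series stay visible
plt.legend(['True class', 'Predicted class'])
plt.title('Decision tree predictions vs. true classes on the iris dataset')
plt.show()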

The figure above shows that 6 samples are predicted incorrectly.
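The number of wrong predictions can also be computed instead of being read off the plot. The sketch below repeats the training step so that it runs on its own; `random_state=0` and the variable name `wrong` are additions for illustration, and `accuracy_score` comes from `sklearn.metrics`.

```python
import numpy as np
from sklearn import datasets, tree
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
x, y = iris.data, iris.target

clf_tree = tree.DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf_tree.fit(x, y)
y_predict = clf_tree.predict(x)

# Indices of the samples whose predicted label differs from the true label.
wrong = np.flatnonzero(y != y_predict)
print("number of misclassified samples:", len(wrong))
print("their indices:", wrong)
print("training accuracy:", accuracy_score(y, y_predict))
```

Note that this accuracy is measured on the same data the tree was trained on, so it says nothing about performance on unseen samples.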

(5) Change the split criterion and look at the predictions again

# Build the tree again with the criterion changed to gini
clf_tree2 = tree.DecisionTreeClassifier(criterion='gini', max_depth=2)
clf_tree2.fit(x, y)
dot_data = tree.export_graphviz(clf_tree2, out_file=None, feature_names=iris.feature_names, class_names=True, filled=True, rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())

# Prediction
y_predict2 = clf_tree2.predict(x)
# Visualization
plt.figure(figsize=(10, 4))
plt.scatter(range(len(y)), y, marker='o')
plt.scatter(range(len(y)), y_predict2 + 0.1, marker='*')
plt.legend(['True class', 'Predicted class'])
plt.title('Decision tree predictions vs. true classes on the iris dataset')
plt.show()

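The figure shows that 6 samples are still predicted incorrectly. In this example, at max_depth=2, switching the criterion from entropy to gini does not improve accuracy.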
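To make the two criteria concrete, here is a small sketch of how entropy and Gini impurity are computed from the class counts in a node. The helper names `entropy` and `gini` are illustrative, not scikit-learn functions; the formulas are the standard definitions, entropy = sum of p * log2(1/p) and Gini = 1 - sum of p^2.

```python
import numpy as np


def entropy(counts):
    """Shannon entropy (in bits) of a node with the given class counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # skip empty classes to avoid log(0)
    return float((p * np.log2(1.0 / p)).sum())


def gini(counts):
    """Gini impurity of a node with the given class counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())


root = [50, 50, 50]   # iris root node: 50 samples of each species
pure = [50, 0, 0]     # a node containing a single species

print("root:", entropy(root), gini(root))
print("pure:", entropy(pure), gini(pure))
```

Both measures are largest when the classes are evenly mixed and drop to zero for a pure node, which is one reason they often choose the same or very similar splits.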

(6) Change the depth and look at the predictions again

# Build the tree again with a larger maximum depth
clf_tree3 = tree.DecisionTreeClassifier(criterion='entropy', max_depth=3)
clf_tree3.fit(x, y)
dot_data = tree.export_graphviz(clf_tree3, out_file=None, feature_names=iris.feature_names, class_names=True, filled=True, rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())

# Prediction
y_predict3 = clf_tree3.predict(x)
# Visualization
plt.figure(figsize=(10, 4))
plt.scatter(range(len(y)), y, marker='o')
plt.scatter(range(len(y)), y_predict3 + 0.1, marker='*')
plt.legend(['True class', 'Predicted class'])
plt.title('Decision tree predictions vs. true classes on the iris dataset')
plt.show()

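The figure above shows that 4 samples are now predicted incorrectly, down from 6. On the training data, increasing the maximum depth improves classification accuracy.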
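The effect of depth can be checked more systematically by looping over several values of max_depth. The sketch below is an addition to the original experiment: besides counting errors on the training data, it also reports 5-fold cross-validated accuracy using scikit-learn's `cross_val_score`, since training error alone tends to keep falling as the tree gets deeper. `random_state=0` and the depth range 1 to 5 are arbitrary choices for illustration.

```python
from sklearn import datasets, tree
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
x, y = iris.data, iris.target

results = []
for depth in range(1, 6):
    clf = tree.DecisionTreeClassifier(criterion="entropy", max_depth=depth, random_state=0)
    # Accuracy on held-out folds (cross_val_score fits clones of clf).
    cv_acc = cross_val_score(clf, x, y, cv=5).mean()
    # Errors on the training data, as in the experiments above.
    train_errors = int((clf.fit(x, y).predict(x) != y).sum())
    results.append((depth, train_errors, cv_acc))
    print(f"max_depth={depth}: training errors={train_errors}, 5-fold CV accuracy={cv_acc:.3f}")
```

If the cross-validated accuracy stops improving while the training errors keep shrinking, the extra depth is probably fitting noise rather than structure.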
