
Multi-class Classification of Irises with sklearn's LogisticRegression


Contents

    • 1. Problem Description
    • 2. The Dataset
      • 2.1 Data Description
      • 2.2 The Data
      • 2.3 Data Visualization
    • 3. Model Selection
      • 3.1 The Inherently Multiclass Classifier
      • 3.2 The One-vs-Rest Multiclass Classifier
      • 3.3 OneVsRestClassifier
      • 3.4 OneVsOneClassifier
    • 4. Results
    • 5. Full Code

The iris (Chinese: 鳶尾花, yuān wěi huā), also known as the blue butterfly, purple butterfly, or flat bamboo flower, belongs to a genus of roughly 300 species native to central China and Japan, and is the national flower of France. Mostly blue-purple in color, it is nicknamed the "blue enchantress"; the name refers to petals shaped like a kite's tail, and the flowers come in blue, purple, yellow, white, red, and other colors.

This post uses sklearn's logistic regression model to predict the species of iris flowers, comparing the predictions made under the OvR and OvO multi-class strategies.

1. Problem Description

  • Given the iris feature dataset (sepal and petal lengths and widths)
  • Predict which species each sample belongs to (Setosa, Versicolor, Virginica)

2. The Dataset

from sklearn import datasets

iris = datasets.load_iris()
print(dir(iris))  # list the attributes and methods of the dataset object
# ['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']

The dataset object has quite a few attributes; let's look at them one by one:

2.1 Data Description

print(iris.DESCR)  # dataset description

  • The data contains 150 samples (50 per species)
  • Each sample carries 4 size measurements (sepal and petal lengths and widths) plus its class
  • The description gives, for each of the 4 measurements, the range, mean, standard deviation, and class correlation
  • It also notes missing values, the creator, date, source, applications, and references
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica

    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the pattern recognition
literature. Fisher's paper is a classic in the field and is referenced
frequently to this day. (See Duda & Hart, for example.) The data set contains
3 classes of 50 instances each, where each class refers to a type of iris
plant. One class is linearly separable from the other 2; the latter are NOT
linearly separable from each other.

.. topic:: References

   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments". IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...
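The summary statistics in the description can be reproduced directly from the data. A small sketch (my addition, not from the original post; it uses the sample standard deviation, ddof=1, to match the SD column above):

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
for i, name in enumerate(iris.feature_names):
    col = iris.data[:, i]
    # min, max, mean, and sample standard deviation of each feature column
    print("%-18s min=%.1f max=%.1f mean=%.2f sd=%.2f"
          % (name, col.min(), col.max(), col.mean(), col.std(ddof=1)))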

2.2 The Data

print(iris.data)  # feature data: 150 rows x 4 columns, <class 'numpy.ndarray'>

print(iris.feature_names)  # feature names
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

print(iris.filename)  # file path
# C:\Users\***\AppData\Roaming\Python\Python37\site-packages\sklearn\datasets\data\iris.csv

print(iris.target)  # class labels, size 150
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#  0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
#  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#  2 2]

print(iris.target_names)  # names of the 3 classes
# ['setosa' 'versicolor' 'virginica']

2.3 Data Visualization

Since a plane can only display 2 feature dimensions, we pick 2 features to look at.

def show_data_set(X, y, data):
    plt.plot(X[y == 0, 0], X[y == 0, 1], 'rs', label=data.target_names[0])
    plt.plot(X[y == 1, 0], X[y == 1, 1], 'bx', label=data.target_names[1])
    plt.plot(X[y == 2, 0], X[y == 2, 1], 'go', label=data.target_names[2])
    plt.xlabel(data.feature_names[0])
    plt.ylabel(data.feature_names[1])
    plt.title("Iris data in 2 dimensions")
    plt.legend()
    plt.rcParams['font.sans-serif'] = 'SimHei'  # CJK-capable font (the original plot titles were Chinese)
    plt.show()

iris = datasets.load_iris()
# print(dir(iris))   # attributes and methods of the dataset object
# print(iris.data)   # the data
# print(iris.DESCR)  # the data description
X = iris.data[:, :2]     # first 2 columns: sepal features (a plane can only show 2 dims)
# X = iris.data[:, 2:4]  # the 2 petal features
# X = iris.data          # all 4 features
y = iris.target          # class labels
show_data_set(X, y, iris)


3. Model Selection

Related posts by the author:

  • The logistic regression model (Logistic Regression, LR)
  • Binary classification practice with sklearn's LogisticRegression

sklearn's multiclass and multilabel algorithms:

  • Multiclass classification means a classification task with more than two classes, for example classifying a set of images of fruit as oranges, apples, or pears. Multiclass classification assumes each sample has one and only one label: a fruit can be classified as an apple or as a pear, but not as both at the same time.

  • Inherently multiclass:
    sklearn.linear_model.LogisticRegression (setting multi_class="multinomial")

  • One-vs-rest multiclass:
    sklearn.linear_model.LogisticRegression (setting multi_class="ovr")

Classifier wrapper strategies:

  • One-vs-the-rest (OvR), also known as one-vs-all, is implemented by the OneVsRestClassifier module. The strategy fits one classifier per class; for each classifier, that class is fitted against all the other classes. Besides its computational efficiency (only n_classes classifiers are needed), one advantage is interpretability: since each class is represented by one and only one classifier, it is possible to gain knowledge about a class by inspecting its corresponding classifier. This is the most commonly used strategy and a fair default choice.

  • One-vs-one (OvO): OneVsOneClassifier constructs one classifier per pair of classes, and at prediction time the class that receives the most votes is selected. In the event of a tie (two classes with an equal number of votes), it selects the class with the highest aggregate classification confidence, computed by summing the pairwise confidence levels of the underlying binary classifiers.
    Since it requires fitting n_classes * (n_classes - 1) / 2 classifiers, this method is usually slower than one-vs-the-rest due to its O(n_classes^2) complexity. It can, however, be advantageous for algorithms such as kernel methods that do not scale well with n_samples: each individual learning problem only involves a small subset of the data, whereas with one-vs-the-rest the complete dataset is used n_classes times. OvO also tends to be more accurate than OvR. (The sketch below confirms the classifier counts.)
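As a quick sanity check on the counts above, this small sketch (my addition, not from the original post) fits both wrappers on the full iris data and counts the fitted binary estimators; with 3 classes the two strategies happen to train the same number, 3 and 3 * 2 / 2 = 3:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(solver='liblinear')).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(solver='liblinear')).fit(X, y)
print(len(ovr.estimators_))  # 3: one binary classifier per class
print(len(ovo.estimators_))  # 3: one per pair of classes, n_classes * (n_classes - 1) / 2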

3.1 The Inherently Multiclass Classifier

  • sklearn.linear_model.LogisticRegression (setting multi_class="multinomial")

The relevant help text for the multi_class parameter:

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the 'multi_class' option is set to 'ovr', and uses the cross-entropy loss if the 'multi_class' option is set to 'multinomial'. (Currently the 'multinomial' option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.)

multi_class : {'auto', 'ovr', 'multinomial'}, default='auto'
If the option chosen is 'ovr', then a binary problem is fit for each label.
For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary.
'multinomial' is unavailable when solver='liblinear'.
'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.
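To make the difference concrete, here is a small sketch (my addition; exact coefficients vary with sklearn version and solver) showing that both settings store one weight vector per class on the 3-class iris data, yet fit different models: 'ovr' solves 3 independent binary problems, while 'multinomial' fits a single softmax over all 3 classes:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
lr_ovr = LogisticRegression(multi_class='ovr', solver='liblinear').fit(X, y)
lr_mn = LogisticRegression(multi_class='multinomial', solver='newton-cg').fit(X, y)
print(lr_ovr.coef_.shape, lr_mn.coef_.shape)  # (3, 4) (3, 4): one weight vector per class
print(np.allclose(lr_ovr.coef_, lr_mn.coef_))  # False: the two losses fit different coefficients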

Set LogisticRegression's parameters directly to multi_class='multinomial', solver='newton-cg'; the code is as follows:

def test1(X_train, X_test, y_train, y_test, multi_class='multinomial', solver='newton-cg'):
    # multinomial multiclass; the solver must be newton-cg or lbfgs
    log_reg = LogisticRegression(multi_class=multi_class, solver=solver)
    log_reg.fit(X_train, y_train)
    predict_train = log_reg.predict(X_train)
    sys.stdout.write("LR(multi_class = %s, solver = %s) Train Accuracy : %.4g\n" % (
        multi_class, solver, metrics.accuracy_score(y_train, predict_train)))
    predict_test = log_reg.predict(X_test)
    sys.stdout.write("LR(multi_class = %s, solver = %s) Test Accuracy : %.4g\n" % (
        multi_class, solver, metrics.accuracy_score(y_test, predict_test)))
    plot_decision_boundary(4, 8.5, 1.5, 4.5, lambda x: log_reg.predict(x))  # first 2 features; comment out when using all 4
    # plot_decision_boundary(0.5, 7.5, 0, 3, lambda x: log_reg.predict(x))  # last 2 features; comment out when using all 4
    plot_data(X_train, y_train)

3.2 The One-vs-Rest Multiclass Classifier

  • sklearn.linear_model.LogisticRegression (setting multi_class="ovr")

Set LogisticRegression's parameters directly to multi_class='ovr', solver='liblinear'; the code is as follows:

def test1(X_train, X_test, y_train, y_test, multi_class='ovr', solver='liblinear'):
    # the same test1 as above, now with ovr multiclass and the liblinear solver
    log_reg = LogisticRegression(multi_class=multi_class, solver=solver)
    log_reg.fit(X_train, y_train)
    predict_train = log_reg.predict(X_train)
    sys.stdout.write("LR(multi_class = %s, solver = %s) Train Accuracy : %.4g\n" % (
        multi_class, solver, metrics.accuracy_score(y_train, predict_train)))
    predict_test = log_reg.predict(X_test)
    sys.stdout.write("LR(multi_class = %s, solver = %s) Test Accuracy : %.4g\n" % (
        multi_class, solver, metrics.accuracy_score(y_test, predict_test)))
    plot_decision_boundary(4, 8.5, 1.5, 4.5, lambda x: log_reg.predict(x))  # first 2 features; comment out when using all 4
    # plot_decision_boundary(0.5, 7.5, 0, 3, lambda x: log_reg.predict(x))  # last 2 features; comment out when using all 4
    plot_data(X_train, y_train)

3.3 OneVsRestClassifier

class sklearn.multiclass.OneVsRestClassifier(estimator, n_jobs=None)

The wrapper takes an estimator object: first define an LR model log_reg, then pass it to the OvR classifier: ovr = OneVsRestClassifier(log_reg)

def test2(X_train, X_test, y_train, y_test):
    # multi_class defaults to 'auto':
    # 'auto' selects 'ovr' if the data is binary, or if solver='liblinear',
    # and otherwise selects 'multinomial'.
    # Per the help text, 'auto' resolves to 'ovr' here because the solver below
    # is liblinear, so test1 and test2 do the same thing written two ways.
    log_reg = LogisticRegression(solver='liblinear')
    ovr = OneVsRestClassifier(log_reg)  # wrap the LR model in the OvR classifier
    ovr.fit(X_train, y_train)
    predict_train = ovr.predict(X_train)
    sys.stdout.write("LR(ovr) Train Accuracy : %.4g\n" % (metrics.accuracy_score(y_train, predict_train)))
    predict_test = ovr.predict(X_test)
    sys.stdout.write("LR(ovr) Test Accuracy : %.4g\n" % (metrics.accuracy_score(y_test, predict_test)))
    plot_decision_boundary(4, 8.5, 1.5, 4.5, lambda x: ovr.predict(x))  # first 2 features; comment out when using all 4
    # plot_decision_boundary(0.5, 7.5, 0, 3, lambda x: ovr.predict(x))  # last 2 features; comment out when using all 4
    plot_data(X_train, y_train)

3.4 OneVsOneClassifier

class sklearn.multiclass.OneVsOneClassifier(estimator, n_jobs=None)

The wrapper takes an estimator object: first define an LR model log_reg, then pass it to the OvO classifier: ovo = OneVsOneClassifier(log_reg)

def test3(X_train, X_test, y_train, y_test):
    # For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss
    log_reg = LogisticRegression(multi_class='multinomial', solver='newton-cg')
    # OvO multiclass, wrapping LR(multinomial, newton-cg or lbfgs); in my tests,
    # choosing multi_class='ovr' here gave identical results -- can anyone explain?
    ovo = OneVsOneClassifier(log_reg)
    ovo.fit(X_train, y_train)
    predict_train = ovo.predict(X_train)
    sys.stdout.write("LR(ovo) Train Accuracy : %.4g\n" % (metrics.accuracy_score(y_train, predict_train)))
    predict_test = ovo.predict(X_test)
    sys.stdout.write("LR(ovo) Test Accuracy : %.4g\n" % (metrics.accuracy_score(y_test, predict_test)))
    plot_decision_boundary(4, 8.5, 1.5, 4.5, lambda x: ovo.predict(x))  # first 2 features; comment out when using all 4
    # plot_decision_boundary(0.5, 7.5, 0, 3, lambda x: ovo.predict(x))  # last 2 features; comment out when using all 4
    plot_data(X_train, y_train)
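On the question in the comment above: OneVsOneClassifier only ever hands the inner estimator binary subproblems, and on two classes the 'ovr' and 'multinomial' formulations reduce to essentially the same binary logistic regression, which would explain the identical results. A quick check (my addition, not part of the original post):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)
# same OvO wrapper, two different inner multi_class settings
pred_mn = OneVsOneClassifier(
    LogisticRegression(multi_class='multinomial', solver='newton-cg')).fit(X, y).predict(X)
pred_ovr = OneVsOneClassifier(
    LogisticRegression(multi_class='ovr', solver='newton-cg')).fit(X, y).predict(X)
print(np.array_equal(pred_mn, pred_ovr))  # expected True: every subproblem OvO solves is binary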

4. Results

Run the predictions:

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=777)  # default test fraction 0.25
test1(X_train, X_test, y_train, y_test, multi_class='ovr', solver='liblinear')
test2(X_train, X_test, y_train, y_test)
test1(X_train, X_test, y_train, y_test, multi_class='multinomial', solver='newton-cg')
test3(X_train, X_test, y_train, y_test)

Accuracy (train / test) by dataset and classification mode:

data                                | LR(ovr, liblinear) | OvR(LR(liblinear)) | LR(multinomial, newton-cg) | OvO(LR(multinomial, newton-cg))
------------------------------------|--------------------|--------------------|----------------------------|--------------------------------
seed(520), 2 features [sepal L, W]  | 0.7679 / 0.8421    | 0.7679 / 0.8421    | 0.7768 / 0.8947            | 0.7768 / 0.8684
seed(777), 2 features [sepal L, W]  | 0.7589 / 0.7368    | 0.7589 / 0.7368    | 0.7768 / 0.8158            | 0.7946 / 0.8158
seed(520), 2 features [petal L, W]  | 0.8750 / 0.9474    | 0.8750 / 0.9474    | 0.9554 / 1                 | 0.9554 / 1
seed(777), 2 features [petal L, W]  | 0.9196 / 0.9474    | 0.9196 / 0.9474    | 0.9554 / 1                 | 0.9554 / 1
seed(520), 4 features               | 0.9464 / 1         | 0.9464 / 1         | 0.9643 / 1                 | 0.9732 / 1
seed(777), 4 features               | 0.9464 / 1         | 0.9464 / 1         | 0.9643 / 1                 | 0.9732 / 1
  • The first two columns are OvR-mode multiclass; the code is written differently but the predictions are exactly the same
  • The last two columns are OvO-mode multiclass (sklearn does not provide a built-in 'ovo' option for LR)
  • Comparing the two modes, OvO predicts better than OvR, but OvO has O(n_classes^2) complexity
  • With the sepal length and width as features, the 2-D decision boundaries show that setosa is linearly separable from the other 2 classes, while those two are not linearly separable from each other
  • Predicting from the petal length and width is more accurate than from the sepal features; the plots likewise show the boundaries separating the 3 species much more cleanly
  • With all 4 features, OvO beats OvR on training accuracy, both reach 100% test accuracy, and 4-feature prediction is more accurate than 2-feature prediction

How should the parameters of the LR model passed into the OvR and OvO wrappers above be set? On top of the table above, the following extra tests were run (if any expert sees this, please advise!):

Two columns were added: OvR(LR(multinomial, newton-cg)) next to OvR(LR(ovr, liblinear)), and OvO(LR(ovr, liblinear)) next to OvO(LR(multinomial, newton-cg)). Accuracy is train / test:

parameters                          | LR(ovr, liblinear) | OvR(LR(ovr, liblinear)) | OvR(LR(multinomial, newton-cg)) | LR(multinomial, newton-cg) | OvO(LR(multinomial, newton-cg)) | OvO(LR(ovr, liblinear))
------------------------------------|--------------------|-------------------------|---------------------------------|----------------------------|---------------------------------|------------------------
seed(520), 2 features [sepal L, W]  | 0.7679 / 0.8421    | 0.7679 / 0.8421         | 0.7857 / 0.8947                 | 0.7768 / 0.8947            | 0.7768 / 0.8684                 | 0.7500 / 0.7105
seed(777), 2 features [sepal L, W]  | 0.7589 / 0.7368    | 0.7589 / 0.7368         | 0.7589 / 0.8158                 | 0.7768 / 0.8158            | 0.7946 / 0.8158                 | 0.7232 / 0.7105
seed(520), 2 features [petal L, W]  | 0.8750 / 0.9474    | 0.8750 / 0.9474         | 0.9375 / 1                      | 0.9554 / 1                 | 0.9554 / 1                      | 0.9464 / 1
seed(777), 2 features [petal L, W]  | 0.9196 / 0.9474    | 0.9196 / 0.9474         | 0.9464 / 1                      | 0.9554 / 1                 | 0.9554 / 1                      | 0.9554 / 0.9737
seed(520), 4 features               | 0.9464 / 1         | 0.9464 / 1              | 0.9464 / 1                      | 0.9643 / 1                 | 0.9732 / 1                      | 0.9732 / 1
seed(777), 4 features               | 0.9464 / 1         | 0.9464 / 1              | 0.9464 / 1                      | 0.9643 / 1                 | 0.9732 / 1                      | 0.9821 / 0.9737

Both the OvR and the OvO wrappers do better with the inner LR(multinomial, newton-cg) than with the inner LR(ovr, liblinear).

Based on the data above, my tentative guesses:

  • In most cases, perhaps, OvR < OvO and LR('ovr') < LR('multinomial')
  • So in combination, under the same OvR or OvO wrapper, passing in LR('multinomial') yields higher prediction accuracy

Corrections from experts are very welcome!
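A more robust probe of these guesses than a single train/test split would be cross-validation. The sketch below is my addition (the 5-fold setting and the use of all 4 features are my own assumptions, not from the original experiments); it compares mean accuracy across the four setups:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # all 4 features
setups = [
    ("LR(ovr, liblinear)", LogisticRegression(multi_class='ovr', solver='liblinear')),
    ("LR(multinomial, newton-cg)", LogisticRegression(multi_class='multinomial', solver='newton-cg')),
    ("OvR(LR(liblinear))", OneVsRestClassifier(LogisticRegression(solver='liblinear'))),
    ("OvO(LR(multinomial))", OneVsOneClassifier(
        LogisticRegression(multi_class='multinomial', solver='newton-cg'))),
]
for name, clf in setups:
    scores = cross_val_score(clf, X, y, cv=5)  # accuracy on each of 5 folds
    print("%-28s mean accuracy = %.4f" % (name, scores.mean()))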

5. Full Code

'''
When you run into an unfamiliar library, module, class, or function, try in order:
1) Search the web (Google tends to be more reliable), e.g. "matplotlib.pyplot";
   there are good blog posts to learn from
2) "terminal --> python --> import xx --> help(xx.yy)"; this seems useless at
   first, but it is an essential skill for a seasoned engineer
3) Tweak some parameters and watch how the output changes; the program below
   demonstrates this approach over and over
'''
# written by hitskyer, I just wanna say thank you!
# modified by Michael Ming on 2020.2.20
# Python 3.7
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multiclass import OneVsOneClassifier


def show_data_set(X, y, data):
    plt.plot(X[y == 0, 0], X[y == 0, 1], 'rs', label=data.target_names[0])
    plt.plot(X[y == 1, 0], X[y == 1, 1], 'bx', label=data.target_names[1])
    plt.plot(X[y == 2, 0], X[y == 2, 1], 'go', label=data.target_names[2])
    plt.xlabel(data.feature_names[0])
    plt.ylabel(data.feature_names[1])
    plt.title("Iris data in 2 dimensions")
    plt.legend()
    plt.rcParams['font.sans-serif'] = 'SimHei'  # CJK-capable font (the original titles were Chinese)
    plt.show()


def plot_data(X, y):
    plt.plot(X[y == 0, 0], X[y == 0, 1], 'rs', label='setosa')
    plt.plot(X[y == 1, 0], X[y == 1, 1], 'bx', label='versicolor')
    plt.plot(X[y == 2, 0], X[y == 2, 1], 'go', label='virginica')
    plt.xlabel("sepal length (cm)")
    plt.ylabel("sepal width (cm)")
    # plt.xlabel("petal length (cm)")
    # plt.ylabel("petal width (cm)")
    plt.title("Predicted decision boundary")
    plt.legend()
    plt.rcParams['font.sans-serif'] = 'SimHei'
    plt.show()


def plot_decision_boundary(x_min, x_max, y_min, y_max, pred_func):
    h = 0.01  # mesh step size
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)


def test1(X_train, X_test, y_train, y_test, multi_class='ovr', solver='liblinear'):
    log_reg = LogisticRegression(multi_class=multi_class, solver=solver)  # built-in multiclass
    log_reg.fit(X_train, y_train)
    predict_train = log_reg.predict(X_train)
    sys.stdout.write("LR(multi_class = %s, solver = %s) Train Accuracy : %.4g\n" % (
        multi_class, solver, metrics.accuracy_score(y_train, predict_train)))
    predict_test = log_reg.predict(X_test)
    sys.stdout.write("LR(multi_class = %s, solver = %s) Test Accuracy : %.4g\n" % (
        multi_class, solver, metrics.accuracy_score(y_test, predict_test)))
    plot_decision_boundary(4, 8.5, 1.5, 4.5, lambda x: log_reg.predict(x))  # first 2 features; comment out when using all 4
    # plot_decision_boundary(0.5, 7.5, 0, 3, lambda x: log_reg.predict(x))  # last 2 features; comment out when using all 4
    plot_data(X_train, y_train)


def test2(X_train, X_test, y_train, y_test):
    # multi_class defaults to 'auto':
    # 'auto' selects 'ovr' if the data is binary, or if solver='liblinear',
    # and otherwise selects 'multinomial'.
    # Since the solver below is liblinear, 'auto' resolves to 'ovr',
    # so test1 and test2 do the same thing written two ways.
    log_reg = LogisticRegression(solver='liblinear')
    ovr = OneVsRestClassifier(log_reg)
    ovr.fit(X_train, y_train)
    predict_train = ovr.predict(X_train)
    sys.stdout.write("LR(ovr) Train Accuracy : %.4g\n" % (metrics.accuracy_score(y_train, predict_train)))
    predict_test = ovr.predict(X_test)
    sys.stdout.write("LR(ovr) Test Accuracy : %.4g\n" % (metrics.accuracy_score(y_test, predict_test)))
    plot_decision_boundary(4, 8.5, 1.5, 4.5, lambda x: ovr.predict(x))  # first 2 features; comment out when using all 4
    # plot_decision_boundary(0.5, 7.5, 0, 3, lambda x: ovr.predict(x))  # last 2 features; comment out when using all 4
    plot_data(X_train, y_train)


def test3(X_train, X_test, y_train, y_test):
    # For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss
    log_reg = LogisticRegression(multi_class='multinomial', solver='newton-cg')
    ovo = OneVsOneClassifier(log_reg)  # OvO multiclass, wrapping LR(multinomial, newton-cg or lbfgs)
    ovo.fit(X_train, y_train)
    predict_train = ovo.predict(X_train)
    sys.stdout.write("LR(ovo) Train Accuracy : %.4g\n" % (metrics.accuracy_score(y_train, predict_train)))
    predict_test = ovo.predict(X_test)
    sys.stdout.write("LR(ovo) Test Accuracy : %.4g\n" % (metrics.accuracy_score(y_test, predict_test)))
    plot_decision_boundary(4, 8.5, 1.5, 4.5, lambda x: ovo.predict(x))  # first 2 features; comment out when using all 4
    # plot_decision_boundary(0.5, 7.5, 0, 3, lambda x: ovo.predict(x))  # last 2 features; comment out when using all 4
    plot_data(X_train, y_train)


if __name__ == '__main__':
    iris = datasets.load_iris()
    # print(dir(iris))   # attributes and methods of the dataset object
    # print(iris.data)   # the data
    # print(iris.DESCR)  # the data description
    X = iris.data[:, :2]     # first 2 columns: sepal features (a plane can only show 2 dims)
    # X = iris.data[:, 2:4]  # the 2 petal features
    # X = iris.data          # all 4 features
    y = iris.target          # class labels
    show_data_set(X, y, iris)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=777)  # default test fraction 0.25
    test1(X_train, X_test, y_train, y_test, multi_class='ovr', solver='liblinear')
    test2(X_train, X_test, y_train, y_test)
    test1(X_train, X_test, y_train, y_test, multi_class='multinomial', solver='newton-cg')
    test3(X_train, X_test, y_train, y_test)
