Summary of Machine Learning Model Scoring (sklearn)

Published: 2023/12/13

Contents

- Model evaluation
- Evaluation metrics
- 1. Classification metrics
  - Accuracy, recall, F1, confusion matrix, classification report
    - 1. Accuracy
      - Method 1: accuracy_score
      - Method 2: the metrics module
    - 2. Recall
    - 3. F1 score
    - 4. Confusion matrix
    - 5. Classification report
    - 6. Kappa score
  - ROC
    - 1. Computing the ROC AUC
    - 2. ROC curve
    - 3. A worked example
- 2. Regression metrics
- 3. Clustering metrics
  - 1. Adjusted Rand index
  - 2. Mutual Information based scores
  - 3. Homogeneity, completeness and V-measure
  - 4. Fowlkes-Mallows scores
  - 5. Silhouette Coefficient
  - 6. Calinski-Harabasz Index
- 4. Other

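Model evaluation

The estimator score method: estimators in sklearn have a score method that provides a default evaluation criterion for the problem they are designed to solve.
The scoring parameter: model-evaluation tools that use cross-validation rely on an internal scoring strategy, selected through the scoring parameter.
Metric functions: the functions in sklearn.metrics assess prediction error, typically on a held-out test set.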
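
The three routes above can be sketched side by side. This is a minimal illustration, not part of the original post: the iris dataset, the LogisticRegression model, and the "f1_macro" scorer are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Assumed toy setup: iris data and a logistic regression classifier.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 1. Estimator score method: for classifiers the default is mean accuracy.
default_score = clf.score(X_test, y_test)

# 2. Scoring parameter: cross-validation tools accept a scorer name.
cv_f1 = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1_macro")

# 3. Metric function: compare predictions with the test labels directly.
metric_score = accuracy_score(y_test, clf.predict(X_test))

print(default_score, metric_score, cv_f1.mean())
```

For a classifier, the first and third routes give the same number, because the default score is accuracy.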

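Evaluation metrics

Different machine learning tasks call for different evaluation metrics, and a single task can have several metrics that emphasize different aspects.
The main task types are classification, regression, ranking, clustering, topic modeling, and recommendation.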

1. Classification metrics

Accuracy, recall, F1, confusion matrix, classification report

1. Accuracy

Method 1: accuracy_score

    # Accuracy
    from sklearn.metrics import accuracy_score

    y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
    y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]

    accuracy_score(y_true, y_pred)
    Out[127]: 0.3333333333333333

    # normalize=False returns the number of correctly classified samples
    # instead of the fraction
    accuracy_score(y_true, y_pred, normalize=False)
    Out[128]: 3

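Method 2: the metrics module

Macro-averaging is often the more reasonable choice, but that does not make micro-averaging useless. Which one to use depends on how the samples are distributed across classes in the dataset.

Macro-averaging computes the metric for each class separately and then takes the arithmetic mean over all classes, so every class carries equal weight.
Micro-averaging counts every instance in the dataset regardless of class to build one global confusion matrix and then computes the metric from it, so every sample carries equal weight.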

    from sklearn import metrics

    # y_true and y_pred are the lists from the accuracy example above

    # Micro-averaged precision
    metrics.precision_score(y_true, y_pred, average='micro')
    Out[130]: 0.3333333333333333

    # Macro-averaged precision
    metrics.precision_score(y_true, y_pred, average='macro')
    Out[131]: 0.375

    # Macro-averaged precision restricted to the listed labels
    metrics.precision_score(y_true, y_pred, labels=[0, 1, 2, 3], average='macro')
    Out[133]: 0.5

The average parameter accepts 'binary' (the default), 'micro', 'macro', 'weighted', 'samples', or None (which returns one score per class).
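
To make the difference between the averaging modes concrete, the sketch below recomputes the macro and micro precision by hand for the same labels. The use of zero_division=0 is an addition not found in the original post; it only silences the warning for labels that are never predicted.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]
y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]

# average=None returns one precision value per label, in sorted label order.
# zero_division=0 scores labels with no predicted samples as 0 without warning.
per_class = precision_score(y_true, y_pred, average=None, zero_division=0)

# Macro average: plain mean of the per-class values.
macro_by_hand = per_class.mean()

# Micro average: pool all decisions first. For single-label multiclass data
# this is the fraction of correct predictions, i.e. the accuracy.
micro_by_hand = np.mean(np.asarray(y_true) == np.asarray(y_pred))

print(per_class, macro_by_hand, micro_by_hand)
```

Because several labels here have zero precision, the macro average is pulled toward those classes, while the micro average simply reflects 3 correct predictions out of 9.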

2. Recall

    metrics.recall_score(y_true, y_pred, average='micro')
    Out[134]: 0.3333333333333333

    metrics.recall_score(y_true, y_pred, average='macro')
    Out[135]: 0.3125

3. F1 score

    metrics.f1_score(y_true, y_pred, average='weighted')
    Out[136]: 0.37037037037037035
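
For reference, F1 is the harmonic mean of precision and recall. The following sketch checks that relation on a small binary example; the labels are made up for illustration and do not come from the original post.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical binary labels: 3 true positives, 1 false positive, 2 false negatives.
y_true = [0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 1, 1, 1, 0]

p = precision_score(y_true, y_pred)  # TP / (TP + FP)
r = recall_score(y_true, y_pred)     # TP / (TP + FN)

# Harmonic mean of precision and recall.
f1_by_hand = 2 * p * r / (p + r)

print(p, r, f1_by_hand, f1_score(y_true, y_pred))
```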

4. Confusion matrix

    # Confusion matrix
    from sklearn.metrics import confusion_matrix

    confusion_matrix(y_true, y_pred)
    Out[137]:
    array([[1, 0, 0, ..., 0, 0, 0],
           [0, 0, 1, ..., 0, 0, 0],
           [0, 1, 0, ..., 0, 0, 1],
           ...,
           [0, 0, 0, ..., 0, 0, 1],
           [0, 0, 0, ..., 0, 0, 0],
           [0, 0, 0, ..., 0, 1, 0]])

Rows correspond to true labels and columns to predicted labels.
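
A smaller matrix makes the row and column convention easier to see. The string labels below are hypothetical and chosen only for readability; passing labels= fixes the row and column order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical three-class example.
y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "cat", "cat", "bird", "dog", "dog"]
labels = ["bird", "cat", "dog"]

# cm[i, j] counts samples whose true label is labels[i]
# and whose predicted label is labels[j].
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Diagonal over row sums gives per-class recall,
# diagonal over column sums gives per-class precision.
recall_per_class = cm.diagonal() / cm.sum(axis=1)
precision_per_class = cm.diagonal() / cm.sum(axis=0)

print(cm)
print(recall_per_class, precision_per_class)
```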

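5. Classification report

    # Classification report: precision, recall, F1-score, averages, and support
    from sklearn.metrics import classification_report

    y_true = [0, 1, 2, 2, 0]
    y_pred = [0, 0, 2, 2, 0]
    target_names = ['class 0', 'class 1', 'class 2']
    print(classification_report(y_true, y_pred, target_names=target_names))

The report contains per-class precision, recall, and F1-score, their averages, and the support (the number of true samples in each class).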
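
When the numbers are needed in code rather than as printed text, the report can also be returned as a dictionary. This sketch assumes a scikit-learn version that supports the output_dict and zero_division arguments.

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
target_names = ['class 0', 'class 1', 'class 2']

# output_dict=True returns nested dicts instead of a formatted string.
# zero_division=0 avoids a warning for class 1, which is never predicted.
report = classification_report(
    y_true, y_pred, target_names=target_names,
    output_dict=True, zero_division=0)

class1_recall = report['class 1']['recall']
macro_f1 = report['macro avg']['f1-score']
accuracy = report['accuracy']

print(class1_recall, macro_f1, accuracy)
```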

6. Kappa score

Cohen's kappa is a number between -1 and 1. A score above 0.8 is generally considered good agreement; zero or lower means no agreement (practically random labels).

    from sklearn.metrics import cohen_kappa_score

    y_true = [2, 0, 2, 2, 0, 1]
    y_pred = [0, 0, 2, 2, 0, 2]
    cohen_kappa_score(y_true, y_pred)
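
Kappa compares the observed agreement with the agreement expected by chance from the label marginals. As a sketch (the variable names p_o and p_e are my own), the same value can be derived from the confusion matrix:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)
n = cm.sum()

# Observed agreement: fraction of samples on the diagonal.
p_o = np.trace(cm) / n

# Chance agreement: product of the true and predicted marginals, summed over classes.
p_e = (cm.sum(axis=1) @ cm.sum(axis=0)) / n ** 2

kappa_by_hand = (p_o - p_e) / (1 - p_e)

print(kappa_by_hand, cohen_kappa_score(y_true, y_pred))
```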

ROC

1. Computing the ROC AUC

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    roc_auc_score(y_true, y_scores)
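
One way to read the AUC is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The pairwise count below is an illustrative sketch of that interpretation, not how sklearn computes the value internally.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

pos = y_scores[y_true == 1]
neg = y_scores[y_true == 0]

# Compare every positive score with every negative score; ties count as half.
diff = pos[:, None] - neg[None, :]
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(pairwise_auc, roc_auc_score(y_true, y_scores))
```

Here 3 of the 4 positive/negative pairs are ranked correctly, which gives 0.75.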

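2. ROC curve

    import numpy as np
    from sklearn.metrics import roc_curve

    y = np.array([1, 1, 2, 2])
    scores = np.array([0.1, 0.4, 0.35, 0.8])
    # pos_label=2 marks label 2 as the positive class
    fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)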
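
The curve and the AUC score are two views of the same thing: integrating the (fpr, tpr) points with the trapezoidal rule gives the area. A short sketch using the same data:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Points of the ROC curve, treating label 2 as the positive class.
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)

# auc() integrates tpr over fpr with the trapezoidal rule.
area = auc(fpr, tpr)

# roc_auc_score expects binary targets, so convert the labels first.
direct = roc_auc_score(y == 2, scores)

print(area, direct)
```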

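3. A worked example

    import numpy as np
    import matplotlib.pyplot as plt
    from itertools import cycle

    from sklearn import svm, datasets
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from sklearn.multiclass import OneVsRestClassifier

    # Import some data to play with
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    # The original listing jumps from loading the data straight to plotting.
    # The steps in this block are an assumed reconstruction: binarize the labels,
    # fit a one-vs-rest linear SVM, and compute per-class and micro-average ROC.
    y = label_binarize(y, classes=[0, 1, 2])
    n_classes = y.shape[1]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0)
    classifier = OneVsRestClassifier(svm.SVC(kernel='linear', random_state=0))
    y_score = classifier.fit(X_train, y_train).decision_function(X_test)

    fpr, tpr, roc_auc = dict(), dict(), dict()
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])
    fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
    lw = 2  # line width used by the plots below

    # Plotting: first aggregate all false positive rates
    all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

    # Then interpolate all ROC curves at these points
    # (np.interp replaces scipy.interp, which recent SciPy releases no longer provide)
    mean_tpr = np.zeros_like(all_fpr)
    for i in range(n_classes):
        mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

    # Finally average it and compute AUC
    mean_tpr /= n_classes

    fpr["macro"] = all_fpr
    tpr["macro"] = mean_tpr
    roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

    # Plot all ROC curves
    plt.figure()
    plt.plot(fpr["micro"], tpr["micro"],
             label='micro-average ROC curve (area = {0:0.2f})'.format(roc_auc["micro"]),
             color='deeppink', linestyle=':', linewidth=4)

    plt.plot(fpr["macro"], tpr["macro"],
             label='macro-average ROC curve (area = {0:0.2f})'.format(roc_auc["macro"]),
             color='navy', linestyle=':', linewidth=4)

    colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=lw,
                 label='ROC curve of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))

    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Some extension of Receiver operating characteristic to multi-class')
    plt.legend(loc="lower right")
    plt.show()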
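
If only a single multiclass AUC number is needed and no plot, roc_auc_score can compute a one-vs-rest average directly from class probabilities. The sketch below is an alternative to the plotting code, not a reproduction of it: the LogisticRegression model and the stratified split are assumptions, and the result averages per-class AUCs rather than taking the area under the interpolated macro curve.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Assumed model: any classifier with predict_proba would do.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

# One-vs-rest AUC per class, then an unweighted (macro) mean.
macro_ovr_auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")

print(macro_ovr_auc)
```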

2. Regression metrics

Regression predicts continuous real values, whereas classification predicts discrete values.
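
The original post lists no regression metrics. As a hedged starting point, the sketch below shows three common ones from sklearn.metrics on made-up numbers: mean absolute error, mean squared error, and the R² score.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = mean_squared_error(y_true, y_pred)   # mean of error squared
rmse = mse ** 0.5                          # same units as the target
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(mae, mse, rmse, r2)
```

MAE and RMSE are in the units of the target, while R² is unitless and equals 1 for a perfect fit.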

3. Clustering metrics


1. Adjusted Rand index

    >>> from sklearn import metrics
    >>> labels_true = [0, 0, 0, 1, 1, 1]
    >>> labels_pred = [0, 0, 1, 1, 2, 2]
    >>> metrics.adjusted_rand_score(labels_true, labels_pred)
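
Cluster ids are arbitrary, so a clustering metric should not care how clusters are named. The sketch below checks that the adjusted Rand index is unchanged when the predicted ids are renamed and that it is symmetric in its arguments; the renamed list is my own construction.

```python
from sklearn.metrics import adjusted_rand_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

ari = adjusted_rand_score(labels_true, labels_pred)

# Same partition as labels_pred with ids renamed: 0 -> 1, 1 -> 0, 2 -> 3.
renamed = [1, 1, 0, 0, 3, 3]
ari_renamed = adjusted_rand_score(labels_true, renamed)

# Swapping the arguments does not change the score.
ari_swapped = adjusted_rand_score(labels_pred, labels_true)

# A perfect match scores 1.0 even when the ids differ.
ari_perfect = adjusted_rand_score(labels_true, [1, 1, 1, 0, 0, 0])

print(ari, ari_renamed, ari_swapped, ari_perfect)
```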

2. Mutual Information based scores

    >>> from sklearn import metrics
    >>> labels_true = [0, 0, 0, 1, 1, 1]
    >>> labels_pred = [0, 0, 1, 1, 2, 2]
    >>> metrics.adjusted_mutual_info_score(labels_true, labels_pred)
    0.22504
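
sklearn exposes three related scores: raw mutual information, a normalized version (NMI), and a chance-adjusted version (AMI). The sketch below computes all three on the same labels so they can be compared; it is an illustration rather than a recommendation of one over the others.

```python
from sklearn.metrics import (
    adjusted_mutual_info_score,
    mutual_info_score,
    normalized_mutual_info_score,
)

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

mi = mutual_info_score(labels_true, labels_pred)              # raw MI, in nats
nmi = normalized_mutual_info_score(labels_true, labels_pred)  # scaled to [0, 1]
ami = adjusted_mutual_info_score(labels_true, labels_pred)    # corrected for chance

# Renaming the cluster ids leaves the score unchanged.
ami_renamed = adjusted_mutual_info_score(labels_true, [1, 1, 0, 0, 3, 3])

print(mi, nmi, ami, ami_renamed)
```

On this example the adjusted score is lower than the normalized one, because part of the raw agreement is attributed to chance.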

3. Homogeneity, completeness and V-measure

Homogeneity: each cluster contains only members of a single class.
Completeness: all members of a given class are assigned to the same cluster.

    >>> from sklearn import metrics
    >>> labels_true = [0, 0, 0, 1, 1, 1]
    >>> labels_pred = [0, 0, 1, 1, 2, 2]
    >>> metrics.homogeneity_score(labels_true, labels_pred)
    0.66...
    >>> metrics.completeness_score(labels_true, labels_pred)
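
The V-measure named in the heading is, with sklearn's default beta=1, the harmonic mean of homogeneity and completeness. A small sketch that fetches all three at once and checks the relation:

```python
from sklearn.metrics import homogeneity_completeness_v_measure, v_measure_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

# Returns the three scores as a tuple.
h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)

# With the default beta=1, V-measure is the harmonic mean of h and c.
v_by_hand = 2 * h * c / (h + c)

print(h, c, v, v_by_hand)
```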

4. Fowlkes-Mallows scores
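
The original post gives no example for this score. As a sketch, the Fowlkes-Mallows index is the geometric mean of pairwise precision and recall: count the pairs of samples that share a cluster in both labelings (TP), only in the prediction (FP), or only in the ground truth (FN). The labels reuse the earlier clustering examples.

```python
from itertools import combinations
from math import sqrt

from sklearn.metrics import fowlkes_mallows_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

tp = fp = fn = 0
for i, j in combinations(range(len(labels_true)), 2):
    same_true = labels_true[i] == labels_true[j]
    same_pred = labels_pred[i] == labels_pred[j]
    if same_true and same_pred:
        tp += 1   # pair grouped together in both labelings
    elif same_pred:
        fp += 1   # grouped together only in the prediction
    elif same_true:
        fn += 1   # grouped together only in the ground truth

fmi_by_hand = tp / sqrt((tp + fp) * (tp + fn))

print(fmi_by_hand, fowlkes_mallows_score(labels_true, labels_pred))
```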

5. Silhouette Coefficient
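
Unlike the scores above, the silhouette coefficient needs no ground-truth labels; it only uses the data and the cluster assignment, and ranges from -1 to 1. The sketch below is an assumed usage pattern (synthetic blobs, KMeans, a handful of candidate cluster counts), not something taken from the original post.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data, assumed for illustration only.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Score several candidate cluster counts; higher is better.
scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```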

6. Calinski-Harabasz Index
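
The Calinski-Harabasz index also works without ground truth: it is the ratio of between-cluster to within-cluster dispersion, so higher values suggest denser, better-separated clusters. In current scikit-learn the function is spelled calinski_harabasz_score. The data and the shuffled-label comparison below are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Synthetic data, assumed for illustration only.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
ch_kmeans = calinski_harabasz_score(X, labels)

# Baseline: the same labels assigned to points at random.
rng = np.random.default_rng(0)
ch_shuffled = calinski_harabasz_score(X, rng.permutation(labels))

print(ch_kmeans, ch_shuffled)
```

The index has no fixed upper bound, so it is mainly useful for comparing clusterings of the same data.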

4. Other
