
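Machine Learning: KNN Implementation

Published: 2025/3/19 | Programming Q&A | 豆豆

This article, collected and organized by 生活随笔, introduces how to implement KNN and is shared here for reference.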

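I. Overview of KNN (K-Nearest Neighbors)

KNN is a distance-based method that can be used for both classification and regression.

Its main steps are:

Compute the distance between the test sample and every training sample (common distance metrics include Euclidean distance and Mahalanobis distance);

Sort all of these distances in ascending order;

Select the k training samples with the smallest distances;

Vote using the labels of these k samples; the label with the most votes is the predicted class.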
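The four steps above can be sketched directly in NumPy. This is a minimal illustration, not the article's own code. The function name knn_classify and the toy data are hypothetical. Ties in the vote are broken by Counter.most_common, which keeps the label encountered first.

```python
import numpy as np
from collections import Counter

def knn_classify(x_test, X_train, y_train, k):
    """Classify one sample with plain majority-vote KNN (hypothetical helper)."""
    # Step 1: Euclidean distance from x_test to every training sample
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Step 2: sort ascending; argsort returns indices, nearest first
    order = np.argsort(distances)
    # Step 3: keep the k nearest samples
    nearest = order[:k]
    # Step 4: majority vote over their labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups labeled "A" and "B"
X_train = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
y_train = np.array(["A", "A", "B", "B"])

print(knn_classify(np.array([0.2, 0.1]), X_train, y_train, k=3))  # → B
```

For the query [0.2, 0.1], the three nearest samples are indices 3, 2 and 1 (distances 0.2, about 0.224 and about 1.204). The vote is therefore B: 2, A: 1.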

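Advantages:

The theory is mature and the idea is simple; it can be used for both classification and regression;

It can be used for non-linear classification;

Training time complexity is O(n);

It makes no assumptions about the data distribution, can achieve high accuracy, and is relatively insensitive to outliers (when k is not too small);

Disadvantages:

Heavy computation, mainly in the distance calculations (each prediction compares the query against all training samples);

It performs poorly on imbalanced data, where some classes have many samples and others very few, because the majority classes tend to dominate the k neighbors;

It needs a lot of memory, since the whole training set must be kept;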

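II. Implementation with sklearn

1. sklearn.neighbors

All classes related to nearest-neighbor methods live in the sklearn.neighbors module.

The KNN classifier class is KNeighborsClassifier, and the KNN regressor class is KNeighborsRegressor.

There are also some KNN extensions:

- RadiusNeighborsClassifier and RadiusNeighborsRegressor, the fixed-radius nearest-neighbor classifier and regressor. They use all neighbors within a given radius instead of a fixed number k.
- NearestCentroid, the nearest centroid classifier.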
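Below is a minimal sketch of three of these classes on hypothetical 1-D toy data; the data and parameter values are illustrative assumptions.

- KNeighborsRegressor averages the targets of the k nearest neighbors.
- RadiusNeighborsClassifier votes among all training points within a fixed radius.
- NearestCentroid assigns the class whose mean (centroid) is closest.

```python
import numpy as np
from sklearn.neighbors import (KNeighborsRegressor, NearestCentroid,
                               RadiusNeighborsClassifier)

# Hypothetical 1-D toy data: two groups, centered around 1 and around 11
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_cls = np.array([0, 0, 0, 1, 1, 1])
y_reg = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])

# KNN regression: predict the mean target of the k nearest neighbors
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
print(reg.predict([[1.0]]))          # neighbors 0.0, 1.0, 2.0 → [1.]

# Radius-based classification: vote among all points within radius 1.0
rad = RadiusNeighborsClassifier(radius=1.0).fit(X, y_cls)
print(rad.predict([[10.5]]))         # neighbors 10.0 and 11.0 → [1]

# Nearest centroid: the class centroids are 1.0 and 11.0
nc = NearestCentroid().fit(X, y_cls)
print(nc.predict([[5.0], [7.0]]))    # → [0 1]
```

With a fixed radius, a query that has no training point inside the radius cannot be classified. RadiusNeighborsClassifier raises an error in that case unless outlier_label is set.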

2. Implementing KNN Classification

(1) Generating random data

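```python
import numpy as np
import matplotlib.pyplot as plt
# In Jupyter, add the magic command %matplotlib inline to show plots in the notebook
# Import from sklearn.datasets: the old sklearn.datasets.samples_generator path
# no longer exists in current scikit-learn
from sklearn.datasets import make_classification

# X holds the sample features and Y the class labels: 1000 samples with 2 features
# each, 3 classes, no redundant features, and one cluster per class
X, Y = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, n_classes=3)
plt.scatter(X[:, 0], X[:, 1], marker='o', c=Y)
plt.show()
```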

The result is shown in the figure below:

The make_classification function

```python
from sklearn.datasets import make_classification

# Full signature, called with its default values
X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           n_clusters_per_class=2, weights=None, flip_y=0.01,
                           class_sep=1.0, hypercube=True, shift=0.0, scale=1.0,
                           shuffle=True, random_state=None)
```

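It is typically used to generate data for testing classification algorithms. The key parameters are:

- n_features: total number of features. It must be at least n_informative + n_redundant + n_repeated; any remaining features are filled with random noise.
- n_informative: number of informative features.
- n_redundant: number of redundant features, generated as random linear combinations of the informative features.
- n_repeated: number of duplicated features, drawn randomly from the informative and redundant features.
- n_classes: number of classes.
- n_clusters_per_class: number of clusters that make up each class.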
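The following quick check shows how n_features is composed. It assumes shuffle=False, which keeps the columns in scikit-learn's documented order: informative, then redundant, then repeated, then noise. The sizes and random_state are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# 5 features = 2 informative + 1 redundant + 1 repeated + 1 pure-noise feature.
# shuffle=False keeps the columns in that order.
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           n_redundant=1, n_repeated=1, shuffle=False,
                           random_state=0)

# Column 2 (redundant) should be an exact linear combination of columns 0 and 1
coef, residuals, rank, _ = np.linalg.lstsq(X[:, :2], X[:, 2], rcond=None)
print("least-squares residual of the redundant column:", residuals[0])

# Column 3 (repeated) should be a copy of one of columns 0-2
copied_from = [j for j in range(3) if np.allclose(X[:, 3], X[:, j])]
print("repeated column is a copy of column", copied_from)
```

A residual close to zero confirms that the redundant column adds no new information for the classifier.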

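(2) Fitting the model

We fit the model with KNN, choosing K=15 and weighting neighbors by distance. With weights='distance', each neighbor's vote is weighted by the inverse of its distance, so closer neighbors count more. The code is as follows: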

```python
from sklearn import neighbors

clf = neighbors.KNeighborsClassifier(n_neighbors=15, weights='distance')
clf.fit(X, Y)
```
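To make weights='distance' concrete, here is a from-scratch sketch. It weights each of the k nearest neighbors by the inverse of its Euclidean distance and picks the class with the largest total weight, so it should reproduce KNeighborsClassifier's predictions. The helper weighted_knn_predict, the random_state values and the random query points are assumptions for illustration. Query points that coincide exactly with a training point (distance 0) are not handled here, unlike in scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def weighted_knn_predict(X_train, y_train, X_query, k):
    """Inverse-distance weighted KNN vote (hypothetical helper, not a sklearn API)."""
    classes = np.unique(y_train)
    preds = []
    for x in X_query:
        d = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distances
        idx = np.argsort(d)[:k]                         # the k nearest neighbors
        w = 1.0 / d[idx]                                # weight = 1 / distance (assumes d > 0)
        # Total weight per class; the heaviest class wins
        scores = [w[y_train[idx] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

X, Y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, n_classes=3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=15, weights='distance').fit(X, Y)

# Random query points inside the bounding box of the training data
rng = np.random.default_rng(1)
X_query = rng.uniform(X.min(axis=0), X.max(axis=0), size=(50, 2))

manual = weighted_knn_predict(X, Y, X_query, k=15)
print((manual == clf.predict(X_query)).mean())  # fraction of matching predictions
```

Compared with uniform voting, distance weighting lets a few very close neighbors outvote a larger number of distant ones within the same k.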

(3) Predicting with the model

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

# Determine the extent of the training data, with a margin of 1
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1

# Build a dense grid (step 0.02) over that extent and predict every grid point
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

# Color each grid cell by its predicted class (the decision regions)
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
# Overlay all the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=cmap_bold)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k = 15, weights = 'distance')")
plt.show()
```

The result is shown in the figure below:

III. KNN Source Code

```python
import operator
from numpy import tile

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # tile() works like MATLAB's repmat: it replicates inX into a (dataSetSize x 1)
    # stack, so all differences are computed without a loop
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2                  # element-wise square
    sqDistances = sqDiffMat.sum(axis=1)       # sum along each row
    distances = sqDistances ** 0.5            # square root gives the Euclidean distance
    sortedDistIndicies = distances.argsort()  # indices ordered from nearest to farthest
    classCount = {}                           # vote tally: label -> count
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]  # label of the i-th nearest sample
        # dict.get returns 0 when the label has not been seen yet
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort (label, count) pairs by count, descending; classCount itself is unchanged.
    # key=lambda kv: kv[1] would work equally well. (Python 2 used iteritems().)
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]  # the label with the most votes
```
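classify0 handles one query at a time and fully sorts every distance. The sketch below is an alternative, not part of the original code:

- NumPy broadcasting replaces tile and processes a whole batch of queries at once.
- np.argpartition finds the k smallest distances without a full sort.

The helper name knn_predict_batch and the data setup are assumptions. The example uses two classes and an odd k so the uniform vote cannot tie, and it compares against KNeighborsClassifier with its default uniform weights.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def knn_predict_batch(X_train, y_train, X_query, k):
    """Majority-vote KNN for many queries at once (hypothetical helper)."""
    # Squared Euclidean distances, shape (n_query, n_train); the square root is
    # skipped because it does not change the neighbor ranking
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Column indices of the k smallest distances in each row (in no particular order)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    classes, y_idx = np.unique(y_train, return_inverse=True)
    # votes[i, c] = number of the k neighbors of query i that belong to class c
    votes = (y_idx[nearest][:, :, None] == np.arange(len(classes))).sum(axis=1)
    return classes[votes.argmax(axis=1)]

X, y = make_classification(n_samples=300, n_features=4, n_classes=2, random_state=0)
X_train, y_train, X_query = X[:250], y[:250], X[250:]

ours = knn_predict_batch(X_train, y_train, X_query, k=5)
ref = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_query)
print((ours == ref).all())
```

The broadcast distance array has shape (n_query, n_train, n_features), so memory grows with the batch size. For large data sets, sklearn's tree-based search (algorithm='kd_tree' or 'ball_tree') avoids computing every pairwise distance.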
