
A Simple KNN Implementation

Published: 2024/9/20

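I recently started working through Machine Learning in Action, and the first algorithm it covers is KNN. Since the k-nearest neighbors algorithm is fairly simple, I will skip the theory here and go straight to the code: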

A simple KNN implementation

Some syntax you will need:
tile()
sum(axis=1)
argsort, sort and sorted, and the operator.itemgetter function
the get(), items() and iteritems() methods
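
As a quick refresher, the sketch below exercises each of these calls on the toy data used in this post. The variable names (`point`, `votes`, `ranked` and so on) are only illustrative and do not come from the book. Note that `iteritems()` exists only in Python 2, so the sketch uses `items()`, its Python 3 counterpart.

```python
import operator
import numpy as np

data = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
point = [0, 0]

# tile() repeats `point` once per row so it lines up with `data`.
tiled = np.tile(point, (data.shape[0], 1))      # shape (4, 2)

# sum(axis=1) adds across each row: one squared distance per sample.
sq_dist = ((tiled - data) ** 2).sum(axis=1)

# argsort() returns the indices that sort the array in ascending order.
order = sq_dist.argsort()

# dict.get(key, default) avoids a KeyError the first time a label is seen.
votes = {}
for label in ['B', 'A', 'B']:
    votes[label] = votes.get(label, 0) + 1

# sorted() returns a new list; itemgetter(1) sorts the pairs by their count.
ranked = sorted(votes.items(), key=operator.itemgetter(1), reverse=True)

print(tiled.shape, order, ranked)
```

`sort()` is the in-place list method, while `sorted()` leaves its input untouched and returns a new list, which is why the classifier can apply it to the dictionary's items.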

    # coding=utf-8
    from numpy import array, tile
    import operator  # operator module, used when sorting the votes
    import matplotlib.pyplot as plt

    # Build the training set and the matching labels
    def createDataset():
        # A 2-D array: note the two levels of square brackets
        group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
        labels = ['A', 'A', 'B', 'B']
        return (group, labels)

    # Simple classifier
    def classify0(inX, dataSet, labels, k):
        # shape[0] is the number of rows, shape[1] the number of columns
        dataSetSize = dataSet.shape[0]
        # tile() repeats inX to the shape of dataSet so the two can be subtracted
        diffMat = tile(inX, (dataSetSize, 1)) - dataSet
        # print(diffMat)
        # Square each component of the differences
        sqDiffMat = diffMat**2
        # print(sqDiffMat)
        # axis=1 sums along each row, giving the sum of squares
        sqDistances = sqDiffMat.sum(axis=1)
        # print(sqDistances)
        # Square root: Euclidean distance from the input vector to each training vector
        distances = sqDistances**0.5
        # print(distances)
        # Indices that sort the distances in ascending order (nearest first)
        sortedDistIndicies = distances.argsort()
        # print(sortedDistIndicies)
        classCount = {}
        for i in range(k):
            # Label of the i-th nearest neighbor
            voteIlabel = labels[sortedDistIndicies[i]]
            # print(voteIlabel)
            # Accumulate the vote count
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        print('classCount:', classCount)
        # Sort the labels by vote count and return the one with the most votes.
        # items() yields (label, count) tuples (iteritems() in Python 2);
        # itemgetter(1) sorts them by the second element, the count.
        sortedClassCount = sorted(classCount.items(),
                                  key=operator.itemgetter(1), reverse=True)
        print(sortedClassCount)
        # Return the winning label
        # print(sortedClassCount[0][0])
        return sortedClassCount[0][0]

    # Reads datingTestSet2.txt, not datingTestSet.txt
    file_raw = r'C:\Users\LiLong\Desktop\datingTestSet2.txt'

    if __name__ == "__main__":
        # Load the data
        group, labels = createDataset()
        print('training data set:', group)
        print('labels of training data set:', labels)
        # Simple classification
        tt = classify0([0, 0], group, labels, 3)
        print('Classification results:', tt)

Output (from a Python 2 run, which is why each multi-argument print appears as a tuple):

    ('training data set:', array([[ 1. ,  1.1],
           [ 1. ,  1. ],
           [ 0. ,  0. ],
           [ 0. ,  0.1]]))
    ('labels of training data set:', ['A', 'A', 'B', 'B'])
    ('classCount:', {'A': 1, 'B': 2})
    [('B', 2), ('A', 1)]
    ('Classification results:', 'B')

That completes the simplest possible KNN classifier.
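
For comparison, the same vote can be written more compactly with NumPy broadcasting and `collections.Counter`. This is only a sketch under my own naming (`knn_classify` is not from the book) and is not meant to replace `classify0`. With `Counter`, ties go to whichever label is met first among the nearest neighbors.

```python
from collections import Counter
import numpy as np

def knn_classify(in_x, data_set, labels, k):
    """Return the majority label among the k nearest training samples."""
    # Broadcasting subtracts in_x from every row, so tile() is not needed.
    diff = np.asarray(data_set, dtype=float) - np.asarray(in_x, dtype=float)
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # Indices of the k smallest distances, nearest first.
    nearest = distances.argsort()[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify([0, 0], group, labels, 3))  # prints B
```

For the query `[0, 0]` the three nearest samples carry the labels B, B and A, so the result agrees with the `classify0` output shown above.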

Using KNN to improve matches on a dating site

Processing the data

Syntax used here:
matplotlib
min(iterable, *[, key, default])
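
The built-in `min` signature listed above behaves differently from the `min(0)` array method that `autoNorm` relies on. A small sketch of both, using made-up values (`words` and `samples` are illustrative only):

```python
import numpy as np

# Built-in min: `key` picks what to compare, `default` covers an empty iterable.
words = ['banana', 'fig', 'cherry']
shortest = min(words, key=len)          # 'fig'
fallback = min([], default=0)           # 0 instead of raising ValueError

# ndarray.min(0) reduces along axis 0, giving one minimum per column.
samples = np.array([[3.0, 10.0], [1.0, 40.0], [2.0, 20.0]])
col_min = samples.min(0)
col_max = samples.max(0)
print(shortest, fallback, col_min, col_max)
```

The per-column minimum and maximum are exactly what the normalization step needs, since each feature is scaled by its own range.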

    # coding=utf-8
    from numpy import array, shape, tile, zeros
    import operator  # operator module, used when sorting the votes
    import matplotlib.pyplot as plt

    # Data preprocessing
    def file2matrix(filename):
        '''Read the training data from a file and store it as a matrix.'''
        # The original source code has an error here: the lines can only be read once
        with open(filename, 'r') as fr:
            arrayOfLines = fr.readlines()
        numberOfLines = len(arrayOfLines)  # number of samples
        # 2-D matrix with one row per sample and 3 columns
        returnMat = zeros((numberOfLines, 3))
        print('row:%s and column:%s' % (returnMat.shape[0], returnMat.shape[1]))
        classLabelVector = []  # 1-D list holding the sample labels
        index = 0
        for line in arrayOfLines:
            # strip() removes leading and trailing characters
            # (by default all whitespace: spaces, \n, \t and so on)
            line = line.strip()  # drop the trailing newline
            # Split each line on tabs into a list of fields
            listFromLine = line.split('\t')
            # print(listFromLine[0:4])
            # Store the first three fields in the data set, a 1000x3 array
            returnMat[index, :] = list(map(float, listFromLine[0:3]))
            classLabelVector.append(int(listFromLine[-1]))
            index += 1
        return (returnMat, classLabelVector)

    # Normalize the data
    def autoNorm(dataSet):
        minVals = dataSet.min(0)  # 0 means the minimum of each column
        maxVals = dataSet.max(0)
        ranges = maxVals - minVals
        # Zero array with the same shape as dataSet
        normDataSet = zeros(shape(dataSet))
        m = dataSet.shape[0]  # number of rows
        # tile() repeats [A, B, C] (the per-column minimums) m times
        normDataSet = dataSet - tile(minVals, (m, 1))
        # Normalization formula; note that this is an element-wise division
        normDataSet = normDataSet / tile(ranges, (m, 1))
        return normDataSet, ranges, minVals

    # Classifier test (relies on classify0 from the first listing)
    def datingClassTest():
        hoRatio = 0.10
        datingDataMat, datingLabels = file2matrix(r'C:\Users\LiLong\Desktop\datingTestSet2.txt')
        normMat, ranges, minVals = autoNorm(datingDataMat)
        m = normMat.shape[0]
        # Number of test samples
        numTestVecs = int(m * hoRatio)
        print('the test number:', numTestVecs)
        errorCount = 0.0
        for i in range(numTestVecs):
            # normMat[i, :] is a test sample (the first 100 rows);
            # normMat[numTestVecs:m, :] is the training set (rows 100-1000),
            # and datingLabels[numTestVecs:m] are the matching training labels
            classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                         datingLabels[numTestVecs:m], 3)
            print("the classifier came back with: %d, the real answer is: %d"
                  % (classifierResult, datingLabels[i]))
            if classifierResult != datingLabels[i]:
                errorCount += 1.0
        print("the total error rate is: %f" % (errorCount / float(numTestVecs)))
        print(errorCount)

    # Reads datingTestSet2.txt, not datingTestSet.txt
    file_raw = r'C:\Users\LiLong\Desktop\datingTestSet2.txt'

    if __name__ == "__main__":
        # Load and format the data
        datingDataMat, datingLables = file2matrix(file_raw)
        print(datingDataMat)
        print(datingLables)
        # print(array(datingLables))  # prints the labels in abbreviated array form
        # Create the scatter plot
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2])
        # c sets the color of each marker, s its size
        ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2],
                   c=15.0*array(datingLables), s=15.0*array(datingLables))
        plt.show()
        # Normalize the data
        normMat, ranges, minVals = autoNorm(datingDataMat)
        print(normMat)

file2matrix returns the data as a NumPy array, a format that can be processed directly, together with the list of labels:

[[ 4.09200000e+04 8.32697600e+00 9.53952000e-01][ 1.44880000e+04 7.15346900e+00 1.67390400e+00][ 2.60520000e+04 1.44187100e+00 8.05124000e-01]..., [ 2.65750000e+04 1.06501020e+01 8.66627000e-01][ 4.81110000e+04 9.13452800e+00 7.28045000e-01][ 4.37570000e+04 7.88260100e+00 1.33244600e+00]][3, 2, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3, 2, 1, 2, 3, 2, 3, 2, 3, 2, 1, 3, 1, 3, 1, 2, 1, 1, 2, 3, 3, 1, 2, 3, 3, 3, 1, 1, 1, 1, 2, 2, 1, 3, 2, 2, 2, 2, 3, 1, 2, 1, 2, 2, 2, 2, 2, 3, 2, 3, 1, 2, 3, 2, 2, 1, 3, 1, 1, 3, 3, 1, 2, 3, 1, 3, 1, 2, 2, 1, 1, 3, 3, 1, 2, 1, 3, 3, 2, 1, 1, 3, 1, 2, 3, 3, 2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 1, 1, 2, 3, 2, 3, 2, 3, 2, 1, 3, 3, 3, 1, 3, 2, 2, 3, 1, 3, 3, 3, 1, 3, 1, 1, 3, 3, 2, 3, 3, 1, 2, 3, 2, 2, 3, 3, 3, 1, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 3, 1, 2, 3, 2, 3, 1, 1, 1, 3, 2, 3, 1, 3, 2, 1, 3, 2, 2, 3, 2, 3, 2, 1, 1, 3, 1, 3, 2, 2, 2, 3, 2, 2, 1, 2, 2, 3, 1, 3, 3, 2, 1, 1, 1, 2, 1, 3, 3, 3, 3, 2, 1, 1, 1, 2, 3, 2, 1, 3, 1, 3, 2, 2, 3, 1, 3, 1, 1, 2, 1, 2, 2, 1, 3, 1, 3, 2, 3, 1, 2, 3, 1, 1, 1, 1, 2, 3, 2, 2, 3, 1, 2, 1, 1, 1, 3, 3, 2, 1, 1, 1, 2, 2, 3, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 2, 3, 2, 3, 3, 3, 3, 1, 2, 3, 1, 1, 1, 3, 1, 3, 2, 2, 1, 3, 1, 3, 2, 2, 1, 2, 2, 3, 1, 3, 2, 1, 1, 3, 3, 2, 3, 3, 2, 3, 1, 3, 1, 3, 3, 1, 3, 2, 1, 3, 1, 3, 2, 1, 2, 2, 1, 3, 1, 1, 3, 3, 2, 2, 3, 1, 2, 3, 3, 2, 2, 1, 1, 1, 1, 3, 2, 1, 1, 3, 2, 1, 1, 3, 3, 3, 2, 3, 2, 1, 1, 1, 1, 1, 3, 2, 2, 1, 2, 1, 3, 2, 1, 3, 2, 1, 3, 1, 1, 3, 3, 3, 3, 2, 1, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 1, 2, 2, 2, 1, 1, 3, 1, 1, 2, 3, 1, 1, 2, 3, 1, 3, 1, 1, 2, 2, 1, 2, 2, 2, 3, 1, 1, 1, 3, 1, 3, 1, 3, 3, 1, 1, 1, 3, 2, 3, 3, 2, 2, 1, 1, 1, 2, 1, 2, 2, 3, 3, 3, 1, 1, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 3, 1, 2, 3, 2, 1, 1, 1, 1, 3, 3, 3, 3, 2, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 2, 3, 2, 1, 2, 2, 2, 3, 2, 1, 3, 2, 3, 2, 3, 2, 1, 1, 2, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 3, 2, 1, 3, 3, 2, 2, 2, 3, 1, 2, 1, 1, 3, 2, 3, 2, 3, 2, 3, 3, 2, 2, 1, 3, 1, 2, 1, 3, 1, 1, 1, 3, 1, 1, 3, 
3, 2, 2, 1, 3, 1, 1, 3, 2, 3, 1, 1, 3, 1, 3, 3, 1, 2, 3, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 1, 2, 1, 3, 1, 2, 1, 3, 1, 3, 1, 1, 2, 2, 2, 3, 2, 2, 1, 2, 3, 3, 2, 3, 3, 3, 2, 3, 3, 1, 3, 2, 3, 2, 1, 2, 1, 1, 1, 2, 3, 2, 2, 1, 2, 2, 1, 3, 1, 3, 3, 3, 2, 2, 3, 3, 1, 2, 2, 2, 3, 1, 2, 1, 3, 1, 2, 3, 1, 1, 1, 2, 2, 3, 1, 3, 1, 1, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 1, 3, 1, 2, 3, 2, 2, 3, 1, 2, 3, 2, 3, 1, 2, 2, 3, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 3, 2, 1, 3, 3, 3, 1, 1, 3, 1, 2, 3, 3, 2, 2, 2, 1, 2, 3, 2, 2, 3, 2, 2, 2, 3, 3, 2, 1, 3, 2, 1, 3, 3, 1, 2, 3, 2, 1, 3, 3, 3, 1, 2, 2, 2, 3, 2, 3, 3, 1, 2, 1, 1, 2, 1, 3, 1, 2, 2, 1, 3, 2, 1, 3, 3, 2, 2, 2, 1, 2, 2, 1, 3, 1, 3, 1, 3, 3, 1, 1, 2, 3, 2, 2, 3, 1, 1, 1, 1, 3, 2, 2, 1, 3, 1, 2, 3, 1, 3, 1, 3, 1, 1, 3, 2, 3, 1, 1, 3, 3, 3, 3, 1, 3, 2, 2, 1, 1, 3, 3, 2, 2, 2, 1, 2, 1, 2, 1, 3, 2, 1, 2, 2, 3, 1, 2, 2, 2, 3, 2, 1, 2, 1, 2, 3, 3, 2, 3, 1, 1, 3, 3, 1, 2, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3, 3, 1, 1, 3, 2, 1, 2, 1, 2, 2, 3, 2, 2, 2, 3, 1, 2, 1, 2, 2, 1, 1, 2, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 1, 3, 3, 2, 3, 2, 3, 3, 2, 2, 1, 1, 1, 3, 3, 1, 1, 1, 3, 3, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 3, 1, 1, 2, 3, 2, 2, 1, 3, 1, 2, 3, 1, 2, 2, 2, 2, 3, 2, 3, 3, 1, 2, 1, 2, 3, 1, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 1, 3, 3, 3]
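
The parsing done by `file2matrix` can also be sketched with `numpy.loadtxt`. The three rows below are the first three rows of the printed matrix together with their labels; the in-memory `text` string merely stands in for the real file, so the exact number formatting is an assumption.

```python
import io
import numpy as np

# Tab-separated sample: three feature columns followed by an integer label.
text = (
    "40920\t8.326976\t0.953952\t3\n"
    "14488\t7.153469\t1.673904\t2\n"
    "26052\t1.441871\t0.805124\t1\n"
)

# loadtxt parses every field as a float; StringIO lets it read from a string.
raw = np.loadtxt(io.StringIO(text), delimiter='\t')
features = raw[:, :3]                        # first three columns
labels = raw[:, -1].astype(int).tolist()     # last column as Python ints

print(features.shape, labels)
```

Passing a file path instead of the `StringIO` object would read the data from disk in the same way.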

The scatter plot of the data: [figure omitted]

The data after normalization:

    [[ 0.44832535  0.39805139  0.56233353]
     [ 0.15873259  0.34195467  0.98724416]
     [ 0.28542943  0.06892523  0.47449629]
     ...,
     [ 0.29115949  0.50910294  0.51079493]
     [ 0.52711097  0.43665451  0.4290048 ]
     [ 0.47940793  0.3768091   0.78571804]]
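
The normalization applied by `autoNorm` is min-max scaling, newValue = (oldValue - min) / (max - min), computed separately for each column. Below is a minimal sketch that uses broadcasting instead of `tile()`. The function name `min_max_scale` and the sample values are invented for illustration, and the sketch assumes every column has a non-zero range.

```python
import numpy as np

def min_max_scale(data_set):
    """Scale every column of data_set into [0, 1]; assumes max > min per column."""
    min_vals = data_set.min(axis=0)
    ranges = data_set.max(axis=0) - min_vals
    # Broadcasting applies the per-column minimum and range to every row.
    return (data_set - min_vals) / ranges, ranges, min_vals

sample = np.array([[40000.0, 8.0, 1.0],
                   [10000.0, 2.0, 0.5],
                   [25000.0, 5.0, 1.5]])
scaled, ranges, min_vals = min_max_scale(sample)
print(scaled)
```

Without this step the first feature, whose values are in the tens of thousands, would dominate the Euclidean distance. A new sample has to be scaled with the same `min_vals` and `ranges` before it is classified.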

Testing the algorithm

    # coding=utf-8
    from numpy import array, shape, tile, zeros
    import operator  # operator module, used when sorting the votes
    import matplotlib.pyplot as plt

    # Build the training set and the matching labels
    def createDataset():
        # A 2-D array: note the two levels of square brackets
        group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
        labels = ['A', 'A', 'B', 'B']
        return (group, labels)

    # Simple classifier
    def classify0(inX, dataSet, labels, k):
        # shape[0] is the number of rows, shape[1] the number of columns
        dataSetSize = dataSet.shape[0]
        # tile() repeats inX to the shape of dataSet so the two can be subtracted
        diffMat = tile(inX, (dataSetSize, 1)) - dataSet
        # Square each component of the differences
        sqDiffMat = diffMat**2
        # axis=1 sums along each row, giving the sum of squares
        sqDistances = sqDiffMat.sum(axis=1)
        # Square root: Euclidean distance from the input vector to each training vector
        distances = sqDistances**0.5
        # Indices that sort the distances in ascending order (nearest first)
        sortedDistIndicies = distances.argsort()
        classCount = {}  # dictionary of label -> votes
        for i in range(k):
            # Label of the i-th nearest neighbor
            voteIlabel = labels[sortedDistIndicies[i]]
            # Accumulate the vote count
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        # Sort the labels by vote count and return the one with the most votes.
        # items() yields (label, count) tuples (iteritems() in Python 2);
        # itemgetter(1) sorts them by the second element, the count.
        sortedClassCount = sorted(classCount.items(),
                                  key=operator.itemgetter(1), reverse=True)
        # Return the winning label
        # print(sortedClassCount[0][0])
        return sortedClassCount[0][0]

    # Data preprocessing
    def file2matrix(filename):
        '''Read the training data from a file and store it as a matrix.'''
        # The original source code has an error here: the lines can only be read once
        with open(filename, 'r') as fr:
            arrayOfLines = fr.readlines()
        numberOfLines = len(arrayOfLines)  # number of samples
        # 2-D matrix with one row per sample and 3 columns
        returnMat = zeros((numberOfLines, 3))
        print('row:%s and column:%s' % (returnMat.shape[0], returnMat.shape[1]))
        classLabelVector = []  # 1-D list holding the sample labels
        index = 0
        for line in arrayOfLines:
            # strip() removes leading and trailing characters
            # (by default all whitespace: spaces, \n, \t and so on)
            line = line.strip()  # drop the trailing newline
            # Split each line on tabs into a list of fields
            listFromLine = line.split('\t')
            # print(listFromLine[0:4])
            # Store the first three fields in the data set, a 1000x3 array
            returnMat[index, :] = list(map(float, listFromLine[0:3]))
            classLabelVector.append(int(listFromLine[-1]))
            index += 1
        return (returnMat, classLabelVector)

    # Normalize the data
    def autoNorm(dataSet):
        minVals = dataSet.min(0)  # 0 means the minimum of each column
        maxVals = dataSet.max(0)
        ranges = maxVals - minVals
        # Zero array with the same shape as dataSet
        normDataSet = zeros(shape(dataSet))
        m = dataSet.shape[0]  # number of rows
        # tile() repeats [A, B, C] (the per-column minimums) m times
        normDataSet = dataSet - tile(minVals, (m, 1))
        # Normalization formula; note that this is an element-wise division
        normDataSet = normDataSet / tile(ranges, (m, 1))
        return normDataSet, ranges, minVals

    # Classifier test
    def datingClassTest():
        hoRatio = 0.10  # hold out 10%
        datingDataMat, datingLabels = file2matrix(r'C:\Users\LiLong\Desktop\datingTestSet2.txt')
        normMat, ranges, minVals = autoNorm(datingDataMat)
        m = normMat.shape[0]
        # Number of test samples
        numTestVecs = int(m * hoRatio)
        print('the test number:', numTestVecs)
        errorCount = 0.0
        for i in range(numTestVecs):
            # normMat[i, :] is a test sample (the first 100 rows);
            # normMat[numTestVecs:m, :] is the training set (rows 100-1000),
            # and datingLabels[numTestVecs:m] are the matching training labels
            classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                         datingLabels[numTestVecs:m], 3)
            print("the classifier came back with: %d, the real answer is: %d"
                  % (classifierResult, datingLabels[i]))
            if classifierResult != datingLabels[i]:
                errorCount += 1.0
        print("the total error rate is: %f" % (errorCount / float(numTestVecs)))
        print(errorCount)

    # Reads datingTestSet2.txt, not datingTestSet.txt
    # file_raw = r'C:\Users\LiLong\Desktop\datingTestSet2.txt'

    if __name__ == "__main__":
        datingClassTest()

Output:

    row:1000 and column:3
    ('the test number:', 100)
    the classifier came back with: 3, the real answer is: 3
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 1, the real answer is: 1
    ...,
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 3, the real answer is: 3
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 3, the real answer is: 3
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 3, the real answer is: 3
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 3, the real answer is: 3
    the classifier came back with: 3, the real answer is: 3
    the classifier came back with: 2, the real answer is: 2
    the classifier came back with: 1, the real answer is: 1
    the classifier came back with: 3, the real answer is: 1
    the total error rate is: 0.050000
    5.0

The result shows an error rate of 5.0%.
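
The 5.0% figure is simply the number of misclassified test samples divided by the number of test samples: with a hold-out ratio of 0.10 on 1000 rows there are 100 test samples, and 5 of them were misclassified. A small sketch of that arithmetic, using a hypothetical helper `error_rate` and made-up label lists:

```python
def error_rate(predicted, actual):
    """Fraction of positions where the predicted and actual labels differ."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    errors = sum(1 for p, a in zip(predicted, actual) if p != a)
    return errors / len(actual)

num_rows, ho_ratio = 1000, 0.10
num_test = int(num_rows * ho_ratio)     # 100 test samples, as in the run above
print(num_test, 5 / num_test)           # prints 100 0.05

# Made-up labels: one mismatch out of four.
print(error_rate([3, 2, 1, 3], [3, 2, 1, 1]))   # prints 0.25
```

Because the test samples are always the first 10% of the file, the measured rate depends on the order of the rows; changing `hoRatio` or `k` in `datingClassTest` will change it.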
