[Machine Learning Algorithms in Python] K-means Unsupervised Classification
Published: 2025/4/5
1. Background
I won't dwell on the definition of unsupervised learning; if it's unfamiliar, Google it. A project of mine required unsupervised classification, which is what led to this post.

The K in K-means is the number of clusters to split the data into, and the method is essentially distance-based. The rough idea: given a data matrix and, say, K = 2 (split into two parts), we first pick two centroids. Initially, for each column we find the maximum max and minimum min, compute range = max - min, and set each centroid coordinate to min + range * random. From there the algorithm iterates: assign each point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat until assignments stop changing. To really understand it, step through the code and print the intermediate values to check them against your expectations.

(A side note: many articles online paste code straight from a book without checking whether it runs. It took me a whole afternoon to fix all the bugs.)

2. Code
```python
'''
@author: hakuri
'''
from numpy import *
import matplotlib.pyplot as plt

def loadDataSet(fileName):
    """Parse a tab-delimited file of floats into a list of points."""
    dataMat = []
    fr = open(fileName)
    for line in fr.readlines():
        curLine = line.strip().split('\t')
        fltLine = list(map(float, curLine))  # map all elements to float
        dataMat.append(fltLine)
    return dataMat

def distEclud(vecA, vecB):
    return sqrt(sum(power(vecA - vecB, 2)))  # Euclidean distance

def randCent(dataSet, k):
    """Create k random centroids within the bounds of each dimension."""
    n = shape(dataSet)[1]
    centroids = mat(zeros((k, n)))
    for j in range(n):
        minJ = min(array(dataSet)[:, j])
        rangeJ = float(max(array(dataSet)[:, j]) - minJ)
        centroids[:, j] = mat(minJ + rangeJ * random.rand(k, 1))  # min + range*random
    return centroids

def kMeans(dataSet, k, distMeas=distEclud, createCent=randCent):
    m = shape(dataSet)[0]
    # column 0: assigned cluster index; column 1: squared error of the point
    clusterAssment = mat(zeros((m, 2)))
    centroids = createCent(dataSet, k)
    clusterChanged = True
    while clusterChanged:
        clusterChanged = False
        for i in range(m):  # assign each point to the closest centroid
            minDist = inf; minIndex = -1
            for j in range(k):
                distJI = distMeas(array(centroids)[j, :], array(dataSet)[i, :])
                if distJI < minDist:
                    minDist = distJI; minIndex = j
            if clusterAssment[i, 0] != minIndex:
                clusterChanged = True
            clusterAssment[i, :] = minIndex, minDist**2
        print(centroids)
        for cent in range(k):  # recalculate each centroid as the mean of its points
            ptsInClust = array(dataSet)[nonzero(array(clusterAssment)[:, 0] == cent)[0]]
            centroids[cent, :] = mean(ptsInClust, axis=0)
    id = nonzero(array(clusterAssment)[:, 0] == 0)[0]  # indices of points in cluster 0
    return centroids, clusterAssment, id

def plotBestFit(dataSet, id, centroids):
    dataArr = array(dataSet)
    cent = array(centroids)
    n = shape(dataArr)[0]
    n1 = shape(cent)[0]
    xcord1 = []; ycord1 = []  # points in cluster `id`
    xcord2 = []; ycord2 = []  # all other points
    xcord3 = []; ycord3 = []  # centroids
    for i in range(n):
        if i in id:
            xcord1.append(dataArr[i, 0]); ycord1.append(dataArr[i, 1])
        else:
            xcord2.append(dataArr[i, 0]); ycord2.append(dataArr[i, 1])
    for k in range(n1):
        xcord3.append(cent[k, 0]); ycord3.append(cent[k, 1])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    ax.scatter(xcord3, ycord3, s=50, c='black')
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()

if __name__ == '__main__':
    dataSet = loadDataSet('/Users/hakuri/Desktop/testSet.txt')
    a, b, id = kMeans(dataSet, 2)
    plotBestFit(dataSet, id, a)
```

To use it, look at the main block at the end. dataSet is the input data set; I provide it at the download link below. kMeans takes the input matrix as its first argument and K, the number of clusters, as its second. plotBestFit is the plotting function; it requires matplotlib and currently only supports two-dimensional data with K = 2.
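The same algorithm can also be written compactly with vectorized NumPy. The sketch below is my own re-implementation, not the book's code: the function name `kmeans_np` and the synthetic blob data are invented for illustration, but it uses the same min + range * random initialization described above and alternates assignment and mean updates until the labels stop changing.

```python
import numpy as np

def kmeans_np(X, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # same initialization as the post: min + range * random, per dimension
    centroids = lo + (hi - lo) * rng.random((k, X.shape[1]))
    labels = np.zeros(len(X), dtype=int)
    for it in range(max_iter):
        # squared distance from every point to every centroid: shape (m, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            pts = X[labels == c]
            if len(pts):  # guard against an empty cluster
                centroids[c] = pts.mean(axis=0)
    return centroids, labels

# two well-separated Gaussian blobs as a quick sanity check
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
cents, labs = kmeans_np(X, 2)
```

The broadcasting trick `X[:, None, :] - centroids[None, :, :]` computes all point-to-centroid differences in one step, which replaces the double loop in the original code.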
3.效果圖
The two large black dots are the centroids. Not bad, right? When testing, be sure to use a reasonably large data set so the clustering shows clearly.

4. Download
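If you don't have the testSet.txt from the download link, a file in the same tab-delimited two-column format that loadDataSet() expects can be generated. The helper below is my own invention (the name, cluster centers, and spread are assumptions, not from the post); it writes two well-separated Gaussian blobs, enough to make the clustering obvious.

```python
import random

def write_test_set(path, n_per_cluster=100, seed=42):
    """Write n_per_cluster points around each center, one 'x<TAB>y' line per point."""
    rng = random.Random(seed)
    centers = [(-3.0, -3.0), (3.0, 3.0)]  # two well-separated cluster centers
    with open(path, 'w') as f:
        for cx, cy in centers:
            for _ in range(n_per_cluster):
                f.write('%f\t%f\n' % (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)))

write_test_set('testSet_demo.txt')
```

Passing the generated path to loadDataSet() in place of the original file should then work unchanged.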
My GitHub: https://github.com/jimenbian — a star would be appreciated O(∩_∩)O! Download here
/********************************
* This article is from the blog "李博Garvin"
* Please credit the source when reposting: http://blog.csdn.net/buptgshengod
******************************************/