[Machine Learning Algorithms in Python] K-means Unsupervised Classification
Published: 2025/4/5
1. Background
I won't dwell on the definition of unsupervised learning; if it's unfamiliar, Google it. A project of mine required unsupervised classification, which is what led to this post.

The K in K-means is the number of clusters to split the data into, and the method is essentially distance-based. The rough idea: given a data matrix and, say, K = 2 (split into two parts), we first pick two centroids. Initially, for each column we find the maximum max and minimum min, compute range = max - min, and set each centroid coordinate to min + range * random. From there the algorithm iterates: assign each point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat until assignments stop changing. To really understand it, step through the code and print the intermediate values to check them against your expectations.

(A side note: many articles online paste code straight from a book without checking whether it runs. It took me a whole afternoon to fix all the bugs.)

2. Code
```python
'''
@author: hakuri
'''
from numpy import *
import matplotlib.pyplot as plt

def loadDataSet(fileName):
    """Parse a tab-delimited file of floats into a list of points."""
    dataMat = []
    fr = open(fileName)
    for line in fr.readlines():
        curLine = line.strip().split('\t')
        fltLine = list(map(float, curLine))  # map all elements to float
        dataMat.append(fltLine)
    return dataMat

def distEclud(vecA, vecB):
    return sqrt(sum(power(vecA - vecB, 2)))  # Euclidean distance

def randCent(dataSet, k):
    """Create k random centroids within the bounds of each dimension."""
    n = shape(dataSet)[1]
    centroids = mat(zeros((k, n)))
    for j in range(n):
        minJ = min(array(dataSet)[:, j])
        rangeJ = float(max(array(dataSet)[:, j]) - minJ)
        centroids[:, j] = mat(minJ + rangeJ * random.rand(k, 1))  # min + range*random
    return centroids

def kMeans(dataSet, k, distMeas=distEclud, createCent=randCent):
    m = shape(dataSet)[0]
    # column 0: assigned cluster index; column 1: squared error of the point
    clusterAssment = mat(zeros((m, 2)))
    centroids = createCent(dataSet, k)
    clusterChanged = True
    while clusterChanged:
        clusterChanged = False
        for i in range(m):  # assign each point to the closest centroid
            minDist = inf; minIndex = -1
            for j in range(k):
                distJI = distMeas(array(centroids)[j, :], array(dataSet)[i, :])
                if distJI < minDist:
                    minDist = distJI; minIndex = j
            if clusterAssment[i, 0] != minIndex:
                clusterChanged = True
            clusterAssment[i, :] = minIndex, minDist**2
        print(centroids)
        for cent in range(k):  # recalculate each centroid as the mean of its points
            ptsInClust = array(dataSet)[nonzero(array(clusterAssment)[:, 0] == cent)[0]]
            centroids[cent, :] = mean(ptsInClust, axis=0)
    id = nonzero(array(clusterAssment)[:, 0] == 0)[0]  # indices of points in cluster 0
    return centroids, clusterAssment, id

def plotBestFit(dataSet, id, centroids):
    dataArr = array(dataSet)
    cent = array(centroids)
    n = shape(dataArr)[0]
    n1 = shape(cent)[0]
    xcord1 = []; ycord1 = []  # points in cluster `id`
    xcord2 = []; ycord2 = []  # all other points
    xcord3 = []; ycord3 = []  # centroids
    for i in range(n):
        if i in id:
            xcord1.append(dataArr[i, 0]); ycord1.append(dataArr[i, 1])
        else:
            xcord2.append(dataArr[i, 0]); ycord2.append(dataArr[i, 1])
    for k in range(n1):
        xcord3.append(cent[k, 0]); ycord3.append(cent[k, 1])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    ax.scatter(xcord3, ycord3, s=50, c='black')
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()

if __name__ == '__main__':
    dataSet = loadDataSet('/Users/hakuri/Desktop/testSet.txt')
    a, b, id = kMeans(dataSet, 2)
    plotBestFit(dataSet, id, a)
```

To use it, look at the main block at the end. dataSet is the input data set; I provide it at the download link below. kMeans takes the input matrix as its first argument and K, the number of clusters, as its second. plotBestFit is the plotting function; it requires matplotlib and currently only supports two-dimensional data with K = 2.
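The same algorithm can also be written compactly with vectorized NumPy. The sketch below is my own re-implementation, not the book's code: the function name `kmeans_np` and the synthetic blob data are invented for illustration, but it uses the same min + range * random initialization described above and alternates assignment and mean updates until the labels stop changing.

```python
import numpy as np

def kmeans_np(X, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # same initialization as the post: min + range * random, per dimension
    centroids = lo + (hi - lo) * rng.random((k, X.shape[1]))
    labels = np.zeros(len(X), dtype=int)
    for it in range(max_iter):
        # squared distance from every point to every centroid: shape (m, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            pts = X[labels == c]
            if len(pts):  # guard against an empty cluster
                centroids[c] = pts.mean(axis=0)
    return centroids, labels

# two well-separated Gaussian blobs as a quick sanity check
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
cents, labs = kmeans_np(X, 2)
```

The broadcasting trick `X[:, None, :] - centroids[None, :, :]` computes all point-to-centroid differences in one step, which replaces the double loop in the original code.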
3.效果圖
The two large black dots are the centroids. Not bad, right? When testing, be sure to use a reasonably large data set so the clustering shows clearly.

4. Download
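If you don't have the testSet.txt from the download link, a file in the same tab-delimited two-column format that loadDataSet() expects can be generated. The helper below is my own invention (the name, cluster centers, and spread are assumptions, not from the post); it writes two well-separated Gaussian blobs, enough to make the clustering obvious.

```python
import random

def write_test_set(path, n_per_cluster=100, seed=42):
    """Write n_per_cluster points around each center, one 'x<TAB>y' line per point."""
    rng = random.Random(seed)
    centers = [(-3.0, -3.0), (3.0, 3.0)]  # two well-separated cluster centers
    with open(path, 'w') as f:
        for cx, cy in centers:
            for _ in range(n_per_cluster):
                f.write('%f\t%f\n' % (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)))

write_test_set('testSet_demo.txt')
```

Passing the generated path to loadDataSet() in place of the original file should then work unchanged.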
My GitHub: https://github.com/jimenbian — a star would be appreciated O(∩_∩)O! Download here
/********************************
* This article is from the blog "李博Garvin"
* Please credit the source when reposting: http://blog.csdn.net/buptgshengod
******************************************/