

Machine Learning with sklearn 0.19: Clustering with the K-Means Algorithm

Published: 2025/3/15

1. Background: clustering, similarity, and distance measures


2. The idea and workflow of the k-means algorithm
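The figures that originally illustrated this section were not preserved. In outline, k-means alternates between two steps: assign each sample to its nearest center, then move each center to the mean of its assigned samples, repeating until the centers stop moving. A minimal NumPy sketch of that loop (the function name and the data in the usage below are illustrative, not from the original post):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: random init, then alternate assignment/update steps."""
    rng = np.random.RandomState(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # assignment step: label each sample with its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each center to the mean of its assigned samples
        # (an empty cluster keeps its old center)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):   # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels
```

sklearn's KMeans builds on this basic loop with smarter seeding (k-means++) and multiple random restarts.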

3. KMeans parameters in sklearn
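The parameter table originally shown here did not survive extraction. As a stand-in, these are the most commonly tuned constructor arguments of sklearn's KMeans; the values shown are the library defaults of that era, so check the docs for your installed version:

```python
from sklearn.cluster import KMeans

km = KMeans(
    n_clusters=8,       # k, the number of clusters to form
    init="k-means++",   # initialization: "k-means++", "random", or an explicit ndarray
    n_init=10,          # number of runs with different seeds; the best inertia wins
    max_iter=300,       # maximum iterations of a single run
    tol=1e-4,           # tolerance on center movement used to declare convergence
    random_state=None,  # seed for reproducible initialization
)
```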

4. Code examples and the techniques they use

(1) make_blobs: clustered-data generator


sklearn.datasets.make_blobs(n_samples=100, n_features=2, centers=3, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None)



Return values: X, the generated samples with shape (n_samples, n_features), and y, the integer cluster label of each sample.
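A quick check of the two return values (the argument values here are illustrative; in sklearn 0.19 the import path is `sklearn.datasets.samples_generator`, in newer versions `sklearn.datasets`):

```python
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, n_features=2, centers=3, cluster_std=1.0, random_state=0)
print(X.shape)         # (100, 2): the sample matrix
print(sorted(set(y)))  # [0, 1, 2]: one integer label per cluster
```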

(2) np.vstack: stacking arrays row-wise

A detailed introduction is available at: http://blog.csdn.net/csdn15698845876/article/details/73380803
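np.vstack concatenates arrays along the first axis (row-wise), which is how `data3` in the listing below assembles clusters of different sizes into one array. A toy example:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
c = np.vstack((a, b))   # rows of a followed by rows of b; shape (3, 2)
print(c)
```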


#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: ZhengzhengLiu

# k-means clustering

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.datasets as ds
from sklearn.cluster import KMeans      # import KMeans

# font settings kept from the original listing (allow CJK text in figures)
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# generate simulated data
N = 1500
centers = 4
# make_blobs: clustered-data generator
data, y = ds.make_blobs(N, n_features=2, centers=centers, random_state=28)

# NOTE: the original listing generated data2 with the same arguments as data;
# the "different variance" comparison below needs a per-cluster standard
# deviation, e.g. cluster_std=(1, 2.5, 0.5, 2)
data2, y2 = ds.make_blobs(N, n_features=2, centers=centers,
                          cluster_std=(1, 2.5, 0.5, 2), random_state=28)
data3 = np.vstack((data[y == 0][:200], data[y == 1][:100],
                   data[y == 2][:10], data[y == 3][:50]))
y3 = np.array([0] * 200 + [1] * 100 + [2] * 10 + [3] * 50)

# build the model
km = KMeans(n_clusters=centers, random_state=28)
km.fit(data)        # KMeans is unsupervised; labels are not used for fitting
y_hat = km.predict(data)
print("Sum of squared distances to the nearest cluster center (inertia):", km.inertia_)
print("Average squared distance per sample:", (km.inertia_ / N))
print("Cluster centers:", km.cluster_centers_)

y_hat2 = km.fit_predict(data2)
y_hat3 = km.fit_predict(data3)

def expandBorder(a, b):
    # pad an axis range by 10% on each side
    d = (b - a) * 0.1
    return a - d, b + d

# plotting
cm = mpl.colors.ListedColormap(list("rgbmyc"))
plt.figure(figsize=(15, 9), facecolor="w")
plt.subplot(241)
plt.scatter(data[:, 0], data[:, 1], c=y, s=30, cmap=cm, edgecolors="none")

x1_min, x2_min = np.min(data, axis=0)
x1_max, x2_max = np.max(data, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title(u"Original data")
plt.grid(True)

plt.subplot(242)
plt.scatter(data[:, 0], data[:, 1], c=y_hat, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title(u'K-Means clustering result')
plt.grid(True)

# apply a linear transform (rotation/stretch) to the data
m = np.array(((1, 1), (0.5, 5)))
data_r = data.dot(m)
y_r_hat = km.fit_predict(data_r)
plt.subplot(243)
plt.scatter(data_r[:, 0], data_r[:, 1], c=y, s=30, cmap=cm, edgecolors='none')

x1_min, x2_min = np.min(data_r, axis=0)
x1_max, x2_max = np.max(data_r, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)

plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title(u'Transformed data')
plt.grid(True)

plt.subplot(244)
plt.scatter(data_r[:, 0], data_r[:, 1], c=y_r_hat, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title(u'Prediction on transformed data')
plt.grid(True)

plt.subplot(245)
plt.scatter(data2[:, 0], data2[:, 1], c=y2, s=30, cmap=cm, edgecolors='none')
x1_min, x2_min = np.min(data2, axis=0)
x1_max, x2_max = np.max(data2, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title(u'Original data with different variances')
plt.grid(True)

plt.subplot(246)
plt.scatter(data2[:, 0], data2[:, 1], c=y_hat2, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title(u'K-Means result on different-variance clusters')
plt.grid(True)

plt.subplot(247)
plt.scatter(data3[:, 0], data3[:, 1], c=y3, s=30, cmap=cm, edgecolors='none')
x1_min, x2_min = np.min(data3, axis=0)
x1_max, x2_max = np.max(data3, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title(u'Original data with different cluster sizes')
plt.grid(True)

plt.subplot(248)
plt.scatter(data3[:, 0], data3[:, 1], c=y_hat3, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title(u'K-Means result on different-size clusters')
plt.grid(True)

plt.tight_layout(2, rect=(0, 0, 1, 0.97))
plt.suptitle(u'Effect of data distribution on K-Means clustering', fontsize=18)
plt.savefig("k-means-clustering.png")
plt.show()

# Output:
# Sum of squared distances to the nearest cluster center (inertia): 2592.9990199
# Average squared distance per sample: 1.72866601327
# Cluster centers: [[ -7.44342199e+00  -2.00152176e+00]
#  [  5.80338598e+00   2.75272962e-03]
#  [ -6.36176159e+00   6.94997331e+00]
#  [  4.34372837e+00   1.33977807e+00]]
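The inertia_ printed above (the total within-cluster sum of squared distances) is also the usual basis for choosing k via the elbow method: plot inertia against k and look for the bend. A short sketch (the dataset and the range of k are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=28)
# inertia_ shrinks as k grows; the bend ("elbow") in this curve suggests a good k
inertias = [KMeans(n_clusters=k, random_state=28).fit(X).inertia_ for k in range(1, 8)]
```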


The second example compares standard K-Means with Mini Batch K-Means on the same data:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: ZhengzhengLiu

# comparing K-Means with Mini Batch K-Means

import time
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets.samples_generator import make_blobs   # sklearn.datasets in newer versions
from sklearn.metrics.pairwise import pairwise_distances_argmin

# font settings kept from the original listing (allow CJK text in figures)
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# initialize three centers
centers = [[1, 1], [-1, -1], [1, -1]]
clusters = len(centers)     # number of clusters: 3
# generate 300 two-dimensional samples around the three centers, std 0.7
X, Y = make_blobs(n_samples=300, centers=centers, cluster_std=0.7, random_state=28)

# build the K-Means model
k_means = KMeans(init="k-means++", n_clusters=clusters, random_state=28)
t0 = time.time()
k_means.fit(X)      # train
km_batch = time.time() - t0     # K-Means training time
print("K-Means training time: %.4fs" % km_batch)

# build the Mini Batch K-Means model
batch_size = 100        # size of each mini-batch
mbk = MiniBatchKMeans(init="k-means++", n_clusters=clusters, batch_size=batch_size, random_state=28)
t0 = time.time()
mbk.fit(X)
mbk_batch = time.time() - t0
print("Mini Batch K-Means training time: %.4fs" % mbk_batch)

# predictions
km_y_hat = k_means.predict(X)
mbk_y_hat = mbk.predict(X)

# fetch the cluster centers and match them across the two models
k_means_cluster_center = k_means.cluster_centers_
mbk_cluster_center = mbk.cluster_centers_
print("K-Means cluster centers:\n center=", k_means_cluster_center)
print("Mini Batch K-Means cluster centers:\n center=", mbk_cluster_center)
order = pairwise_distances_argmin(k_means_cluster_center, mbk_cluster_center)

# plotting
plt.figure(figsize=(12, 6), facecolor="w")
plt.subplots_adjust(left=0.05, right=0.95, bottom=0.05, top=0.9)
cm = mpl.colors.ListedColormap(['#FFC2CC', '#C2FFCC', '#CCC2FF'])
cm2 = mpl.colors.ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

# subplot 1: original data
plt.subplot(221)
plt.scatter(X[:, 0], X[:, 1], c=Y, s=6, cmap=cm, edgecolors="none")
plt.title(u"Original data distribution")
plt.xticks(())
plt.yticks(())
plt.grid(True)

# subplot 2: K-Means clustering result
plt.subplot(222)
plt.scatter(X[:, 0], X[:, 1], c=km_y_hat, s=6, cmap=cm, edgecolors='none')
plt.scatter(k_means_cluster_center[:, 0], k_means_cluster_center[:, 1],
            c=range(clusters), s=60, cmap=cm2, edgecolors='none')
plt.title(u'K-Means clustering result')
plt.xticks(())
plt.yticks(())
plt.text(-3.8, 3, 'train time: %.2fms' % (km_batch * 1000))
plt.grid(True)

# subplot 3: Mini Batch K-Means clustering result
plt.subplot(223)
plt.scatter(X[:, 0], X[:, 1], c=mbk_y_hat, s=6, cmap=cm, edgecolors='none')
plt.scatter(mbk_cluster_center[:, 0], mbk_cluster_center[:, 1],
            c=range(clusters), s=60, cmap=cm2, edgecolors='none')
plt.title(u'Mini Batch K-Means clustering result')
plt.xticks(())
plt.yticks(())
plt.text(-3.8, 3, 'train time: %.2fms' % (mbk_batch * 1000))
plt.grid(True)
plt.savefig("kmeans-vs-mini-batch-kmeans.png")
plt.show()

# Output:
# K-Means training time: 0.2260s
# Mini Batch K-Means training time: 0.0230s
# K-Means cluster centers:
#  center= [[ 0.96091862  1.13741775]
#  [ 1.1979318  -1.02783007]
#  [-0.98673669 -1.09398768]]
# Mini Batch K-Means cluster centers:
#  center= [[ 1.34304199 -1.01641075]
#  [ 0.83760683  1.01229021]
#  [-0.92702179 -1.08205992]]
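The `order` computed in the listing pairs each K-Means center with its nearest Mini Batch K-Means center, so the two labelings can be compared cluster-for-cluster even though the two models number their clusters differently; the listing computes it but never applies it. A small illustration (the center arrays here are made up):

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances_argmin

# two center sets describing the same clusters, but in a different order
km_centers  = np.array([[0.0, 0.0], [5.0, 5.0]])
mbk_centers = np.array([[5.1, 4.9], [0.1, -0.1]])

# for each row of the first array, the index of the nearest row of the second
order = pairwise_distances_argmin(km_centers, mbk_centers)
reordered = mbk_centers[order]   # mbk centers aligned with the km ordering
```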



5. Evaluation metrics for clustering


#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: ZhengzhengLiu

# evaluating clustering results

import time
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn import metrics
from sklearn.metrics.pairwise import pairwise_distances_argmin
from sklearn.datasets.samples_generator import make_blobs   # sklearn.datasets in newer versions

# initialize three centers
centers = [[1, 1], [-1, -1], [1, -1]]
clusters = len(centers)     # number of clusters: 3
# generate 300 two-dimensional samples around the three centers, std 0.7
X, Y = make_blobs(n_samples=300, centers=centers, cluster_std=0.7, random_state=28)

# build the K-Means model
k_means = KMeans(init="k-means++", n_clusters=clusters, random_state=28)
t0 = time.time()
k_means.fit(X)      # train
km_batch = time.time() - t0     # K-Means training time
print("K-Means training time: %.4fs" % km_batch)

# build the Mini Batch K-Means model
batch_size = 100        # size of each mini-batch
mbk = MiniBatchKMeans(init="k-means++", n_clusters=clusters, batch_size=batch_size, random_state=28)
t0 = time.time()
mbk.fit(X)
mbk_batch = time.time() - t0
print("Mini Batch K-Means training time: %.4fs" % mbk_batch)

km_y_hat = k_means.labels_
mbkm_y_hat = mbk.labels_

k_means_cluster_centers = k_means.cluster_centers_
mbk_means_cluster_centers = mbk.cluster_centers_
print("K-Means cluster centers:\ncenter=", k_means_cluster_centers)
print("Mini Batch K-Means cluster centers:\ncenter=", mbk_means_cluster_centers)
order = pairwise_distances_argmin(k_means_cluster_centers,
                                  mbk_means_cluster_centers)

# evaluation metrics
score_funcs = [
    metrics.adjusted_rand_score,            # ARI (adjusted Rand index)
    metrics.v_measure_score,                # weighted mean of homogeneity and completeness
    metrics.adjusted_mutual_info_score,     # AMI (adjusted mutual information)
    metrics.mutual_info_score,              # mutual information
]

# apply every metric to both models
for score_func in score_funcs:
    t0 = time.time()
    km_scores = score_func(Y, km_y_hat)
    print("K-Means %s: %.5f (computed in %0.3fs)" % (score_func.__name__, km_scores, time.time() - t0))

    t0 = time.time()
    mbkm_scores = score_func(Y, mbkm_y_hat)
    print("Mini Batch K-Means %s: %.5f (computed in %0.3fs)\n" % (score_func.__name__, mbkm_scores, time.time() - t0))

# Output:
# K-Means training time: 0.6350s
# Mini Batch K-Means training time: 0.0900s
# K-Means cluster centers:
# center= [[ 0.96091862  1.13741775]
#  [ 1.1979318  -1.02783007]
#  [-0.98673669 -1.09398768]]
# Mini Batch K-Means cluster centers:
# center= [[ 1.34304199 -1.01641075]
#  [ 0.83760683  1.01229021]
#  [-0.92702179 -1.08205992]]
# K-Means adjusted_rand_score: 0.72566 (computed in 0.071s)
# Mini Batch K-Means adjusted_rand_score: 0.69544 (computed in 0.001s)
#
# K-Means v_measure_score: 0.67529 (computed in 0.004s)
# Mini Batch K-Means v_measure_score: 0.65055 (computed in 0.004s)
#
# K-Means adjusted_mutual_info_score: 0.67263 (computed in 0.006s)
# Mini Batch K-Means adjusted_mutual_info_score: 0.64731 (computed in 0.005s)
#
# K-Means mutual_info_score: 0.74116 (computed in 0.002s)
# Mini Batch K-Means mutual_info_score: 0.71351 (computed in 0.001s)

Reprinted from: https://www.cnblogs.com/mfryf/p/9007524.html
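All four metrics above compare a predicted labeling against the ground-truth labels while ignoring the arbitrary numbering of clusters. For example, the adjusted Rand index gives a perfect score to a partition whose cluster ids are merely permuted:

```python
from sklearn import metrics

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]   # same partition, cluster ids permuted
score = metrics.adjusted_rand_score(y_true, y_pred)
print(score)   # 1.0: ARI depends only on the grouping, not the label names
```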
