sklearn clustering algorithms: HAC

Published: 2024/3/12

Basic idea
Hierarchical Agglomerative Clustering (HAC) is a clustering algorithm that works well in practice. Its core idea is bottom-up: treat every sample as its own cluster, then repeatedly merge the two closest clusters until some termination condition is met — for example, stopping once the number of clusters has dropped to 20% of the initial count, i.e. once 80% of the initial clusters have been merged away. Concretely, HAC proceeds as follows:
????(1) Treat each data point in the training set as its own cluster;
????(2) Compute the distance between every pair of clusters and merge the closest (most similar) pair;
????(3) Repeat step (2) until the termination condition is met.
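The three steps above can be sketched in a few lines of Python. This is a naive O(n³) single-link version for illustration only; naive_hac is a made-up name for this sketch, not a sklearn function:

```python
import numpy as np

def naive_hac(X, n_clusters):
    """Naive single-link HAC: start with one cluster per point,
    then repeatedly merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]      # step (1): one cluster per sample
    while len(clusters) > n_clusters:            # step (3): termination condition
        best, best_d = (0, 1), np.inf
        for a in range(len(clusters)):           # step (2): find the closest pair
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_d, best = d, (a, b)
        a, b = best
        clusters[a].extend(clusters[b])          # merge the closest pair
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
print(naive_hac(X, 2))  # -> [0 0 0 1 1 1]: the two columns of points
```

In practice sklearn's implementation should be used instead: it caches distances and supports all four linkage criteria below.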
The distance (similarity) between two clusters can be measured in four ways:
????(1) Single-link: the distance between the two closest points in the two clusters, i.e. MIN;
????(2) Complete-link: the distance between the two farthest points in the two clusters, i.e. MAX;
????(3) Average-link: the average distance over all pairs of points across the two clusters, i.e. AVERAGE;
????(4) Ward-link: the increase in the within-cluster sum of squared deviations caused by merging the two clusters.
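In sklearn the criterion is selected through the linkage parameter; a minimal comparison on a toy dataset (different criteria can produce different partitions of the same data):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two vertical "columns" of points at x=1 and x=4
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)

# The merge criterion is chosen via the `linkage` parameter.
for linkage in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    print(linkage, labels)
```

With single linkage the within-column chain distances (2) are all smaller than the cross-column gap (3), so the two columns are recovered exactly; the other criteria may break ties differently.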
The API

class sklearn.cluster.AgglomerativeClustering(n_clusters=2, *, affinity='euclidean', memory=None, connectivity=None, compute_full_tree='auto', linkage='ward', distance_threshold=None, compute_distances=False)

Parameters:
????n_clusters : int or None, default=2 — the number of clusters to find; exactly one of n_clusters and distance_threshold must be None
????affinity : str or callable, default='euclidean' — the distance metric, e.g. 'euclidean', 'manhattan', 'cosine'
????memory : str or object with the joblib.Memory interface — path of a directory used to cache the tree computation
????connectivity : array-like or callable, default=None — connectivity matrix imposing a given structure on the data, i.e. specifying the neighbors of each sample
????compute_full_tree : 'auto' or bool, default='auto' — whether to build the full tree or stop early at n_clusters; stopping early saves computation time when the number of clusters is not small; must be True when distance_threshold is set
????linkage : {'ward', 'complete', 'average', 'single'}, default='ward' — the merge criterion, i.e. which of the four distance measures above to use
????distance_threshold : float, default=None — the linkage distance above which clusters are not merged; if set, n_clusters must be None and compute_full_tree must be True
????compute_distances : bool, default=False — if True, compute distances between clusters even when distance_threshold is not used, which enables dendrogram visualization
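A small sketch of the distance_threshold mode described above (the data and the threshold value 5.0 are arbitrary toy choices; with distance_threshold set, compute_full_tree='auto' resolves to True automatically):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two tight pairs of points, far apart from each other
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], dtype=float)

# With distance_threshold set, n_clusters must be None and the number of
# clusters is decided by the data: merging stops once the closest pair of
# clusters is farther apart (by the linkage criterion) than the threshold.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0)
model.fit(X)
print(model.n_clusters_)  # the two tight pairs survive as separate clusters
print(model.labels_)
```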
Attributes:
????n_clusters_ : int — the number of clusters found
????labels_ : ndarray of shape (n_samples,) — cluster label of each sample
????n_leaves_ : int — number of leaves in the hierarchical tree
????n_connected_components_ : int — number of connected components in the graph
????n_features_in_ : int — number of features seen during fit
????feature_names_in_ : ndarray of shape (n_features_in_,) — names of features seen during fit
????children_ : array-like of shape (n_samples-1, 2) — the children of each non-leaf node
????distances_ : array-like of shape (n_nodes-1,) — distances between the nodes in children_
Methods:
????fit(X[, y]) — Fit the hierarchical clustering from features, or distance matrix.
????fit_predict(X[, y]) — Fit and return the cluster assignment of each sample.
????get_params([deep]) — Get parameters for this estimator.
????set_params(**params) — Set the parameters of this estimator.
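A short example exercising fit_predict and the attributes above (compute_distances=True is needed here so that distances_ exists when no distance_threshold is set):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)

model = AgglomerativeClustering(n_clusters=2, compute_distances=True)
labels = model.fit_predict(X)     # equivalent to fit(X) followed by labels_

print(labels)                     # one cluster label per sample, values in {0, 1}
print(model.n_leaves_)            # 6: one leaf per sample
print(model.children_.shape)      # (5, 2): n_samples - 1 merges, two children each
print(model.distances_.shape)     # (5,): one merge distance per row of children_
```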

Code example

>>> from sklearn.cluster import AgglomerativeClustering
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [4, 2], [4, 4], [4, 0]])
>>> clustering = AgglomerativeClustering().fit(X)
>>> clustering
AgglomerativeClustering()
>>> clustering.labels_
array([1, 1, 1, 0, 0, 0])

Worked examples
test1.py

import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering


def plot_dendrogram(model, **kwargs):
    # Create linkage matrix and then plot the dendrogram

    # create the counts of samples under each node
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)

    # Plot the corresponding dendrogram
    dendrogram(linkage_matrix, **kwargs)


iris = load_iris()
X = iris.data

# setting distance_threshold=0 ensures we compute the full tree.
model = AgglomerativeClustering(distance_threshold=0, n_clusters=None)

model = model.fit(X)
plt.title("Hierarchical Clustering Dendrogram")
# plot the top three levels of the dendrogram
plot_dendrogram(model, truncate_mode="level", p=3)
plt.xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()

Output: (dendrogram figure)

test2.py

import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# #############################################################################
# Generate data (swiss roll dataset)
n_samples = 1500
noise = 0.05
X, _ = make_swiss_roll(n_samples, noise=noise)
# Make it thinner
X[:, 1] *= 0.5

# #############################################################################
# Compute clustering
print("Compute unstructured hierarchical clustering...")
st = time.time()
ward = AgglomerativeClustering(n_clusters=6, linkage="ward").fit(X)
elapsed_time = time.time() - st
label = ward.labels_
print("Elapsed time: %.2fs" % elapsed_time)
print("Number of points: %i" % label.size)

# #############################################################################
# Plot result
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.view_init(7, -80)
for l in np.unique(label):
    ax.scatter(
        X[label == l, 0],
        X[label == l, 1],
        X[label == l, 2],
        color=plt.cm.jet(float(l) / np.max(label + 1)),
        s=20,
        edgecolor="k",
    )
plt.title("Without connectivity constraints (time %.2fs)" % elapsed_time)

# #############################################################################
# Define the structure A of the data. Here a 10 nearest neighbors graph.
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)

# #############################################################################
# Compute clustering
print("Compute structured hierarchical clustering...")
st = time.time()
ward = AgglomerativeClustering(
    n_clusters=6, connectivity=connectivity, linkage="ward"
).fit(X)
elapsed_time = time.time() - st
label = ward.labels_
print("Elapsed time: %.2fs" % elapsed_time)
print("Number of points: %i" % label.size)

# #############################################################################
# Plot result
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.view_init(7, -80)
for l in np.unique(label):
    ax.scatter(
        X[label == l, 0],
        X[label == l, 1],
        X[label == l, 2],
        color=plt.cm.jet(float(l) / np.max(label + 1)),
        s=20,
        edgecolor="k",
    )
plt.title("With connectivity constraints (time %.2fs)" % elapsed_time)

plt.show()

Output: (two 3D scatter plots, without and with connectivity constraints)

Summary

HAC builds clusters bottom-up: every sample starts as its own cluster, and the closest pair of clusters is merged repeatedly until a stopping condition is reached. In sklearn this is exposed as sklearn.cluster.AgglomerativeClustering, where the linkage parameter selects among the single, complete, average and ward criteria, and either n_clusters or distance_threshold controls when merging stops.