EM Algorithm in Practice

Published: 2025/1/21

1. Basic EM Algorithm

np.random.multivariate_normal(mean, cov, size) draws samples from a multivariate normal distribution with the given mean vector and covariance matrix.
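As a quick check of the call above, the following minimal sketch draws samples and compares the empirical mean and covariance with the inputs. The mean and covariance values are made up for illustration and do not come from the article's data set.

```python
import numpy as np

np.random.seed(0)

# Hypothetical parameters: mean height (cm) and weight (kg), with a
# symmetric positive-definite covariance matrix.
mean = np.array([160.0, 50.0])
cov = np.array([[30.0, 20.0],
                [20.0, 40.0]])

# Draw 5000 two-dimensional samples; the result has shape (size, len(mean)).
samples = np.random.multivariate_normal(mean, cov, size=5000)

print(samples.shape)
print('empirical mean:', samples.mean(axis=0))
print('empirical covariance:\n', np.cov(samples, rowvar=False))
```

With this many samples, the empirical statistics should land close to the mean and cov passed in.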
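  • To tell whether the order of the estimated components matches the order of the true classes, you need the sample labels as well as the features.
    The data are the heights and weights of men and women; the label is 0 for female and 1 for male. We know in advance that men are taller than women on average, so comparing the two component means of the fitted model tells us which component is female and which is male.
    This is recorded in flag: flag = 0 means the female class is the first component.
  • cmp_point = mpl.colors.ListedColormap(['#22B14C','#ED1C24'])
  • When drawing the scatter plot, use the c and cmap parameters to color the points by class.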
  • Full program:

    import numpy as np
    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from matplotlib import patches as mpatches
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture
    from sklearn.model_selection import train_test_split

    np.random.seed(0)
    data = pd.read_csv('../HeightWeight.csv')
    print(data.head())
    feature = data[['Height(cm)', 'Weight(kg)']]
    label = data['Sex']
    train_x, test_x, train_y, test_y = train_test_split(feature, label, test_size=0.3)
    print(train_x.shape)
    print(train_y.shape)

    # Fit the model
    gmm = GaussianMixture(n_components=2, covariance_type='full', max_iter=100)
    gmm.fit(train_x)
    print('Means:\n', gmm.means_)

    mu1, mu2 = gmm.means_
    cov1, cov2 = gmm.covariances_

    # Build the two Gaussians from the estimated parameters
    norm1 = multivariate_normal(mu1, cov1)
    norm2 = multivariate_normal(mu2, cov2)
    # Density of each training sample under each component
    # (the mixture weights gmm.weights_ are not used in this comparison)
    tau1 = norm1.pdf(train_x)
    tau2 = norm2.pdf(train_x)

    # Decide which component is female, using the prior knowledge that
    # the female mean height is smaller than the male mean height
    flag = 0
    if gmm.means_[0][0] < gmm.means_[1][0]:
        c1 = tau1 > tau2   # female is the first component (flag = 0)
    else:
        flag = 1
        c1 = tau1 < tau2   # female is the second component (flag = 1)
    c2 = ~c1               # True means predicted male (label 1)

    # Predict on the test set
    tau1_test = norm1.pdf(test_x)
    tau2_test = norm2.pdf(test_x)
    if flag:
        c1_test = tau1_test < tau2_test
    else:
        c1_test = tau1_test > tau2_test
    c2_test = ~c1_test

    # Grid over the feature range for the background decision regions
    height_min, height_max = data['Height(cm)'].min(), data['Height(cm)'].max()
    weight_min, weight_max = data['Weight(kg)'].min(), data['Weight(kg)'].max()
    x = np.linspace(height_min - 0.5, height_max + 0.5, 300)
    y = np.linspace(weight_min - 0.5, weight_max + 0.5, 300)
    xx, yy = np.meshgrid(x, y)
    grid_test = np.stack((xx.ravel(), yy.ravel()), axis=1)
    grid_predict = gmm.predict(grid_test)
    if flag:
        # Swap component indices so that 0 is always female, matching the legend
        grid_predict = 1 - grid_predict

    cmp_point = mpl.colors.ListedColormap(['#22B14C', '#ED1C24'])
    cmp_bkg = mpl.colors.ListedColormap(['#B0E0E6', '#FFC0CB'])

    plt.pcolormesh(xx, yy, grid_predict.reshape(xx.shape), cmap=cmp_bkg)
    plt.xlabel('Height(cm)')
    plt.ylabel('Weight(kg)')

    # Training data: circles colored by the true label
    plt.scatter(train_x['Height(cm)'], train_x['Weight(kg)'], c=train_y,
                marker='o', cmap=cmp_point)
    # Test data: triangles colored by the predicted label
    plt.scatter(test_x['Height(cm)'], test_x['Weight(kg)'], c=c2_test.astype(int),
                marker='^', s=60, cmap=cmp_point)

    patchs = [mpatches.Patch(color='#B0E0E6', label='girl'),
              mpatches.Patch(color='#FFC0CB', label='boy')]
    plt.legend(handles=patchs, fancybox=True, framealpha=0.8)
    plt.show()

    train_acc = np.mean(train_y == c2)
    test_acc = np.mean(test_y == c2_test)
    print('train acc: ', train_acc)
    print('test acc: ', test_acc)
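The label-alignment step can also be tried without the CSV file. The sketch below is not the article's program: it uses synthetic two-class data with made-up means and covariances, calls gmm.predict instead of comparing densities by hand, and introduces a hypothetical lookup array component_to_label that maps each component index to a class label based on mean height.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

# Synthetic stand-in for the height/weight data (all values are assumptions).
female = rng.multivariate_normal([160, 52], [[25, 12], [12, 30]], size=300)
male = rng.multivariate_normal([175, 70], [[30, 15], [15, 40]], size=300)
X = np.vstack([female, male])
y = np.r_[np.zeros(300, dtype=int), np.ones(300, dtype=int)]  # 0 = female, 1 = male

gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(X)

# The component with the smaller mean height is taken to be female (label 0).
order = np.argsort(gmm.means_[:, 0])        # order[0] = index of the shorter component
component_to_label = np.empty(2, dtype=int)
component_to_label[order] = [0, 1]

# Translate component indices into class labels, then score against the truth.
pred = component_to_label[gmm.predict(X)]
acc = np.mean(pred == y)
print('component means:\n', gmm.means_)
print('accuracy:', acc)
```

Unlike the density comparison, gmm.predict takes the mixture weights into account, so the two approaches can disagree on points near the boundary when the classes are unbalanced.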

2. GMM Parameters

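  • Covariance type
  • covariance_type = ('spherical', 'diag', 'tied', 'full')
    'spherical' gives each component a single variance, 'diag' gives each component its own diagonal covariance matrix, 'tied' makes all components share one full covariance matrix, and 'full' gives each component its own full covariance matrix.
  • BIC
    $\mathrm{BIC} = k\ln(n) - 2\ln(L)$
    Here k is the number of model parameters, n is the number of samples, and L is the maximized value of the likelihood function; a lower BIC is better. The penalty term k ln(n) grows with model complexity, which helps guard against the curse of dimensionality when the dimension is large and the training set is relatively small.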
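As a rough illustration of how the covariance type and BIC fit together, the sketch below fits one GaussianMixture per covariance type on synthetic data (the data are made up) and compares gmm.bic(X). It also recomputes the BIC of the 'full' model by hand, under the assumption that k counts the covariance, mean, and free weight parameters.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

# Two synthetic 2-D clusters with different correlated covariances (assumed values).
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=200),
    rng.multivariate_normal([5, 5], [[1.0, -0.4], [-0.4, 1.5]], size=200),
])
n, d = X.shape
K = 2

models, bic = {}, {}
for cov_type in ('spherical', 'diag', 'tied', 'full'):
    models[cov_type] = GaussianMixture(
        n_components=K, covariance_type=cov_type, random_state=0).fit(X)
    bic[cov_type] = models[cov_type].bic(X)
    print(cov_type, round(bic[cov_type], 2))

# Lower BIC is better.
best = min(bic, key=bic.get)
print('lowest BIC:', best)

# Manual BIC for the 'full' model: k ln(n) - 2 ln(L).
# k = K*d*(d+1)/2 covariance terms + K*d means + (K-1) free weights.
k_full = K * d * (d + 1) // 2 + K * d + (K - 1)
log_likelihood = models['full'].score(X) * n   # score() is the mean log-likelihood per sample
manual_bic = k_full * np.log(n) - 2 * log_likelihood
print('manual BIC (full):', round(manual_bic, 2))
```

The same loop can be extended over n_components to choose the number of clusters as well.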
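3. DPGMM

DPGMM (a Gaussian mixture with a Dirichlet process prior) is useful when the number of clusters is not known in advance: n_components acts only as an upper bound, and components that are not needed end up with weights close to zero.

    from sklearn.mixture import BayesianGaussianMixture

    dpgmm = BayesianGaussianMixture(n_components=n_components,
                                    covariance_type='full',
                                    max_iter=1000,
                                    n_init=5,
                                    weight_concentration_prior_type='dirichlet_process',
                                    weight_concentration_prior=0.1)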
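A minimal sketch of the call above on synthetic data follows. The three clusters, their centers, and the choice n_components = 10 are assumptions made for illustration; random_state is added only to make the run repeatable.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)

# Three well-separated synthetic clusters (centers and spread are made up).
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(200, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(200, 2)),
])

n_components = 10  # deliberately larger than the true number of clusters
dpgmm = BayesianGaussianMixture(
    n_components=n_components,
    covariance_type='full',
    max_iter=1000,
    n_init=5,
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,
    random_state=0,
)
dpgmm.fit(X)

# Components that the data do not support should receive very small weights.
print('weights:', np.round(dpgmm.weights_, 3))

# Number of components that actually receive at least one point.
used = np.unique(dpgmm.predict(X))
print('components in use:', len(used))
```

Inspecting weights_ rather than fixing the cluster count up front is the main practical difference from GaussianMixture; a smaller weight_concentration_prior tends to concentrate the mass on fewer components.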
