
Python Implementations of Machine Learning Algorithms (1): Logistic Regression and Linear Discriminant Analysis (LDA)

Published: 2023/12/14 · python · 豆豆

This post was collected and organized by 生活随笔 and is shared here for reference.


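This post collects my solutions to the programming exercises at the end of the chapters of Professor Zhou Zhihua's textbook Machine Learning. They used to be part of my answers post. I have now reorganized them and pulled out the parts that require code, so they can accumulate over time. I hope to turn this into a series on implementing machine learning algorithms.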

This post covers:

1. logistic regression
2. linear discriminant analysis (LDA)

Python libraries used:

numpy
matplotlib
pandas

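Dataset: watermelon dataset 3.0α from the textbook. The code below reads it from watermelon_3a.csv, with columns Idx, density, ratio_sugar and label: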

Idx	density	ratio_sugar	label
1	0.697	0.46	1
2	0.774	0.376	1
3	0.634	0.264	1
4	0.608	0.318	1
5	0.556	0.215	1
6	0.403	0.237	1
7	0.481	0.149	1
8	0.437	0.211	1
9	0.666	0.091	0
10	0.243	0.0267	0
11	0.245	0.057	0
12	0.343	0.099	0
13	0.639	0.161	0
14	0.657	0.198	0
15	0.36	0.37	0
16	0.593	0.042	0
17	0.719	0.103	0
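Both scripts below expect the data in a file named watermelon_3a.csv. The following is a minimal sketch for recreating that file from the table above with pandas. The helper name `write_watermelon_csv` is hypothetical, and the values are copied verbatim from the table.

```python
import pandas as pd

# Values copied verbatim from the table above (Idx, density, ratio_sugar, label)
ROWS = [
    (1, 0.697, 0.46, 1), (2, 0.774, 0.376, 1), (3, 0.634, 0.264, 1),
    (4, 0.608, 0.318, 1), (5, 0.556, 0.215, 1), (6, 0.403, 0.237, 1),
    (7, 0.481, 0.149, 1), (8, 0.437, 0.211, 1), (9, 0.666, 0.091, 0),
    (10, 0.243, 0.0267, 0), (11, 0.245, 0.057, 0), (12, 0.343, 0.099, 0),
    (13, 0.639, 0.161, 0), (14, 0.657, 0.198, 0), (15, 0.36, 0.37, 0),
    (16, 0.593, 0.042, 0), (17, 0.719, 0.103, 0),
]

def write_watermelon_csv(path='watermelon_3a.csv'):
    """Hypothetical helper: write the watermelon 3.0alpha table to CSV and return it."""
    df = pd.DataFrame(ROWS, columns=['Idx', 'density', 'ratio_sugar', 'label'])
    # index=False keeps Idx as the first column, so df.values[:, 1:3] selects the two features
    df.to_csv(path, index=False)
    return df

if __name__ == '__main__':
    write_watermelon_csv()
```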

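Logistic regression: this follows the code in Machine Learning in Action (《機器學習實戰》). I implemented both batch gradient ascent and stochastic gradient ascent and made a few small changes to the book's program.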
```python
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV data (columns: Idx, density, ratio_sugar, label)
df = pd.read_csv('watermelon_3a.csv')
df['norm'] = 1.0  # constant column so that weights[0] acts as the bias term
dataMat = df[['norm', 'density', 'ratio_sugar']].values            # shape (m, 3)
labelMat = df['label'].values.reshape(-1, 1).astype(float)         # shape (m, 1)

# sigmoid function
def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

# batch gradient ascent: every update uses all samples
def gradAscent(dataMat, labelMat, alpha=0.1, maxCycles=500):
    m, n = dataMat.shape
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMat @ weights)          # predicted probabilities, shape (m, 1)
        error = labelMat - h
        weights = weights + alpha * dataMat.T @ error
    return weights.ravel()                      # flatten to shape (n,)

# stochastic gradient ascent: one randomly chosen sample per update
def randomgradAscent(dataMat, label, numIter=50):
    m, n = dataMat.shape
    weights = np.ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))              # samples not yet used in this pass
        for i in range(m):
            alpha = 40 / (1.0 + j + i) + 0.2    # step size shrinks as training proceeds
            randIndex_Index = int(np.random.uniform(0, len(dataIndex)))
            randIndex = dataIndex[randIndex_Index]
            h = sigmoid(np.dot(dataMat[randIndex], weights))
            error = label[randIndex, 0] - h
            weights = weights + alpha * error * dataMat[randIndex]
            del dataIndex[randIndex_Index]
    return weights

# plot the samples and the decision boundary w0 + w1*density + w2*ratio_sugar = 0
def plotBestFit(weights, title):
    pos = labelMat[:, 0] == 1
    plt.figure(1)
    ax = plt.subplot(111)
    ax.scatter(dataMat[pos, 1], dataMat[pos, 2], s=30, c='red', marker='s')
    ax.scatter(dataMat[~pos, 1], dataMat[~pos, 2], s=30, c='green')
    x = np.arange(0.2, 0.8, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('density')
    plt.ylabel('ratio_sugar')
    plt.title(title)
    plt.show()

#weights = gradAscent(dataMat, labelMat)
#plotBestFit(weights, 'gradAscent logistic regression')
weights = randomgradAscent(dataMat, labelMat)
plotBestFit(weights, 'random gradAscent logistic regression')
```
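The update `weights + alpha * X^T (y - h)` in gradAscent is a gradient-ascent step on the log-likelihood ℓ(w) = Σ_i [y_i log σ(x_iᵀw) + (1 − y_i) log(1 − σ(x_iᵀw))], whose gradient is Xᵀ(y − σ(Xw)). The following is a minimal sketch that checks this gradient numerically with central differences. The small random data set and the function names are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # l(w) = sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ], with p_i = sigmoid(x_i . w)
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grad(w, X, y):
    # the ascent direction used by gradAscent: X^T (y - sigmoid(Xw))
    return X.T @ (y - sigmoid(X @ w))

def numeric_grad(w, X, y, eps=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(w)
    for k in range(len(w)):
        e = np.zeros_like(w)
        e[k] = eps
        g[k] = (log_likelihood(w + e, X, y) - log_likelihood(w - e, X, y)) / (2 * eps)
    return g

# hypothetical toy data: a bias column of ones plus two features in [0, 1)
rng = np.random.default_rng(0)
X = np.hstack([np.ones((10, 1)), rng.uniform(0, 1, size=(10, 2))])
y = rng.integers(0, 2, size=10).astype(float)
w = rng.normal(size=3)
print(np.max(np.abs(analytic_grad(w, X, y) - numeric_grad(w, X, y))))
```

The printed maximum difference should be tiny, which confirms that adding `alpha * X^T (y - h)` moves the weights uphill on ℓ(w).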
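Result of batch gradient ascent:
Result of stochastic gradient ascent:
The two methods give roughly the same result, but stochastic gradient ascent needs far fewer passes over the data (numIter=50 versus maxCycles=500).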
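To make "roughly the same" concrete, one could compare the training accuracy of the two weight vectors. A sample is predicted as class 1 when σ(wᵀx) ≥ 0.5, which is equivalent to wᵀx ≥ 0. The following is a minimal sketch. `predict` and `accuracy` are hypothetical helpers, and X is assumed to include the leading column of ones (the `norm` column), as in the script above.

```python
import numpy as np

def predict(weights, X):
    # sigmoid(w.x) >= 0.5  <=>  w.x >= 0, so the sigmoid need not be evaluated
    return (X @ np.ravel(weights) >= 0).astype(int)

def accuracy(weights, X, y):
    # fraction of samples whose predicted label equals the true label
    return float(np.mean(predict(weights, X) == np.ravel(y)))

# toy check with the boundary x1 = 0.5, i.e. w = [-0.5, 1.0]
X = np.array([[1.0, 0.2], [1.0, 0.8], [1.0, 0.6], [1.0, 0.1]])
y = np.array([0, 1, 0, 0])
print(predict([-0.5, 1.0], X))      # [0 1 1 0]
print(accuracy([-0.5, 1.0], X, y))  # 0.75
```

With the script above, `accuracy(gradAscent(dataMat, labelMat), dataMat, labelMat)` and `accuracy(randomgradAscent(dataMat, labelMat), dataMat, labelMat)` would put the comparison into numbers. Because the stochastic version draws random samples, its result can vary from run to run.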

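The LDA implementation mainly follows equation (3.39) on p. 62 and equation (3.33) on p. 61 of the book. Because w has a closed-form expression, the implementation is fairly simple.
The formulas are: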
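The equations below use the book's notation: X_0 and X_1 are the sample sets of class 0 and class 1, and μ_0 and μ_1 are their mean vectors. They define the within-class scatter matrix (3.33) and the closed-form projection direction (3.39). This is a reconstruction that matches what `calulate_w` computes.

```latex
S_w = \Sigma_0 + \Sigma_1
    = \sum_{x \in X_0} (x - \mu_0)(x - \mu_0)^{\mathsf T}
    + \sum_{x \in X_1} (x - \mu_1)(x - \mu_1)^{\mathsf T} \tag{3.33}

w = S_w^{-1} (\mu_0 - \mu_1) \tag{3.39}
```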

The code is:

```python
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('watermelon_3a.csv')

def calulate_w():
    # split the samples by class, keeping only the two features
    X1 = df[df.label == 1][['density', 'ratio_sugar']].values
    X0 = df[df.label == 0][['density', 'ratio_sugar']].values
    mean1 = X1.mean(axis=0)
    mean0 = X0.mean(axis=0)
    # within-class scatter matrix S_w, eq. (3.33)
    sw = np.zeros((2, 2))
    for x in X1:
        d = (x - mean1).reshape(2, 1)
        sw += d @ d.T
    for x in X0:
        d = (x - mean0).reshape(2, 1)
        sw += d @ d.T
    # w = S_w^{-1} (mean0 - mean1), eq. (3.39)
    w = np.linalg.inv(sw) @ (mean0 - mean1)
    return w

def plot(w):
    dataMat = df[['density', 'ratio_sugar']].values
    pos = df['label'].values == 1
    plt.figure(1)
    ax = plt.subplot(111)
    ax.scatter(dataMat[pos, 0], dataMat[pos, 1], s=30, c='red', marker='s')
    ax.scatter(dataMat[~pos, 0], dataMat[~pos, 1], s=30, c='green')
    # line through the origin satisfying w[0]*x + w[1]*y = 0
    x = np.arange(-0.2, 0.8, 0.1)
    y = -w[0] * x / w[1]
    ax.plot(x, y)
    plt.xlabel('density')
    plt.ylabel('ratio_sugar')
    plt.title('LDA')
    plt.show()

w = calulate_w()
print(w)
plot(w)
```
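The two loops that accumulate S_w can also be written as matrix products. For a matrix X with one sample per row, Σ_x (x − μ)(x − μ)ᵀ = (X − μ)ᵀ(X − μ). Solving S_w w = μ_0 − μ_1 with `np.linalg.solve` also avoids forming the inverse explicitly. The following is a minimal sketch under those assumptions. `lda_direction` and `lda_predict` are hypothetical names. The midpoint threshold in `lda_predict` is one simple choice for turning projections into labels and is not specified in the original post.

```python
import numpy as np

def lda_direction(X0, X1):
    """Return w = S_w^{-1} (mu0 - mu1) for two classes given as (n_i, d) arrays."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    D0, D1 = X0 - mu0, X1 - mu1
    Sw = D0.T @ D0 + D1.T @ D1                # within-class scatter, eq. (3.33)
    return np.linalg.solve(Sw, mu0 - mu1)     # eq. (3.39) without an explicit inverse

def lda_predict(w, X0, X1, X):
    """Label each row of X by comparing w.x with the midpoint of the projected class means."""
    c0, c1 = X0.mean(axis=0) @ w, X1.mean(axis=0) @ w
    threshold = (c0 + c1) / 2
    # c0 - c1 = (mu0-mu1)^T S_w^{-1} (mu0-mu1) > 0, so class 0 projects above the threshold
    return np.where(X @ w > threshold, 0, 1)

# hypothetical toy data: two square clusters
X0 = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [3.0, 2.0]])
X1 = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
w = lda_direction(X0, X1)
print(w)                                              # direction [1, 1]
print(lda_predict(w, X0, X1, np.vstack([X0, X1])))    # [0 0 0 0 1 1 1 1]
```

Applied to the watermelon data, `lda_direction(X0, X1)` with the X0 and X1 built in `calulate_w` should give the same w as the loop version up to floating-point rounding.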
Result:

The corresponding w is:

[-6.62487509e-04, -9.36728168e-01]

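Because of how the data are distributed (the two classes overlap considerably), LDA does not separate them clearly. I therefore changed the values of several label=0 samples and reran the program, which gave the following result: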

This time the separation is clearly visible. The corresponding w is:

[-0.60311161, -0.67601433]
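One way to make "not clearly separated" versus "clearly visible" measurable is the Fisher criterion that LDA maximizes: J(w) = (wᵀ S_b w) / (wᵀ S_w w), with between-class scatter S_b = (μ_0 − μ_1)(μ_0 − μ_1)ᵀ. A larger J means the projected class means lie farther apart relative to the within-class spread, and J does not change when w is rescaled. The following is a minimal sketch. `fisher_J` is a hypothetical helper, and the toy clusters are made up for illustration. The modified watermelon samples are not given in the post, so no values are computed for them here.

```python
import numpy as np

def fisher_J(w, X0, X1):
    """Fisher criterion J(w) = (w^T S_b w) / (w^T S_w w) for two classes (rows are samples)."""
    w = np.ravel(w)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    D0, D1 = X0 - mu0, X1 - mu1
    Sw = D0.T @ D0 + D1.T @ D1          # within-class scatter
    diff = mu0 - mu1
    Sb = np.outer(diff, diff)           # between-class scatter
    return float(w @ Sb @ w) / float(w @ Sw @ w)

# hypothetical toy data: two square clusters
X0 = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [3.0, 2.0]])
X1 = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
print(fisher_J([1, 1], X0, X1))    # 4.0, the LDA direction for this data
print(fisher_J([1, -1], X0, X1))   # 0.0, projected means coincide
```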

Reprinted from: http://cache.baiducontent.com/c?m=9d78d513d9d430db4f9be0697b14c0101f4381132ba6d70209d6843890732f43506793ac57270772d7d20d1016db4d4bea81743971597deb8f8fc814d2e1d46e6d9f26476d01d61f4f860eafbc1764977c875a9ef34ea1a7b57accef8c959a49008a155e2bdea7960c57529934ae552ce4a59b49105a10bd&p=ce6fc64ad4d807f449bd9b7d0d1796&newp=c26ada15d9c041ae17a6c7710f0a88231610db2151dcd101298ffe0cc4241a1a1a3aecbf21261b01d4c67a6606a94c5de1f53373310434f1f689df08d2ecce7e60c3&user=baidu&fm=sc&query=%CF%DF%D0%D4%C5%D0%B1%F0%B7%D6%CE%F6+python&qid=ccbe92e80000a2cb&p1=1

Reposted from: https://www.cnblogs.com/wyuzl/p/7654433.html
