
python sklearn-02: Simple Linear Regression Example 1

Published: 2024/4/19 python 38 豆豆


Original article: https://muxuezi.github.io/posts/2-linear-regression.html

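1. Simple linear regression:

Predicting pizza prices from their diameter. The training data are as follows:

Diameter (inches)   Price (dollars)
6                   7
8                   9
10                  13
14                  17.5
18                  18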

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Plotting helper: title, axis labels, axis range and grid.
# For Chinese labels, pass fontproperties=FontProperties(fname=...) with a CJK font
# (e.g. r"C:\Windows\Fonts\msyh.ttc" on Windows) to title()/xlabel()/ylabel().
def runplt():
    plt.figure()
    plt.title('Pizza price vs. diameter')
    plt.xlabel('Diameter (inches)')
    plt.ylabel('Price (dollars)')
    plt.axis([0, 25, 0, 25])  # axis() takes no fontproperties argument
    plt.grid(True)
    return plt

# Training data
X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]

# Simple linear regression model: y = α + βx
model = LinearRegression()
model.fit(X, y)  # fit the model to the training data
# predict() expects a 2-D input: one row per sample, one column per feature
print('Predicted price of a 12-inch pizza: $%.2f' % model.predict([[12]])[0][0])

plt = runplt()
X2 = [[0], [10], [14], [25]]
y2 = model.predict(X2)  # points on the fitted line
plt.plot(X, y, 'k.')    # training data as black dots
plt.plot(X2, y2, 'g-')  # fitted line in green

# Residuals: a red vertical segment from each observed price to its prediction
yr = model.predict(X)
for idx, x in enumerate(X):
    plt.plot([x[0], x[0]], [y[idx][0], yr[idx][0]], 'r-')
plt.show()

Figure: the training data (black dots), the fitted line (green) and the residuals (red).
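The red segments in the plot are the residuals: the differences between each observed price and the price the model predicts. Least squares chooses α and β so that the sum of the squared residuals is as small as possible. The following is a minimal sketch, assuming the same five training points; `rss` is a hypothetical helper written for illustration, not a scikit-learn function. It computes the residual sum of squares of the fitted line and checks that changing the slope makes it larger.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data as above (diameter in inches, price in dollars)
X = np.array([[6], [8], [10], [14], [18]])
y = np.array([7, 9, 13, 17.5, 18])

model = LinearRegression().fit(X, y)
alpha, beta = model.intercept_, model.coef_[0]

def rss(a, b):
    """Hypothetical helper (not part of scikit-learn): residual sum of squares
    of the line y = a + b*x on the training data."""
    residuals = y - (a + b * X[:, 0])
    return float(np.sum(residuals ** 2))

best = rss(alpha, beta)
print('RSS at the fitted line: %.3f' % best)

# The least-squares fit is the unique minimum, so nudging the slope increases the RSS
for delta in (-0.1, 0.1):
    print('RSS with slope changed by %+.1f: %.3f' % (delta, rss(alpha, beta + delta)))
```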

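Solving the simple linear regression by hand:

# Solve simple linear regression with ordinary least squares.
# LinearRegression.fit() learns the simple linear regression model y = α + βx, where
#   β = cov(x, y) / var(x)   (covariance / variance)   and   α = ȳ - βx̄
import numpy as np

x_train = [6, 8, 10, 14, 18]
y_train = [7, 9, 13, 17.5, 18]

var = np.var(x_train, ddof=1)  # sample variance (divides by n - 1)
print('Variance: %.2f' % var)
cov = np.cov(x_train, y_train)[0][1]  # np.cov also divides by n - 1, so the factors cancel in β
print('Covariance: %.2f' % cov)

b = cov / var                                # slope β
a = np.mean(y_train) - b * np.mean(x_train)  # intercept α
print('Equation: y = %.2f + %.2f x' % (a, b))

# Output:

Variance: 23.20

Covariance: 22.65

Equation: y = 1.97 + 0.98 x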
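As a cross-check, the hand-computed values should match the `intercept_` and `coef_` attributes that LinearRegression exposes after fitting. A minimal sketch, reusing the same five training points:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x_train = [6, 8, 10, 14, 18]
y_train = [7, 9, 13, 17.5, 18]

# Closed-form least squares: beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x)
beta = np.cov(x_train, y_train)[0][1] / np.var(x_train, ddof=1)
alpha = np.mean(y_train) - beta * np.mean(x_train)

# scikit-learn expects a 2-D feature matrix: one row per sample
model = LinearRegression().fit(np.array(x_train).reshape(-1, 1), y_train)

print('by hand: alpha = %.4f, beta = %.4f' % (alpha, beta))
print('sklearn: alpha = %.4f, beta = %.4f' % (model.intercept_, model.coef_[0]))
```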

To evaluate how accurately the model predicts, we introduce a test set:

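# Model evaluation: R² (the coefficient of determination) measures how well the model fits the data.
# On the training data of a least-squares fit with an intercept it lies between 0 and 1,
# but on unseen test data it can be negative; score() returns R² on the data passed to it.

# Test set
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]
r2 = model.score(X_test, y_test)
print('R^2 = %.2f' % r2)

# Output:

R^2 = 0.66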
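R² is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals on the evaluation data and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below recomputes the test-set score from that definition and compares it with `sklearn.metrics.r2_score`; it refits the simple model so that it runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = [[6], [8], [10], [14], [18]]
y = [7, 9, 13, 17.5, 18]
X_test = [[8], [9], [11], [16], [12]]
y_test = np.array([11, 8.5, 15, 18, 11])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X_test)

ss_res = np.sum((y_test - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print('manual R^2 = %.4f' % r2_manual)
print('r2_score   = %.4f' % r2_score(y_test, y_pred))
```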

The coefficient of determination is not very high, so let's try multiple linear regression and see how it does.

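2. Multiple linear regression:

In reality, a pizza's price depends on more than its diameter, so we add a second explanatory variable, the number of toppings. The updated data are as follows: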

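Training set:

Diameter (inches)   Toppings   Price (dollars)
6                   2          7
8                   1          9
10                  0          13
14                  2          17.5
18                  0          18

Test set:

Diameter (inches)   Toppings   Price (dollars)
8                   2          11
9                   0          8.5
11                  2          15
16                  2          18
12                  0          11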

# Multiple regression
# y = α + β₁x₁ + β₂x₂ + ⋯ + βₙxₙ, written in matrix form as Y = Xβ, where Y is the column vector
# of the training set's response variable and β is the column vector of model parameters.
# X is called the design matrix: the m × n matrix of the training set's explanatory variables,
# where m is the number of training samples and n is the number of explanatory variables.
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Training set: fit a multiple linear regression model (features: diameter, toppings)
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [[7], [9], [13], [17.5], [18]]
model = LinearRegression()
model.fit(X, y)

# Test set and predictions
X_test = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]
y_test = [[11], [8.5], [15], [18], [11]]
predictions = model.predict(X_test)

# Coefficient of determination on the test set
print('R^2 = %.2f' % model.score(X_test, y_test))

# Plot actual vs. predicted prices for each test sample
plt.figure()
plt.title('Multiple regression: actual vs. predicted')
plt.plot(y_test, label='y_test')
plt.plot(predictions, label='predictions')
plt.legend()
plt.show()

# Output:

R^2 = 0.77

Figure: actual prices (y_test) and predicted prices for the five test samples.

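The coefficient of determination is higher than with diameter as the only feature (0.77 vs. 0.66 on the same test pizzas), so this model fits the data better.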
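The matrix form Y = Xβ can also be solved directly. A minimal sketch, assuming a column of ones is prepended to the design matrix so that α becomes the first entry of β; `numpy.linalg.lstsq` returns the least-squares solution, which equals (XᵀX)⁻¹XᵀY whenever XᵀX is invertible. It should reproduce the intercept and coefficients that LinearRegression learns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training set: columns are diameter (inches) and number of toppings
X = np.array([[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]], dtype=float)
y = np.array([7, 9, 13, 17.5, 18])

# Design matrix with a leading column of ones for the intercept α
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ beta ≈ y (here equal to (XᵀX)⁻¹XᵀY)
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

model = LinearRegression().fit(X, y)
print('lstsq:  ', np.round(beta, 4))
print('sklearn:', np.round(np.r_[model.intercept_, model.coef_], 4))
```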

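3. Polynomial regression: quadratic regression:

Judging from the scatter plot of the original training data, the relationship may actually be curved, so let's try quadratic regression.

We again use the training data with diameter as the only feature. Quadratic regression fits y = α + β₁x + β₂x²: there is one explanatory variable, but the model has three terms, and the third (quadratic) term captures the curvature. The PolynomialFeatures transformer generates these terms from the original feature.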
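To see what PolynomialFeatures actually produces, here is a minimal sketch on the training diameters. With `degree=2` each value x becomes the row [1, x, x²]; the leading 1 is a bias column, controlled by the `include_bias` parameter (default `True`). Because the model is still linear in α, β₁ and β₂, the ordinary LinearRegression class can fit these expanded features.

```python
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]

# degree=2 maps each diameter x to [1, x, x²]
quadratic_featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
print(X_train_quadratic)
```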

# Polynomial regression
# Quadratic regression: y = α + β₁x + β₂x², one explanatory variable but three terms;
# the quadratic term models the curvature
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def runplt():
    plt.figure()
    plt.title('Pizza price vs. diameter')
    plt.xlabel('Diameter (inches)')
    plt.ylabel('Price (dollars)')
    plt.axis([0, 25, 0, 25])
    plt.grid(True)
    return plt

X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
X_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]

plt = runplt()

# Ordinary linear fit, drawn over diameters 0 to 26 inches
regressor = LinearRegression()
regressor.fit(X_train, y_train)
xx = np.linspace(0, 26, 100)
yy = regressor.predict(xx.reshape(-1, 1))
plt.plot(X_train, y_train, 'k.')
plt.plot(xx, yy)

# Build the quadratic features [1, x, x²] and fit on them
quadratic_featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)
regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, y_train)
xx_quadratic = quadratic_featurizer.transform(xx.reshape(-1, 1))
plt.plot(xx, regressor_quadratic.predict(xx_quadratic), 'r-')  # quadratic fit in red
plt.show()

print('Linear regression r^2: %.2f' % regressor.score(X_test, y_test))
print('Quadratic regression r^2: %.2f' % regressor_quadratic.score(X_test_quadratic, y_test))

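Figure and output: the training data with the linear fit and the quadratic fit (red), followed by the two printed r² scores.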

Judging by r², quadratic regression does better than linear regression. Would cubic or quartic regression do even better? Let's try:

# Polynomial regression of increasing degree
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
X_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]

k_range = range(2, 10)  # polynomial degrees 2 to 9
k_scores = []

# Degree 1: plain linear regression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
k_scores.append(regressor.score(X_test, y_test))

for k in k_range:
    k_featurizer = PolynomialFeatures(degree=k)
    X_train_k = k_featurizer.fit_transform(X_train)
    X_test_k = k_featurizer.transform(X_test)
    regression_k = LinearRegression()
    regression_k.fit(X_train_k, y_train)
    k_scores.append(regression_k.score(X_test_k, y_test))

# Print and plot one test-set score per degree (1 to 9)
degrees = range(1, len(k_scores) + 1)
for degree, score in zip(degrees, k_scores):
    print('Degree %d: r^2 = %.2f' % (degree, score))
plt.plot(list(degrees), k_scores)
plt.xlabel('Polynomial degree')
plt.ylabel('Test r^2')
plt.show()

Figure and output: test-set r² for polynomial degrees 1 through 9.

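Degree 1 is just linear regression. The r² plot shows that a higher degree does not mean a better fit: the fit is best at degree 2. Most of the higher-degree models overfit. With only five training points, a polynomial of degree 4 or higher has enough coefficients to pass through every training point exactly, so instead of learning a general relationship between input and output, the model memorizes the training set and performs poorly on the test set. There are techniques to avoid this, which we will study later.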
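To see the memorization effect, a minimal sketch can compare each model's R² on the training set with its R² on the test set. It swaps in `make_pipeline` to chain PolynomialFeatures and LinearRegression, a design choice that keeps the feature expansion and the fit together so the test data are expanded the same way automatically. The training R² rises to essentially 1 at degree 4, where the polynomial passes through all five training points, while the test R² does not improve in the same way.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]
X_test = [[6], [8], [11], [16]]
y_test = [8, 12, 15, 18]

train_scores, test_scores = {}, {}
for degree in range(1, 5):
    # The pipeline expands the features before the linear fit,
    # both when fitting and when predicting or scoring
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_scores[degree] = model.score(X_train, y_train)
    test_scores[degree] = model.score(X_test, y_test)
    print('degree %d: train r^2 = %.3f, test r^2 = %.3f'
          % (degree, train_scores[degree], test_scores[degree]))
```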
