
How Many Ways Are There to Solve a Simple Linear Fitting Problem?

Published: 2024/10/8 · Programming Q&A · 豆豆


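Most of us have solved a linear fitting problem at some point: given a set of points, find the slope and intercept of the line that fits them best. The problem already appears in high school mathematics. I remember it clearly: on an exam it usually turned up around the second of the long-answer questions, and the expected method was least squares. You plug the data into the formula, read off the slope and intercept, and collect what is, to put it kindly, free marks.

Scientific computing uses the same idea for regression analysis: OLS (ordinary least squares), the most basic form of regression analysis.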
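The textbook formula mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration rather than code from the post; the helper name `ols_fit` and the sample data are assumptions made for the example.

```python
def ols_fit(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noiseless data on the line y = 3x + 7
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 7 for x in xs]
slope, intercept = ols_fit(xs, ys)
print(slope, intercept)
```

Every tool in the rest of this article computes, or iteratively approximates, these same two numbers.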

Deep learning

This is the introductory demo for each of the three major deep learning frameworks.

(1) Keras

The core data structure in Keras is the model, a way of organizing layers. The simplest kind is the Sequential model, a linear stack of layers.

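import keras
from keras import layers
import numpy as np
import matplotlib.pyplot as plt
from pylab import mpl

# %matplotlib inline  # Jupyter magic; uncomment when running in a notebook

# Use a font that can render Chinese text in plots
mpl.rcParams['font.sans-serif'] = ['SimHei']

# 50 points on y = 3x + 7 with Gaussian noise (standard deviation 5)
x = np.linspace(0, 50, 50)
y = 3 * x + 7 + np.random.randn(50) * 5
plt.scatter(x, y)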

NumPy generates 50 noisy points, plotted as the scatter shown in the figure below.

Next, build the simplest possible Keras model, a Sequential model.

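# Sequential model with a single Dense unit: one input feature, one output
model = keras.Sequential([keras.Input(shape=(1,)), layers.Dense(1)])
model.summary()

# Compile with the Adam optimizer and mean squared error (MSE) loss.
# Accuracy is a classification metric and is not meaningful for regression, so it is left out.
model.compile(optimizer='adam', loss='mse')

# Train for 3000 epochs, i.e. 3000 passes over the data
x_col = x.reshape(-1, 1)  # Keras expects input of shape (samples, features)
history = model.fit(x_col, y, epochs=3000)

plt.scatter(x, y, c='r')
plt.plot(x, model.predict(x_col))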

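Running the code gives the result shown in the figure below.

(2) PyTorch

We create a dataset from the equation $y = 2x + 0.2$ and add noise with torch.rand(). Note that torch.rand() draws from the uniform distribution on [0, 1), so the noise has mean 0.5 and the fitted intercept will land near 0.7 rather than 0.2.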

import torch as t
from matplotlib import pyplot as plt
from torch import nn

# Build the dataset; Variable is no longer needed, plain tensors work directly
x = t.unsqueeze(t.linspace(-1, 1, 100), dim=1)  # shape (100, 1)
y = x * 2 + 0.2 + t.rand(x.size())
plt.scatter(x.numpy(), y.numpy())
plt.show()

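The generated dataset is shown in the figure below.

Now we define the model, which has a one-dimensional input and a one-dimensional output, together with the loss function and the optimizer: mean squared error (MSE) as the loss and stochastic gradient descent (SGD) as the optimizer.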

class LinearRegression(t.nn.Module):
    def __init__(self):
        # Call the parent class constructor
        super(LinearRegression, self).__init__()
        # Input and output are both one-dimensional
        self.linear = t.nn.Linear(1, 1)

    def forward(self, x):
        out = self.linear(x)
        return out

model = LinearRegression()  # instantiate the model
num_epochs = 1000  # number of iterations
learning_rate = 1e-2  # learning rate 0.01
Loss = t.nn.MSELoss()  # loss function
optimizer = t.optim.SGD(model.parameters(), lr=learning_rate)  # optimizer

In each epoch we compute the loss, backpropagate to obtain the gradients, and let the optimizer update the parameters by gradient descent.

for epoch in range(num_epochs):
    # Forward pass: predict
    y_pred = model(x)
    # Compute the loss
    loss = Loss(y_pred, y)
    # Clear the gradients left over from the previous step
    optimizer.zero_grad()
    # Backpropagate
    loss.backward()
    # Update the parameters
    optimizer.step()
    if epoch % 200 == 0:
        print("[{}/{}] loss:{:.4f}".format(epoch + 1, num_epochs, loss.item()))

plt.scatter(x.data.numpy(), y.data.numpy())
plt.plot(x.data.numpy(), y_pred.data.numpy(), 'r-', lw=5)
plt.text(0.5, 0, 'Loss=%.4f' % loss.data.item(), fontdict={'size': 20, 'color': 'red'})
plt.show()

#### Output ####
[1/1000] loss:1.7766
[201/1000] loss:0.1699
[401/1000] loss:0.0816
[601/1000] loss:0.0759
[801/1000] loss:0.0755

Running the code gives the result shown in the figure below.
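To see what the zero_grad / backward / step cycle does for a model this small, the same training loop can be written out by hand. The following is a rough sketch in plain Python, not the PyTorch implementation: the function name `gradient_descent_fit`, the learning rate, and the noiseless data are all assumptions chosen for the illustration.

```python
def gradient_descent_fit(xs, ys, lr=0.1, epochs=2000):
    """Fit y = w * x + b by full-batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current prediction
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = mean(error ** 2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Parameter update: step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical noiseless data on y = 2x + 0.2 over [-1, 1]
xs = [-1 + 2 * i / 99 for i in range(100)]
ys = [2 * x + 0.2 for x in xs]
w, b = gradient_descent_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

The frameworks add automatic differentiation and hardware acceleration on top, but for a single weight and bias the arithmetic is no more than this.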

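(3) TensorFlow

Next, we do the linear fit with TensorFlow.

First we generate the data: the labels are the true function plus Gaussian noise.

Then, so that gradients can be computed, we define the weight W and the bias b as variables. The code below initializes both randomly.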

import tensorflow as tf
import numpy as np

# Hyperparameters
learning_rate = 0.01   # learning rate
training_steps = 1000  # number of training steps
display_step = 50      # print progress every 50 steps

# Training data: y = 2x + 1 plus Gaussian noise (standard deviation 0.1)
X = np.linspace(0, 1, 50).reshape((-1, 1))
Y = 2 * X + 1 + np.random.normal(0, 0.1, X.shape)
n_samples = 50

# Randomly initialize the weight and the bias
W = tf.Variable(np.random.randn(), name="weight")
b = tf.Variable(np.random.randn(), name="bias")

# Linear regression model
def linear_regression(x):
    return W * x + b

# Loss: sum of squared errors divided by 2n
def mean_square(y_pred, y_true):
    return tf.reduce_sum(tf.pow(y_pred - y_true, 2)) / (2 * n_samples)

# Stochastic gradient descent (SGD) optimizer
optimizer = tf.optimizers.SGD(learning_rate)

# Compute the gradients and update the parameters
def run_optimization():
    # tf.GradientTape records the operations so that gradients can be computed automatically
    with tf.GradientTape() as g:
        pred = linear_regression(X)
        loss = mean_square(pred, Y)
    # Compute the gradients
    gradients = g.gradient(loss, [W, b])
    # Update W and b
    optimizer.apply_gradients(zip(gradients, [W, b]))

# Training loop
for step in range(1, training_steps + 1):
    run_optimization()
    if step % display_step == 0:
        pred = linear_regression(X)
        loss = mean_square(pred, Y)
        print("step: %i, loss: %f, W: %f, b: %f" % (step, loss, W.numpy(), b.numpy()))

import matplotlib.pyplot as plt

plt.plot(X, Y, 'ro', label='Original data')
plt.plot(X, np.array(W * X + b), label='Fitted line')
plt.legend()
plt.show()
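For reference, the gradients that tf.GradientTape computes for the loss in mean_square can also be written out by hand. This derivation is added for illustration only; here $n$ stands for n_samples, $\eta$ for learning_rate, and $(x_i, y_i)$ for the rows of X and Y.

$$
L(W, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( W x_i + b - y_i \right)^2
$$

$$
\frac{\partial L}{\partial W} = \frac{1}{n} \sum_{i=1}^{n} \left( W x_i + b - y_i \right) x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \left( W x_i + b - y_i \right)
$$

$$
W \leftarrow W - \eta \, \frac{\partial L}{\partial W},
\qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
$$

The factor of 2 from differentiating the square cancels against the $\frac{1}{2n}$ in the loss, which is the usual reason for writing the loss with that constant.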

That is the deep learning way of doing it, and it may well leave you bewildered. Next comes the machine learning way.

Machine learning

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression  # linear regression

# %matplotlib inline  # Jupyter magic; uncomment when running in a notebook

x = np.linspace(0, 30, 50)
y = x + 2 * np.random.rand(50)
plt.figure(figsize=(10, 8))

model = LinearRegression()  # create the model
x1 = x.reshape(-1, 1)  # reshape into a column: x values
y1 = y.reshape(-1, 1)  # reshape into a column: y values
model.fit(x1, y1)  # fit the model

plt.scatter(x, y)
x_test = np.linspace(0, 40).reshape(-1, 1)
plt.plot(x_test, model.predict(x_test))

model.coef_  # array([[1.00116024]])  slope
model.intercept_  # array([0.86175551])  intercept

Numpy

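import numpy as np

x = np.linspace(0, 30, 50)
y = x + 1 + np.random.normal(0, 0.1, 50)

# Exact values change from run to run because the noise is random
z1 = np.polyfit(x, y, 1)  # degree-1 polynomial fit, i.e. a linear fit
z1  # [1.00895356, 0.71872268]
p1 = np.poly1d(z1)  # callable polynomial built from the coefficients
p1  # array([1.00032794, 0.9799152 ])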

Scipy

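import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress

x = np.linspace(0, 30, 50)
# np.random.normal(0) returns a single number, so every point gets the same offset
# and the data lie exactly on a line; use np.random.normal(0, 1, x.shape) for per-point noise
y = x + 2 + np.random.normal(0)

slope, intercept, r_value, p_value, std_err = linregress(x, y)
print("slope: %f intercept: %f" % (slope, intercept))
# slope: 1.000000 intercept: 1.692957
print("R-squared: %f" % r_value**2)
# R-squared: 1.000000

plt.figure(figsize=(10, 8))
plt.plot(x, y, 'o', label='original data')
plt.plot(x, intercept + slope * x, 'r', label='fitted line')
plt.legend()
plt.show()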
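The R-squared of exactly 1 in the output above follows from how the noise is generated: `np.random.normal(0)` returns one number, so every point is shifted by the same amount and the data still lie on a perfect line. A small sketch in plain Python makes the difference visible. The helper `r_squared`, the seed, and the noise level are assumptions made for this illustration.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation, which equals R-squared for a simple linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


random.seed(0)  # arbitrary seed so the run is repeatable
xs = [30 * i / 49 for i in range(50)]

# One shared offset: every point is shifted by the same amount
offset = random.gauss(0, 1)
ys_shared = [x + 2 + offset for x in xs]

# Independent noise: a fresh draw for every point
ys_noisy = [x + 2 + random.gauss(0, 1) for x in xs]

r2_shared = r_squared(xs, ys_shared)
r2_noisy = r_squared(xs, ys_noisy)
print(r2_shared, r2_noisy)
```

With the shared offset the fit is exact up to floating-point error, and the offset simply ends up in the intercept. With per-point noise R-squared drops below 1.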

Statsmodels

# Linear model
import statsmodels.api as sm
import numpy as np

x = np.linspace(0, 10, 100)
# np.random.randn() returns a single number, so the data lie exactly on a line (hence R-squared = 1.000)
y = 3 * x + np.random.randn() + 10

# Fit and summarize OLS model
X = sm.add_constant(x)
mod = sm.OLS(y, X)
result = mod.fit()
print(result.params)
print(result.summary())

[9.65615842 3.        ]
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 7.546e+31
Date:                Thu, 25 Jul 2019   Prob (F-statistic):               0.00
Time:                        21:10:18   Log-Likelihood:                 3082.0
No. Observations:                 100   AIC:                            -6160.
Df Residuals:                      98   BIC:                            -6155.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          9.6562      2e-15   4.83e+15      0.000       9.656       9.656
x1             3.0000   3.45e-16   8.69e+15      0.000       3.000       3.000
==============================================================================
Omnibus:                        4.067   Durbin-Watson:                   0.161
Prob(Omnibus):                  0.131   Jarque-Bera (JB):                4.001
Skew:                           0.446   Prob(JB):                        0.135
Kurtosis:                       2.593   Cond. No.                         11.7
==============================================================================

R

In R, the lm() function builds a linear regression model relating a predictor variable to a response variable. Below are my notes on regressing weight on height.

> x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
> y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746

> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-6.3002 -1.6629  0.0412  1.8944  3.9775

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -38.45509    8.04901  -4.778  0.00139 **
x             0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared:  0.9548,    Adjusted R-squared:  0.9491
F-statistic: 168.9 on 1 and 8 DF,  p-value: 1.164e-06

lm(y~x) returns the intercept and the slope, which give the regression equation of weight (y) on height (x): y = 0.6746x - 38.4551
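As a cross-check, the same coefficients can be recomputed from the ten data points with the closed-form least-squares formulas. This is a small sketch in plain Python; the variable names and the height of 170 used for the prediction are assumptions made for the example.

```python
heights = [151, 174, 138, 186, 128, 136, 179, 163, 152, 131]
weights = [63, 81, 56, 91, 47, 57, 76, 72, 62, 48]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# Closed-form least-squares slope and intercept
sxy = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
sxx = sum((h - mean_h) ** 2 for h in heights)
slope = sxy / sxx
intercept = mean_w - slope * mean_h
print(round(slope, 4), round(intercept, 4))  # 0.6746 -38.4551, matching lm()

# Predicted weight for a hypothetical height of 170
predicted = slope * 170 + intercept
print(round(predicted, 1))
```

Once the slope and intercept are known, a prediction is just one multiplication and one addition.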

SPSS

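SPSS can do regression analysis too.

In the menu bar choose [Analyze], then [Regression] in the drop-down menu, then [Linear] in the submenu that opens on the right. In the Linear Regression window, move the "cholesterol" variable into the [Dependent] box on the right and the "time" variable into the [Independent(s)] box, as shown in the figure below.

Step 1: In the menu bar choose [Analyze], then [General Linear Model], then [Multivariate]. In the "Multivariate" window, move "height" and "weight" into the [Dependent Variables] box on the right and "class" into the [Fixed Factor(s)] box, as shown in the figure below.

Step 2: Click [Post Hoc]. In the [Multivariate: Post Hoc Multiple Comparisons for Observed Means] window, move "class" into [Post Hoc Tests for] on the right, select [Tukey], and click [Continue], as shown in the figure below.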

Stata

Having covered SPSS, how could I leave out Stata?

. set obs 10
number of observations (_N) was 0, now 10

. gen x=_n

. gen y= x+runiform()

. list

     +---------------+
     |  x          y |
     |---------------|
  1. |  1   1.348872 |
  2. |  2   2.266886 |
  3. |  3   3.136646 |
  4. |  4   4.028557 |
  5. |  5   5.868933 |
     |---------------|
  6. |  6   6.350855 |
  7. |  7   7.071105 |
  8. |  8   8.323368 |
  9. |  9   9.555103 |
 10. | 10   10.87599 |
     +---------------+

. reg y x

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(1, 8)         =   1107.40
       Model |  89.9664664         1  89.9664664   Prob > F        =    0.0000
    Residual |  .649928434         8  .081241054   R-squared       =    0.9928
-------------+----------------------------------   Adj R-squared   =    0.9919
       Total |  90.6163949         9  10.0684883   Root MSE        =    .28503

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.044271   .0313806    33.28   0.000     .9719076    1.116635
       _cons |   .1391392   .1947113     0.71   0.495    -.3098658    .5881443
------------------------------------------------------------------------------

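In the results, the Coef. value of 1.044271 for x is the regression slope and the _cons value of .1391392 is the intercept. R-squared and Adj R-squared are 0.9928 and 0.9919, which indicates that the regression line fits the data very well.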
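The summary statistics in the Stata table are all derived from the sums of squares, so they can be reproduced by hand. The sketch below copies the numbers from the output above and applies the standard definitions; the variable names are chosen for this example.

```python
import math

# Values copied from the Stata output above
ss_model = 89.9664664
ss_residual = 0.649928434
ss_total = 90.6163949
n_obs = 10
n_params = 2  # slope and intercept

df_model = n_params - 1
df_residual = n_obs - n_params

# Share of the total variation explained by the model
r_squared = ss_model / ss_total
# Same quantity, penalized for the number of parameters
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
ms_residual = ss_residual / df_residual
f_stat = (ss_model / df_model) / ms_residual
root_mse = math.sqrt(ms_residual)
t_slope = 1.044271 / 0.0313806  # coefficient divided by its standard error

print(round(r_squared, 4), round(adj_r_squared, 4))
print(round(f_stat, 1), round(root_mse, 5), round(t_slope, 2))
```

The results agree with the R-squared, Adj R-squared, F, Root MSE, and t values that Stata reports.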

If we need the corresponding plot, the avplots command draws it (an added-variable plot), as shown in the figure below.

Excel

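In fact, the easiest tool for this regression analysis is, of course, Excel.

Suppose we need to find the price that yields the maximum profit. Press CTRL+A to select all the data, insert an X-Y scatter chart, and choose the scatter chart with smooth lines and markers. Click an axis, right-click, choose the axis options, and set the minimum and maximum bounds. Then click the chart and add a trendline. The trendline options offer exponential, linear, logarithmic, polynomial, power, and moving-average fits, as shown in the figure below.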

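Tick both the option to display the equation and the option to display the R-squared value, then try the different trendline types until R-squared gets as close to 1 as possible, since a value nearer 1 means a better fit. Here we choose a second-order polynomial. The resulting equation is $y = -4.6716x^2 + 286.75x - 2988.4$ with $R^2 = 0.9201$, as shown in the figure below.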
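With the fitted quadratic in hand, the price that maximizes profit is the vertex of the parabola. The following sketch assumes, as the text suggests, that x is the price and y is the profit. The variable names are chosen for this example.

```python
# Coefficients of the fitted trendline y = a*x^2 + b*x + c reported above
a, b, c = -4.6716, 286.75, -2988.4

# A downward-opening parabola (a < 0) peaks at x = -b / (2a)
best_price = -b / (2 * a)
max_profit = a * best_price ** 2 + b * best_price + c
print(round(best_price, 2), round(max_profit, 2))
```

Under that assumption the fitted curve peaks at a price of about 30.69. An R-squared of 0.9201 means the curve is only an approximation of the data, so the figure is an estimate.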

Power BI can also do regression analysis.

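Summary

A simple linear fit turns out to have more than ten ways of being done. Which would you choose? I would definitely choose Excel: done in 20 seconds.

Suppose the requirement is just a linear fit. Some data analysis experts immediately reach for deep learning and spend an afternoon on it, not realizing that with Excel or SPSS I get it done in 20 seconds with a few clicks, and with a better result than theirs.

A tool should be chosen for being sharp. I remember an ad that went: how to merge hundreds of Excel reports with Python and never work overtime, so why haven't you learned Python yet?

I have just one thing to say: have you learned Excel?

On the Data tab, click [New Query] → [From File] → [From Folder], as shown below:

That takes you into the Power Query interface, where the job is done in an instant. Why insist on writing Python code?

Or take factor analysis and principal component analysis, which are common tasks. Excel cannot handle them, so people rush to do them in Python. Is that really necessary?

SPSS and Stata are excellent for factor analysis.

Another example: I once had a time series task, went straight to Keras for a three-month rolling forecast, and started writing code, not realizing that Stata, which I use all the time, can do time series. I suddenly felt foolish for having wasted my time.

Stata has almost all of the machine learning models.

When you open Jupyter Notebook, stop and ask whether Excel could handle the job. If you only need to change one number, change it directly in Excel. Why show off by importing pandas?

When you do data analysis, please do not write code, because it is simply not needed: Excel, Power BI, SPSS, and Stata can definitely handle it. People who write code are all supposed to be experts, but as I see it they are just "experts" at making simple things complicated.

Experts, please don't flame me. I know you are very good!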
