
11. Weighted Linear Regression Case Study: Predicting the Age of Abalone

Published: 2024/7/5


Dataset: https://download.csdn.net/download/weixin_44827418/12553408

1. Import the dataset

Dataset description:

import pandas as pd
import numpy as np

abalone = pd.read_table("./datas/abalone.txt", header=None)
# Columns, in order: sex, length, diameter, height, whole weight,
# meat weight, viscera weight, shell weight, age
abalone.columns = ['性別','長度','直徑','高度','整體重量','肉重量','內臟重量','殼重','年齡']
abalone.head()

   性別   長度     直徑     高度     整體重量   肉重量    內臟重量   殼重     年齡
0   1   0.455  0.365  0.095  0.5140  0.2245  0.1010  0.150  15
1   1   0.350  0.265  0.090  0.2255  0.0995  0.0485  0.070   7
2  -1   0.530  0.420  0.135  0.6770  0.2565  0.1415  0.210   9
3   1   0.440  0.365  0.125  0.5160  0.2155  0.1140  0.155  10
4   0   0.330  0.255  0.080  0.2050  0.0895  0.0395  0.055   7

abalone.shape

(4177, 9)

abalone.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4177 entries, 0 to 4176
Data columns (total 9 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   性別       4177 non-null   int64
 1   長度       4177 non-null   float64
 2   直徑       4177 non-null   float64
 3   高度       4177 non-null   float64
 4   整體重量     4177 non-null   float64
 5   肉重量      4177 non-null   float64
 6   內臟重量     4177 non-null   float64
 7   殼重       4177 non-null   float64
 8   年齡       4177 non-null   int64
dtypes: float64(7), int64(2)
memory usage: 293.8 KB

abalone.describe()

       性別           長度           直徑           高度           整體重量         肉重量          內臟重量         殼重           年齡
count  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000
mean      0.052909     0.523992     0.407881     0.139516     0.828742     0.359367     0.180594     0.238831     9.933684
std       0.822240     0.120093     0.099240     0.041827     0.490389     0.221963     0.109614     0.139203     3.224169
min      -1.000000     0.075000     0.055000     0.000000     0.002000     0.001000     0.000500     0.001500     1.000000
25%      -1.000000     0.450000     0.350000     0.115000     0.441500     0.186000     0.093500     0.130000     8.000000
50%       0.000000     0.545000     0.425000     0.140000     0.799500     0.336000     0.171000     0.234000     9.000000
75%       1.000000     0.615000     0.480000     0.165000     1.153000     0.502000     0.253000     0.329000    11.000000
max       1.000000     0.815000     0.650000     1.130000     2.825500     1.488000     0.760000     1.005000    29.000000

2. Inspect the data distribution

import numpy as np
import pandas as pd
import random
import matplotlib as mpl
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['simhei']   # font that can render the Chinese column names
plt.rcParams['axes.unicode_minus'] = False     # render minus signs correctly
%matplotlib inline

mpl.cm.rainbow(np.linspace(0,1,10))

array([[5.00000000e-01, 0.00000000e+00, 1.00000000e+00, 1.00000000e+00],
       [2.80392157e-01, 3.38158275e-01, 9.85162233e-01, 1.00000000e+00],
       [6.07843137e-02, 6.36474236e-01, 9.41089253e-01, 1.00000000e+00],
       [1.66666667e-01, 8.66025404e-01, 8.66025404e-01, 1.00000000e+00],
       [3.86274510e-01, 9.84086337e-01, 7.67362681e-01, 1.00000000e+00],
       [6.13725490e-01, 9.84086337e-01, 6.41213315e-01, 1.00000000e+00],
       [8.33333333e-01, 8.66025404e-01, 5.00000000e-01, 1.00000000e+00],
       [1.00000000e+00, 6.36474236e-01, 3.38158275e-01, 1.00000000e+00],
       [1.00000000e+00, 3.38158275e-01, 1.71625679e-01, 1.00000000e+00],
       [1.00000000e+00, 1.22464680e-16, 6.12323400e-17, 1.00000000e+00]])

mpl.cm.rainbow(np.linspace(0,1,10))[0]

array([0.5, 0. , 1. , 1. ])

def dataPlot(dataSet):
    m, n = dataSet.shape
    fig = plt.figure(figsize=(8,20), dpi=100)
    colormap = mpl.cm.rainbow(np.linspace(0,1,n))   # one RGBA color per column
    for i in range(n):
        fig_ = fig.add_subplot(n,1,i+1)
        # Pass the single RGBA value through color=. The original call used c=colormap[i],
        # which made matplotlib print "'c' argument looks like a single numeric RGB or
        # RGBA sequence ..." once per subplot.
        plt.scatter(range(m), dataSet.iloc[:,i].values, s=2, color=colormap[i])
        plt.title(dataSet.columns[i])
        plt.tight_layout(pad=1.2)   # adjust the spacing between subplots

# Run the function to view the data distribution:
dataPlot(abalone)

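The scatter plots show that:

1) Apart from "sex", every column is clearly arranged in a regular order, so the rows are not randomly ordered.

2) The "height" feature contains two outliers.

Based on these observations, we take two measures:

1) When splitting the training and test sets, shuffle the original dataset so that samples are picked at random.

2) Remove the outliers in the "height" feature.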

abalone['高度'] < 0.4

0       True
1       True
2       True
3       True
4       True
        ...
4172    True
4173    True
4174    True
4175    True
4176    True
Name: 高度, Length: 4177, dtype: bool

aba = abalone.loc[abalone['高度'] < 0.4, :]
# View the distribution of the dataset again
dataPlot(aba)
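The filtering step can be tried in isolation on a small table. The sketch below is only an illustration: the `toy` frame and its values are made up, and the English column names are stand-ins for the article's columns. It also shows that a filtered frame keeps gaps in its index unless the index is reset, which matters for any later code that selects rows by label.

```python
import pandas as pd

# Toy stand-in for the abalone table; the values are made up for illustration.
toy = pd.DataFrame({
    "height": [0.095, 0.090, 1.130, 0.125, 0.515],
    "age": [15, 7, 8, 10, 10],
})

mask = toy["height"] < 0.4              # True for the rows to keep
filtered = toy.loc[mask, :]             # index still has gaps: 0, 1, 3
cleaned = filtered.reset_index(drop=True)

print(mask.sum(), "rows kept,", (~mask).sum(), "rows dropped")
print(list(filtered.index), "->", list(cleaned.index))
```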

2. Split the training and test sets

def randSplit(dataSet, rate):
    """
    Randomly split a dataset into a training set and a test set.
    Parameters:
        dataSet: the original dataset
        rate: fraction of samples assigned to the training set
    Returns:
        train, test: the training set and the test set
    """
    shuffled = dataSet.copy()              # work on a copy so the caller's DataFrame is not modified
    m = shuffled.shape[0]                  # total number of samples
    l = list(range(m))                     # positions 0..m-1, so gaps in the existing index do not matter
    random.seed(123)                       # fix the random seed
    random.shuffle(l)                      # shuffle the labels
    shuffled.index = l                     # assign the shuffled labels as the index
    shuffled = shuffled.sort_index()       # sorting by the shuffled labels shuffles the rows
    n = int(m*rate)                        # number of training samples
    train = shuffled.iloc[:n, :].reset_index(drop=True)   # first n rows form the training set
    test = shuffled.iloc[n:, :].reset_index(drop=True)    # remaining rows form the test set
    return train, test

train, test = randSplit(aba, 0.8)
# Explore the training set
train.head()

   性別   長度     直徑     高度     整體重量   肉重量    內臟重量   殼重      年齡
0  -1   0.590  0.470  0.170  0.9000  0.3550  0.1905  0.2500  11
1   1   0.560  0.450  0.145  0.9355  0.4250  0.1645  0.2725  11
2  -1   0.635  0.535  0.190  1.2420  0.5760  0.2475  0.3900  14
3   1   0.505  0.390  0.115  0.5585  0.2575  0.1190  0.1535   8
4   1   0.510  0.410  0.145  0.7960  0.3865  0.1815  0.1955   8
train.shape

(3340, 9)

abalone.describe()

(The output is identical to the abalone.describe() table in section 1; note the maximum height of 1.13 there.)
train.describe()   # summary statistics

       性別           長度           直徑           高度           整體重量         肉重量          內臟重量         殼重           年齡
count  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000
mean      0.060479     0.522754     0.406886     0.138790     0.824906     0.358151     0.179732     0.237158     9.911976
std       0.819021     0.120300     0.099372     0.038441     0.488535     0.222422     0.109036     0.137920     3.223534
min      -1.000000     0.075000     0.055000     0.000000     0.002000     0.001000     0.000500     0.001500     1.000000
25%      -1.000000     0.450000     0.350000     0.115000     0.439000     0.184375     0.092000     0.130000     8.000000
50%       0.000000     0.540000     0.420000     0.140000     0.796750     0.335500     0.171000     0.232000     9.000000
75%       1.000000     0.615000     0.480000     0.165000     1.147250     0.498500     0.250500     0.325000    11.000000
max       1.000000     0.780000     0.630000     0.250000     2.825500     1.488000     0.760000     1.005000    27.000000
dataPlot(train)   # view the distribution of the training set

# Explore the test set
test.head()

   性別   長度     直徑     高度     整體重量   肉重量    內臟重量   殼重      年齡
0   1   0.630  0.470  0.150  1.1355  0.5390  0.2325  0.3115  12
1  -1   0.585  0.445  0.140  0.9130  0.4305  0.2205  0.2530  10
2  -1   0.390  0.290  0.125  0.3055  0.1210  0.0820  0.0900   7
3   1   0.525  0.410  0.130  0.9900  0.3865  0.2430  0.2950  15
4   1   0.625  0.475  0.160  1.0845  0.5005  0.2355  0.3105  10
test.shape

(835, 9)

test.describe()

       性別          長度          直徑          高度          整體重量        肉重量         內臟重量        殼重          年齡
count  835.000000  835.000000  835.000000  835.000000  835.000000  835.000000  835.000000  835.000000  835.000000
mean     0.022754    0.528808    0.411737    0.140784    0.842714    0.363370    0.183749    0.245320   10.022754
std      0.834341    0.119166    0.098627    0.038664    0.495990    0.218938    0.111510    0.143925    3.230284
min     -1.000000    0.130000    0.100000    0.015000    0.013000    0.004500    0.003000    0.004000    3.000000
25%     -1.000000    0.450000    0.350000    0.115000    0.458000    0.192000    0.096500    0.132750    8.000000
50%      0.000000    0.550000    0.430000    0.140000    0.810000    0.339000    0.170500    0.235000   10.000000
75%      1.000000    0.620000    0.485000    0.170000    1.177250    0.510750    0.259250    0.337000   11.000000
max      1.000000    0.815000    0.650000    0.250000    2.555000    1.145500    0.590000    0.815000   29.000000
dataPlot(test)
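For comparison, the same kind of shuffled split can be written with pandas' own `DataFrame.sample`. This is a minimal sketch rather than the article's method: `sample_split` is a hypothetical helper, the toy frame is made up, and the split it produces will not match the rows selected by `randSplit` because the shuffling mechanism differs.

```python
import pandas as pd

def sample_split(data, rate, seed=123):
    """Shuffle the rows, then cut them into train/test (hypothetical helper)."""
    # frac=1 draws every row exactly once, i.e. a full shuffle.
    shuffled = data.sample(frac=1, random_state=seed).reset_index(drop=True)
    n = int(len(shuffled) * rate)          # number of training rows
    train = shuffled.iloc[:n].reset_index(drop=True)
    test = shuffled.iloc[n:].reset_index(drop=True)
    return train, test

toy = pd.DataFrame({"x": range(10), "y": range(10, 20)})
train, test = sample_split(toy, 0.8)
print(train.shape, test.shape)
```

Because the rows are selected by position (`iloc`), this version does not depend on what the input's index looks like.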

3. Build helper functions

def get_Mat(dataSet):
    """
    Take a DataFrame whose last column is the label and
    return the feature matrix and the label matrix.
    """
    # np.asmatrix is used because the np.mat alias was removed in NumPy 2.0
    xMat = np.asmatrix(dataSet.iloc[:,:-1].values)
    yMat = np.asmatrix(dataSet.iloc[:,-1].values).T
    return xMat, yMat

def plotShow(dataSet):
    """Visualize the dataset."""
    xMat, yMat = get_Mat(dataSet)
    plt.scatter(xMat.A[:,1], yMat.A, c='b', s=5)
    plt.show()

def standRegres(dataSet):
    """
    Compute the regression coefficients.
    Parameters:
        dataSet: the original dataset
    Returns:
        ws: the regression coefficients
    """
    xMat, yMat = get_Mat(dataSet)
    xTx = xMat.T * xMat
    if np.linalg.det(xTx) == 0:
        print('The matrix is singular and cannot be inverted!')
        return
    ws = xTx.I*(xMat.T*yMat)   # xTx.I is the inverse of xTx
    return ws

def sseCal(dataSet, regres):
    """
    Compute the sum of squared errors (SSE).
    Parameters:
        dataSet: dataset holding the true values
        regres: function that computes the regression coefficients
    Returns:
        sse: the sum of squared errors
    """
    xMat, yMat = get_Mat(dataSet)
    ws = regres(dataSet)
    yHat = xMat*ws
    sse = ((yMat.A.flatten() - yHat.A.flatten())**2).sum()
    return sse

Using the ex0 dataset as an example, check what the functions return:

ex0 = pd.read_table("./datas/ex0.txt", header=None)
ex0.head()

     0         1         2
0  1.0  0.067732  3.176513
1  1.0  0.427810  3.816464
2  1.0  0.995731  4.550095
3  1.0  0.738336  4.256571
4  1.0  0.981083  4.560815

# SSE of simple linear regression
sseCal(ex0, standRegres)

1.3552490816814902
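standRegres solves the normal equation ws = (XᵀX)⁻¹Xᵀy with an explicit inverse. As a rough cross-check, the sketch below compares that formula with `np.linalg.lstsq` on synthetic data. The data, the coefficients 3.0 and 1.7, and the variable names are assumptions made for illustration; they are not the ex0 file.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
X = np.column_stack([np.ones_like(x), x])      # first column of ones acts as the intercept, as in ex0
y = 3.0 + 1.7 * x + rng.normal(0, 0.1, size=50)

# Normal equation with an explicit inverse (what standRegres does).
w_normal = np.linalg.inv(X.T @ X) @ (X.T @ y)
# Least-squares solver that avoids forming the inverse.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

sse = float(((y - X @ w_lstsq) ** 2).sum())    # sum of squared errors of the fit
print(w_normal, w_lstsq, sse)
```

Both routes should give the same coefficients on well-conditioned data; the solver is usually preferred when XᵀX is close to singular.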

Build a function to compute the coefficient of determination R²

def rSquare(dataSet, regres):
    """Compute the coefficient of determination R²."""
    xMat, yMat = get_Mat(dataSet)
    sse = sseCal(dataSet, regres)                  # sum of squared errors
    sst = ((yMat.A - yMat.mean())**2).sum()        # total sum of squares
    r2 = 1 - sse / sst
    return r2

Again using the ex0 dataset as an example, check what the function returns:

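# R² of simple linear regression
rSquare(ex0, standRegres)

0.9731300889856916

def LWLR(testMat, xMat, yMat, k=1.0):
    """
    Compute the predictions of locally weighted linear regression.
    Parameters:
        testMat: feature matrix of the test set
        xMat: feature matrix of the training set
        yMat: label matrix of the training set
        k: kernel bandwidth
    Returns:
        ws: regression coefficients for the last test sample
        yHat: predicted values
    """
    n = testMat.shape[0]                 # number of test samples
    m = xMat.shape[0]                    # number of training samples
    weights = np.asmatrix(np.eye(m))     # initialize the weight matrix as the identity
    yHat = np.zeros(n)                   # initialize the predictions with zeros
    for i in range(n):
        for j in range(m):
            diffMat = testMat[i] - xMat[j]
            # Gaussian kernel weight; [0,0] extracts the scalar from the 1x1 matrix
            weights[j,j] = np.exp((diffMat*diffMat.T)[0,0] / (-2*k**2))
        xTx = xMat.T*(weights*xMat)
        if np.linalg.det(xTx) == 0:
            print('The matrix is singular and cannot be inverted')
            return
        ws = xTx.I*(xMat.T*(weights*yMat))
        yHat[i] = (testMat[i] * ws)[0,0]
    return ws, yHat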
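Written out as formulas, the LWLR function appears to compute the following for each test row x. The symbols w_j, W and θ̂ are notation introduced here for illustration and are not taken from the original article; X and y are the training features and labels, x_j is the j-th training row, and k is the bandwidth.

```latex
\[
w_j(x) = \exp\!\left(-\frac{\lVert x - x_j \rVert^2}{2k^2}\right),
\qquad
W(x) = \operatorname{diag}\bigl(w_1(x), \dots, w_m(x)\bigr)
\]
\[
\hat{\theta}(x) = \bigl(X^\top W(x)\, X\bigr)^{-1} X^\top W(x)\, y,
\qquad
\hat{y}(x) = x\,\hat{\theta}(x)
\]
```

A small k concentrates the weight on the nearest training rows, while a large k makes all weights close to 1, so the fit approaches ordinary linear regression.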

4. Build the weighted linear model

Because the dataset is large and the computation is extremely slow, only a subset is used here: the first 99 rows of the training set serve as training data and the first 99 rows of the test set serve as test data (the code slices with [:99]).

def ssePlot(train, test):
    """Plot the training and test SSE curves for different values of k."""
    X0, Y0 = get_Mat(train)
    X1, Y1 = get_Mat(test)
    train_sse = []
    test_sse = []
    for k in np.arange(0.2,10,0.5):
        # SSE on the training subset
        ws1, yHat1 = LWLR(X0[:99], X0[:99], Y0[:99], k)
        sse1 = ((Y0[:99].A.T - yHat1)**2).sum()
        train_sse.append(sse1)
        # SSE on the test subset, using the same training subset
        ws2, yHat2 = LWLR(X1[:99], X0[:99], Y0[:99], k)
        sse2 = ((Y1[:99].A.T - yHat2)**2).sum()
        test_sse.append(sse2)
    plt.figure(figsize=(20,8), dpi=100)
    plt.plot(np.arange(0.2,10,0.5), train_sse, color='b')
    plt.plot(np.arange(0.2,10,0.5), test_sse, color='r')
    plt.xlabel('k')
    plt.ylabel('SSE')
    plt.legend(['train_sse','test_sse'])

Result:

ssePlot(train,test)

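The plot should be read from right to left. When k is large, the model is fairly stable. As k decreases, the training SSE gradually falls, and at around k = 2 the training SSE equals the test SSE. As k decreases further, the training SSE keeps shrinking, which means the model fits the training set better and better, but its performance on the test set gets worse, so the model has started to overfit. This plot agrees with the earlier figures for different values of k: the three figures for k = 1.0, 0.01 and 0.003 also show the model gradually overfitting as k decreases. A value of k around 2 is therefore the best choice here.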
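The drop in training SSE as k shrinks can be reproduced on a small synthetic problem. The sketch below is not the article's LWLR: `lwlr_predict` is a hypothetical helper that works on plain NumPy arrays, computes all kernel weights for one query row at once, and uses `np.linalg.solve` in place of an explicit inverse. The sine-shaped data is made up for illustration.

```python
import numpy as np

def lwlr_predict(test_X, train_X, train_y, k=1.0):
    """Locally weighted linear regression on plain ndarrays (hypothetical helper)."""
    test_X = np.asarray(test_X, dtype=float)
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=float).ravel()
    y_hat = np.empty(test_X.shape[0])
    for i, x in enumerate(test_X):
        # Squared distances from the query row to every training row, computed at once.
        d2 = ((train_X - x) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * k ** 2))           # Gaussian kernel weights
        XtW = train_X.T * w                      # equals X.T @ diag(w) without building diag(w)
        ws = np.linalg.solve(XtW @ train_X, XtW @ train_y)
        y_hat[i] = x @ ws
    return y_hat

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 40))
X = np.column_stack([np.ones_like(x), x])        # intercept column plus one feature
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 40)

for k in (1.0, 0.1):
    sse = ((y - lwlr_predict(X, X, y, k)) ** 2).sum()
    print(k, sse)                                # training SSE for each bandwidth
```

The training SSE for k = 0.1 should come out well below the one for k = 1.0. Whether the smaller k also helps on unseen data is a separate question, which is why the article compares against a test set.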

Next we plug k = 2 into the locally weighted linear regression model and look at the predictions:

train, test = randSplit(aba, 0.8)     # randomly split the dataset into training and test sets
trainX, trainY = get_Mat(train)       # split the training set into feature and label matrices
testX, testY = get_Mat(test)          # split the test set into feature and label matrices
ws0, yHat0 = LWLR(testX, trainX, trainY, k=2)

Plot the predicted values against the true values

y = testY.A.flatten()
plt.scatter(y, yHat0, c='b', s=5);   # the trailing ; suppresses the text output in a notebook

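In this plot the x-axis is the true value and the y-axis is the predicted value. The points form a trumpet shape: predictions rise with the true value, but they also spread out more widely, which shows that the prediction error grows as the true value increases.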

Wrap the SSE and R² calculation in a function for later use

def LWLR_pre(dataSet):
    """Compute the SSE and R² of locally weighted linear regression."""
    train, test = randSplit(dataSet, 0.8)
    trainX, trainY = get_Mat(train)
    testX, testY = get_Mat(test)
    ws, yHat = LWLR(testX, trainX, trainY, k=2)
    sse = ((testY.A.T - yHat)**2).sum()            # sum of squared errors on the test set
    sst = ((testY.A - testY.mean())**2).sum()      # total sum of squares on the test set
    r2 = 1 - sse / sst
    return sse, r2

Check the model's predictions

LWLR_pre(aba)

(4152.777097646255, 0.5228101340130846)

The SSE is above 4000 and R² is only 0.52, so the model does not perform very well.
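To make the relation R² = 1 - SSE/SST concrete, here is a small arithmetic sketch. The four-point example uses made-up numbers; the last lines only rearrange the formula to recover the total sum of squares implied by the two values reported above, and `sst_implied` is a name introduced for this illustration.

```python
import numpy as np

# Toy example with made-up values.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

sse_toy = ((y_true - y_pred) ** 2).sum()          # 0.01 + 0.01 + 0.04 + 0.04
sst_toy = ((y_true - y_true.mean()) ** 2).sum()   # 2.25 + 0.25 + 0.25 + 2.25
r2_toy = 1 - sse_toy / sst_toy
print(round(r2_toy, 4))

# Rearranging R2 = 1 - SSE/SST gives SST = SSE / (1 - R2).
sse, r2 = 4152.777097646255, 0.5228101340130846   # values reported by LWLR_pre(aba)
sst_implied = sse / (1 - r2)
print(sst_implied)
```

An R² of about 0.52 means the model's squared error is still close to half of what predicting the mean age for every test sample would give.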
