机器学习--岭回归10
嶺回歸介紹
用正規(guī)方程求解線性回歸時(shí),
嶺回歸是線性回歸的正則化版本,即在原來(lái)的線性回歸的 cost function 中添加正則項(xiàng)(regularization term):
利用嶺回歸實(shí)現(xiàn)Longley數(shù)據(jù)集,Longley數(shù)據(jù)集來(lái)自J.W.Longley(1967)發(fā)表在JASA上的一篇論文,是強(qiáng)共線性的宏觀經(jīng)濟(jì)數(shù)據(jù),包含GNP deflator(GNP平減指數(shù))、GNP(國(guó)民生產(chǎn)總值)、Unemployed(失業(yè)率)、rmedForces(武裝力量)、Population(人口)、year(年份),Emlpoyed(就業(yè)率)。LongLey數(shù)據(jù)集因存在嚴(yán)重的多重共線性問題,在早期經(jīng)常用來(lái)檢驗(yàn)各種算法或計(jì)算機(jī)的計(jì)算精度。
sklearn中的嶺回歸
導(dǎo)入包
vdfgfd
import numpy as np from numpy import genfromtxt import pandas as pd from sklearn import linear_model import matplotlib.pyplot as plt讀入數(shù)據(jù)
data = pd.read_csv("longley.csv",delimiter=',') print(data)Unnamed: 0 GNP.deflator GNP Unemployed Armed.Forces Population Year Employed
0 1947 83.0 234.289 235.6 159.0 107.608 1947 60.323
1 1948 88.5 259.426 232.5 145.6 108.632 1948 61.122
2 1949 88.2 258.054 368.2 161.6 109.773 1949 60.171
3 1950 89.5 284.599 335.1 165.0 110.929 1950 61.187
4 1951 96.2 328.975 209.9 309.9 112.075 1951 63.221
5 1952 98.1 346.999 193.2 359.4 113.270 1952 63.639
6 1953 99.0 365.385 187.0 354.7 115.094 1953 64.989
7 1954 100.0 363.112 357.8 335.0 116.219 1954 63.761
8 1955 101.2 397.469 290.4 304.8 117.388 1955 66.019
9 1956 104.6 419.180 282.2 285.7 118.734 1956 67.857
10 1957 108.4 442.769 293.6 279.8 120.445 1957 68.169
11 1958 110.8 444.546 468.1 263.7 121.950 1958 66.513
12 1959 112.6 482.704 381.3 255.2 123.366 1959 68.655
13 1960 114.2 502.601 393.1 251.4 125.368 1960 69.564
14 1961 115.7 518.173 480.6 257.2 127.852 1961 69.331
15 1962 116.9 554.894 400.7 282.7 130.081 1962 70.551
切分?jǐn)?shù)據(jù)
x_data = data.iloc[:,2:] y_data = data.iloc[:,1] print(x_data) print(y_data)創(chuàng)建模型
# 生成50個(gè)值,作為嶺回歸系數(shù) alphas_to_test = np.linspace(0.001, 1) # 創(chuàng)建模型,保存誤差值 estimator = linear_model.RidgeCV(alphas=alphas_to_test, store_cv_values=True) estimator.fit(x_data, y_data)# 嶺回歸系數(shù) print(estimator.alpha_) # loss值 print(estimator.cv_values_.shape)0.40875510204081633
(16, 50)
array([88.11216213])
正規(guī)方程嶺回歸
導(dǎo)入包
import numpy as np from numpy import genfromtxt import pandas as pd import matplotlib.pyplot as plt讀入數(shù)據(jù)
data = pd.read_csv("longley.csv",delimiter=',') print(data) Unnamed: 0 GNP.deflator GNP Unemployed Armed.Forces Population Year Employed 0 1947 83.0 234.289 235.6 159.0 107.608 1947 60.323 1 1948 88.5 259.426 232.5 145.6 108.632 1948 61.122 2 1949 88.2 258.054 368.2 161.6 109.773 1949 60.171 3 1950 89.5 284.599 335.1 165.0 110.929 1950 61.187 4 1951 96.2 328.975 209.9 309.9 112.075 1951 63.221 5 1952 98.1 346.999 193.2 359.4 113.270 1952 63.639 6 1953 99.0 365.385 187.0 354.7 115.094 1953 64.989 7 1954 100.0 363.112 357.8 335.0 116.219 1954 63.761 8 1955 101.2 397.469 290.4 304.8 117.388 1955 66.019 9 1956 104.6 419.180 282.2 285.7 118.734 1956 67.857 10 1957 108.4 442.769 293.6 279.8 120.445 1957 68.169 11 1958 110.8 444.546 468.1 263.7 121.950 1958 66.513 12 1959 112.6 482.704 381.3 255.2 123.366 1959 68.655 13 1960 114.2 502.601 393.1 251.4 125.368 1960 69.564 14 1961 115.7 518.173 480.6 257.2 127.852 1961 69.331 15 1962 116.9 554.894 400.7 282.7 130.081 1962 70.551切分?jǐn)?shù)據(jù)
x_data = data.iloc[0:,2:] y_data = data.iloc[0:,1] y_data = y_data[:, np.newaxis] print(x_data) print(y_data) GNP Unemployed Armed.Forces Population Year Employed 0 234.289 235.6 159.0 107.608 1947 60.323 1 259.426 232.5 145.6 108.632 1948 61.122 2 258.054 368.2 161.6 109.773 1949 60.171 3 284.599 335.1 165.0 110.929 1950 61.187 4 328.975 209.9 309.9 112.075 1951 63.221 5 346.999 193.2 359.4 113.270 1952 63.639 6 365.385 187.0 354.7 115.094 1953 64.989 7 363.112 357.8 335.0 116.219 1954 63.761 8 397.469 290.4 304.8 117.388 1955 66.019 9 419.180 282.2 285.7 118.734 1956 67.857 10 442.769 293.6 279.8 120.445 1957 68.169 11 444.546 468.1 263.7 121.950 1958 66.513 12 482.704 381.3 255.2 123.366 1959 68.655 13 502.601 393.1 251.4 125.368 1960 69.564 14 518.173 480.6 257.2 127.852 1961 69.331 15 554.894 400.7 282.7 130.081 1962 70.551 [[ 83. ][ 88.5][ 88.2][ 89.5][ 96.2][ 98.1][ 99. ][100. ][101.2][104.6][108.4][110.8][112.6][114.2][115.7][116.9]] print(np.mat(x_data).shape) print(np.mat(y_data).shape) # 給樣本添加偏置項(xiàng) X_data = np.concatenate((np.ones((16,1)),x_data),axis=1) print(X_data.shape)(16, 6)
(16, 1)
(16, 7)
[[1.00000e+00 2.34289e+02 2.35600e+02 1.59000e+02 1.07608e+02,1.94700e+03
6.03230e+01]
[1.00000e+00 2.59426e+02 2.32500e+02 1.45600e+02 1.08632e+02 1.94800e+03 6.11220e+01]
[1.00000e+00 2.58054e+02 3.68200e+02 1.61600e+02 1.09773e+02 1.94900e+036.01710e+01]]
嶺回歸標(biāo)準(zhǔn)方程法求解回歸參數(shù)
def weights(xArr, yArr, lam=0.2):xMat = np.mat(xArr)yMat = np.mat(yArr)xTx = xMat.T*xMat # 矩陣乘法rxTx = xTx + np.eye(xMat.shape[1])*lam# 計(jì)算矩陣的值,如果值為0,說(shuō)明該矩陣沒有逆矩陣if np.linalg.det(rxTx) == 0.0:print("This matrix cannot do inverse")return# xTx.I為xTx的逆矩陣ws = rxTx.I*xMat.T*yMatreturn ws ws = weights(X_data,y_data) print(ws)[[ 7.38107648e-04]
[ 2.07703836e-01]
[ 2.10076376e-02]
[ 5.05385441e-03]
[-1.59173066e+00]
[ 1.10442920e-01]
[-2.42280461e-01]]
計(jì)算預(yù)測(cè)值
np.mat(X_data)*np.mat(ws)matrix([[ 83.55075226],
[ 86.92588689],
[ 88.09720228],
[ 90.95677622],
[ 96.06951002],
[ 97.81955375],
[ 98.36444357],
[ 99.99814266],
[103.26832266],
[105.03165135],
[107.45224671],
[109.52190685],
[112.91863666],
[113.98357055],
[115.29845063],
[117.64279933]])
總結(jié)
以上是生活随笔為你收集整理的机器学习--岭回归10的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: PCB学习笔记——DRC检查
- 下一篇: 乐游api接口平台(接口商)