Machine Learning - Ridge Regression 10
Introduction to Ridge Regression
When solving linear regression with the normal equation, the matrix X^T X may be non-invertible if the features are strongly correlated (multicollinearity) or there are more features than samples. Ridge regression is the regularized version of linear regression: it adds a regularization term (an L2 penalty on the coefficients) to the original linear-regression cost function, which also makes the matrix invertible.
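In one common formulation (using MSE as the base cost and α as the regularization strength), the ridge cost function is:

```latex
J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{j=1}^{n} \theta_j^{2}
```

The bias term \(\theta_0\) is conventionally left out of the penalty.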
We apply ridge regression to the Longley dataset. The dataset comes from a 1967 paper by J.W. Longley published in JASA and consists of strongly collinear macroeconomic data: GNP.deflator (GNP price deflator), GNP (gross national product), Unemployed, Armed.Forces (size of the armed forces), Population, Year, and Employed. Because of its severe multicollinearity, the Longley dataset was often used in the early days to test the numerical accuracy of algorithms and computers.
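The degree of collinearity can be quantified by the condition number of the design matrix. Here is a minimal NumPy sketch; the columns are synthetic stand-ins built to mimic the Longley situation (features that are near-linear functions of each other), not the real Longley values:

```python
import numpy as np

rng = np.random.default_rng(0)
year = np.arange(1947, 1963, dtype=float)           # 16 "years", as in Longley
gnp = 3.0 * year - 5600 + rng.normal(0, 0.5, 16)    # nearly an affine function of year
pop = 1.5 * year - 2800 + rng.normal(0, 0.5, 16)    # likewise, so gnp ~ 2 * pop

X = np.column_stack([year, gnp, pop])

# A huge condition number signals severe multicollinearity: tiny changes
# in the data produce large swings in the ordinary least-squares solution.
print(np.linalg.cond(X))
```

This is exactly the situation where the ridge penalty stabilizes the solution.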
Ridge Regression in sklearn
Import packages
```python
import numpy as np
from numpy import genfromtxt
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
```

Read the data
```python
data = pd.read_csv("longley.csv", delimiter=',')
print(data)
```

Output:

```
    Unnamed: 0  GNP.deflator      GNP  Unemployed  Armed.Forces  Population  Year  Employed
0         1947          83.0  234.289       235.6         159.0     107.608  1947    60.323
1         1948          88.5  259.426       232.5         145.6     108.632  1948    61.122
2         1949          88.2  258.054       368.2         161.6     109.773  1949    60.171
3         1950          89.5  284.599       335.1         165.0     110.929  1950    61.187
4         1951          96.2  328.975       209.9         309.9     112.075  1951    63.221
5         1952          98.1  346.999       193.2         359.4     113.270  1952    63.639
6         1953          99.0  365.385       187.0         354.7     115.094  1953    64.989
7         1954         100.0  363.112       357.8         335.0     116.219  1954    63.761
8         1955         101.2  397.469       290.4         304.8     117.388  1955    66.019
9         1956         104.6  419.180       282.2         285.7     118.734  1956    67.857
10        1957         108.4  442.769       293.6         279.8     120.445  1957    68.169
11        1958         110.8  444.546       468.1         263.7     121.950  1958    66.513
12        1959         112.6  482.704       381.3         255.2     123.366  1959    68.655
13        1960         114.2  502.601       393.1         251.4     125.368  1960    69.564
14        1961         115.7  518.173       480.6         257.2     127.852  1961    69.331
15        1962         116.9  554.894       400.7         282.7     130.081  1962    70.551
```
Split the data
```python
x_data = data.iloc[:, 2:]
y_data = data.iloc[:, 1]
print(x_data)
print(y_data)
```

Create the model
```python
# Generate 50 candidate values for the ridge coefficient (alpha)
alphas_to_test = np.linspace(0.001, 1)
# Create the model; store_cv_values keeps the per-sample CV errors
estimator = linear_model.RidgeCV(alphas=alphas_to_test, store_cv_values=True)
estimator.fit(x_data, y_data)
# Best ridge coefficient (alpha)
print(estimator.alpha_)
# Shape of the stored loss values
print(estimator.cv_values_.shape)
```

Output:

```
0.40875510204081633
(16, 50)
array([88.11216213])
```
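To see what the penalty buys on collinear data, here is a small sketch comparing the coefficient norms of ordinary least squares and ridge; it uses synthetic data rather than the Longley file so it is self-contained. For any alpha > 0 the ridge solution has a smaller L2 coefficient norm:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly a copy of x1 -> collinear
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS can split the effect wildly between the two collinear columns;
# ridge shrinks the coefficients toward a stable, shared solution.
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```

The same shrinkage is what keeps the Longley fit above numerically stable.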
Ridge Regression via the Normal Equation
Import packages
```python
import numpy as np
from numpy import genfromtxt
import pandas as pd
import matplotlib.pyplot as plt
```

Read the data
```python
data = pd.read_csv("longley.csv", delimiter=',')
print(data)
```

Output: the same 16-row Longley table shown above.

Split the data
```python
x_data = data.iloc[0:, 2:]
y_data = data.iloc[0:, 1]
# Turn y into a (16, 1) column vector (use .to_numpy() first; indexing a
# pandas Series with np.newaxis is no longer supported in recent pandas)
y_data = y_data.to_numpy()[:, np.newaxis]
print(x_data)
print(y_data)

print(np.mat(x_data).shape)
print(np.mat(y_data).shape)
# Add a bias (intercept) column of ones to the samples
X_data = np.concatenate((np.ones((16, 1)), x_data), axis=1)
print(X_data.shape)
```

Output (the 16-row feature table and the y column are abridged; the shapes and the first three rows of X_data are):

```
(16, 6)
(16, 1)
(16, 7)
[[1.00000e+00 2.34289e+02 2.35600e+02 1.59000e+02 1.07608e+02 1.94700e+03
  6.03230e+01]
 [1.00000e+00 2.59426e+02 2.32500e+02 1.45600e+02 1.08632e+02 1.94800e+03
  6.11220e+01]
 [1.00000e+00 2.58054e+02 3.68200e+02 1.61600e+02 1.09773e+02 1.94900e+03
  6.01710e+01]]
```
Solving for the regression parameters with the ridge normal equation
```python
def weights(xArr, yArr, lam=0.2):
    xMat = np.mat(xArr)
    yMat = np.mat(yArr)
    xTx = xMat.T * xMat                         # matrix multiplication
    rxTx = xTx + np.eye(xMat.shape[1]) * lam
    # Compute the determinant; if it is 0, the matrix has no inverse
    if np.linalg.det(rxTx) == 0.0:
        print("This matrix cannot do inverse")
        return
    # rxTx.I is the inverse of rxTx
    ws = rxTx.I * xMat.T * yMat
    return ws

ws = weights(X_data, y_data)
print(ws)
```

Output:

```
[[ 7.38107648e-04]
 [ 2.07703836e-01]
 [ 2.10076376e-02]
 [ 5.05385441e-03]
 [-1.59173066e+00]
 [ 1.10442920e-01]
 [-2.42280461e-01]]
```
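As a quick sanity check on this closed-form solution, with lam=0 the ridge formula should reduce to ordinary least squares. A pure-NumPy sketch on synthetic data (the helper `ridge_weights` is a compact restatement of `weights` above using arrays instead of `np.mat`):

```python
import numpy as np

def ridge_weights(X, y, lam=0.2):
    # Closed-form ridge: (X^T X + lam * I)^-1 X^T y, via a linear solve
    XtX = X.T @ X
    return np.linalg.solve(XtX + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 3))])  # bias + 3 features
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=20)

w_ridge0 = ridge_weights(X, y, lam=0.0)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_ridge0, w_ols))   # the two solutions coincide at lam=0
```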
Compute the predictions
```python
np.mat(X_data) * np.mat(ws)
```

Output:

```
matrix([[ 83.55075226],
        [ 86.92588689],
        [ 88.09720228],
        [ 90.95677622],
        [ 96.06951002],
        [ 97.81955375],
        [ 98.36444357],
        [ 99.99814266],
        [103.26832266],
        [105.03165135],
        [107.45224671],
        [109.52190685],
        [112.91863666],
        [113.98357055],
        [115.29845063],
        [117.64279933]])
```
Summary

Ridge regression adds an L2 penalty to the linear-regression cost, which keeps the normal-equation matrix invertible even on strongly collinear data such as the Longley set. Both sklearn's RidgeCV and the hand-written normal-equation solver recover sensible coefficients, and the predicted values track the observed GNP deflator closely.