
House-Price Prediction with the Boston Housing Dataset

發(fā)布時(shí)間:2023/12/20 编程问答 31 豆豆

Table of Contents

    • 1. Examining the Boston housing data and loading the dataset
      • 1. Loading the dataset
      • 2. Viewing the data
    • 2. Feature selection
    • 3. Model selection
    • 4. Model training and testing
      • 1. Training the model
      • 2. Printing the linear-equation parameters
      • 3. Making predictions
      • 4. Computing MAE and MSE
      • 5. Plotting the learning curve
    • 5. Model performance evaluation and optimization
      • 1. Model optimization with degree-2 and degree-3 polynomial features
      • 2. Dataset-splitting function
      • 3. Defining the MAE and MSE functions
      • 4. Defining the polynomial-model function
      • 5. Training the model
      • 6. Defining the learning-curve plotting function
      • 7. Defining polynomials of degree 1, 2, and 3
      • 8. Splitting the dataset
      • 9. Training the models and printing the train scores
      • 10. Plotting the learning curves
    • 6. Conclusions and analysis

1. Examining the Boston housing data and loading the dataset

1. Loading the dataset

from sklearn.datasets import load_boston
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2;
# this code assumes an older scikit-learn version.
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['target'] = boston.target

數(shù)據(jù)集共506條,包含有13個(gè)與房?jī)r(jià)相關(guān)的特征,分別是:

name     meaning
CRIM     per-capita crime rate by town
ZN       proportion of residential land
INDUS    proportion of non-residential land in the town
CHAS     dummy variable used in the regression (Charles River indicator)
NOX      pollution (nitric-oxide) index
RM       number of rooms per dwelling
AGE      proportion of owner-occupied units built before 1940
DIS      weighted distance to five Boston employment centers
RAD      index of accessibility to radial highways
TAX      property-tax rate per $10,000
PTRATIO  pupil-teacher ratio in the town
B        proportion of Black residents in the town
LSTAT    percentage of homeowners in the area who are low-income

2. Viewing the data

# View the first few rows
df.head()

2. Feature selection

1. Plot scatter diagrams of each feature against the house price.
2. Based on the scatter plots, roughly select the three features CRIM, RM, and LSTAT.

features = df[['RM', 'CRIM', 'LSTAT']]
target = df['target']
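The scatter plots from step 1 are not shown above; a minimal sketch of how they could be drawn follows. It uses a small synthetic stand-in for `df` so the snippet runs on its own (the real notebook would reuse the DataFrame built earlier):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Boston df, for illustration only
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)),
                  columns=["RM", "CRIM", "LSTAT", "target"])

feature_names = [c for c in df.columns if c != "target"]
fig, axes = plt.subplots(1, len(feature_names), figsize=(12, 3))
for ax, name in zip(axes, feature_names):
    ax.scatter(df[name], df["target"], s=10)
    ax.set_xlabel(name)
    ax.set_ylabel("target")
fig.tight_layout()
```

On the real data, features whose scatter against `target` shows a clear trend (such as RM and LSTAT) are the ones kept.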

3. Model selection

A multiple linear regression model is used: the independent variables come from the dataset's feature_names columns (13 dimensions in total, of which only the three selected above are used here; Section 5 uses all 13), and the dependent variable is the target column (the house price).

# Split the dataset
split_num = int(len(features) * 0.8)
X_train = features[:split_num]
Y_train = target[:split_num]
X_test = features[split_num:]
Y_test = target[split_num:]

設(shè)置標(biāo)簽字段,切分?jǐn)?shù)據(jù)集:訓(xùn)練集80%,測(cè)試集20%

4. Model training and testing

1. Training the model

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, Y_train)

2. Printing the linear-equation parameters

print(model.coef_,model.intercept_)
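`coef_` and `intercept_` hold the fitted slope vector and bias of the linear equation. As a standalone sanity check (synthetic data, not the Boston set), fitting noiseless linear data recovers the generating parameters:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([4.0, -2.0, 0.5]) + 3.0  # exact linear relation, no noise

m = LinearRegression().fit(X, y)
assert np.allclose(m.coef_, [4.0, -2.0, 0.5])
assert np.isclose(m.intercept_, 3.0)
```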

3. Making predictions

preds=model.predict(X_test)

4. Computing MAE and MSE

def mae_value(y_true, y_pred):
    n = len(y_true)
    mae = sum(np.abs(y_true - y_pred)) / n
    return mae

def mse_value(y_true, y_pred):
    n = len(y_true)
    mse = sum(np.square(y_true - y_pred)) / n
    return mse

mae = mae_value(Y_test.values, preds)
mse = mse_value(Y_test.values, preds)
print("MAE", mae)
print("MSE", mse)
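The hand-rolled metrics above agree with scikit-learn's built-ins (`mean_absolute_error` and `mean_squared_error` are the library equivalents). A self-contained cross-check on toy values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def mae_value(y_true, y_pred):
    return np.sum(np.abs(y_true - y_pred)) / len(y_true)

def mse_value(y_true, y_pred):
    return np.sum(np.square(y_true - y_pred)) / len(y_true)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

assert np.isclose(mae_value(y_true, y_pred), mean_absolute_error(y_true, y_pred))
assert np.isclose(mse_value(y_true, y_pred), mean_squared_error(y_true, y_pred))
```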

5. Plotting the learning curve

from sklearn.model_selection import learning_curve
from sklearn.model_selection import ShuffleSplit
import matplotlib.pyplot as plt
import numpy as np

def plot_learning_curve(plt, estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5)):
    plt.title(title)
    if ylim is not None:
        plt.ylim(ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()
    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1, color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o--', color="r", label="Training scores")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g", label="Cross-validation score")
    plt.legend(loc="best")
    return plt

cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
plt.figure(figsize=(10, 6))
plot_learning_curve(plt, model, "Learn Curve for LinearRegression",
                    features, target, ylim=None, cv=cv)
plt.show()

5. Model performance evaluation and optimization

1. Model optimization with degree-2 and degree-3 polynomial features

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import learning_curve

2. Dataset-splitting function

def split_data():
    boston = load_boston()
    x = boston.data
    y = boston.target
    print(boston.feature_names)
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=2)
    return (x, y, x_train, x_test, y_train, y_test)

3. Defining the MAE and MSE functions

def mae_value(y_true, y_pred):
    n = len(y_true)
    mae = sum(np.abs(y_true - y_pred)) / n
    return mae

def mse_value(y_true, y_pred):
    n = len(y_true)
    mse = sum(np.square(y_true - y_pred)) / n
    return mse

4. Defining the polynomial-model function

def polynomial_regression(degree=1):
    polynomial_features = PolynomialFeatures(degree=degree, include_bias=False)
    # Enable input normalization in the model
    # (the normalize= parameter was removed in scikit-learn 1.2; on newer
    #  versions, add a StandardScaler step to the pipeline instead)
    linear_regression_model = LinearRegression(normalize=True)
    model = Pipeline([("polynomial_features", polynomial_features),
                      ("linear_regression", linear_regression_model)])
    return model
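As a quick check of what the pipeline's first stage does (a standalone sketch, not part of the original script): with `include_bias=False`, `PolynomialFeatures` of degree 2 maps n input features to n + n(n+1)/2 columns, the originals plus their squares and pairwise products, so the 13 Boston features become 13 + 91 = 104 columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6, dtype=float).reshape(2, 3)  # 2 samples, 3 features
poly = PolynomialFeatures(degree=2, include_bias=False)
X2 = poly.fit_transform(X)
print(X2.shape)  # (2, 9): 3 originals + 3 squares + 3 pairwise products
```

This feature explosion is why higher degrees fit the training data so much more closely, and why they can overfit.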

5. Training the model

def train_model(x_train, x_test, y_train, y_test, degrees):
    res = []
    for degree in degrees:
        model = polynomial_regression(degree)
        model.fit(x_train, y_train)
        train_score = model.score(x_train, y_train)
        test_score = model.score(x_test, y_test)
        res.append({"model": model, "degree": degree,
                    "train_score": train_score, "test_score": test_score})
        preds = model.predict(x_test)
        mae = mae_value(y_test, preds)
        mse = mse_value(y_test, preds)
        print(" degree:", degree, " MAE:", mae, " MSE:", mse)
    for r in res:
        print("degree: {}; train score: {}; test_score: {}".format(
            r["degree"], r["train_score"], r["test_score"]))
    return res
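For a regressor, `model.score` is the coefficient of determination R² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)², which is what the train and test scores above report. A quick check on synthetic data (illustrative, not the Boston set):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=80)

m = LinearRegression().fit(X, y)
# R^2 computed by hand matches both model.score and r2_score
manual = 1 - np.sum((y - m.predict(X)) ** 2) / np.sum((y - y.mean()) ** 2)
assert np.isclose(m.score(X, y), manual)
assert np.isclose(m.score(X, y), r2_score(y, m.predict(X)))
```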

6. Defining the learning-curve plotting function

def plot_learning_curve(plt, estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5)):
    plt.title(title)
    if ylim is not None:
        plt.ylim(ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()
    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1, color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o--', color="r", label="Training scores")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g", label="Cross-validation score")
    plt.legend(loc="best")
    return plt

7. Defining polynomials of degree 1, 2, and 3

degrees = [1,2,3]

8. Splitting the dataset

x, y, x_train, x_test, y_train, y_test = split_data()

9. Training the models and printing the train scores

res = train_model(x_train, x_test, y_train, y_test, degrees)

10. Plotting the learning curves

cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
plt.figure(figsize=(10, 6))

for index, data in enumerate(res):
    plot_learning_curve(plt, data["model"], "degree %d" % data["degree"],
                        x, y, cv=cv)
plt.show()



6. Conclusions and analysis

通過(guò)對(duì)波士頓房?jī)r(jià)數(shù)據(jù)的分析預(yù)測(cè)練習(xí),運(yùn)用多元回歸模型(一共十三個(gè)維度),前期訓(xùn)練量不足導(dǎo)致擬合程度不理想。經(jīng)過(guò)模型的參數(shù)優(yōu)化,采用了全部特征值,結(jié)果顯示一次多項(xiàng)式訓(xùn)練準(zhǔn)確度72%,測(cè)試準(zhǔn)確度76%。二次多項(xiàng)式訓(xùn)練準(zhǔn)確度92%,測(cè)試準(zhǔn)確度89%,mae=2.36, mse=8.67。綜上所述,采用二次多項(xiàng)式回歸方法優(yōu)化效果較好。
