3.Your First Machine Learning Model

Published: 2023/12/10

Selecting Data for Modeling

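Your dataset has too many variables to wrap your head around. How can you pare this overwhelming amount of data down to something you can understand?
We'll start by picking a few variables using our intuition. Later courses will show you statistical techniques for prioritizing variables automatically.
To choose variables/columns, we need to see a list of all columns in the dataset. That is done with the columns property of the DataFrame (the code below).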

[1]

import pandas as pd

melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
melbourne_data.columns

Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')

[2]

# The Melbourne data has some missing values (some houses for which some variables weren't recorded.)
# We'll learn to handle missing values in a later tutorial.
# Your Iowa data doesn't have missing values in the columns you use.
# So we will take the simplest option for now, and drop houses from our data.
# Don't worry about this much for now, though the code is:

# dropna drops missing values (think of na as "not available")
melbourne_data = melbourne_data.dropna(axis=0)
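To make the effect of dropna(axis=0) concrete, here is a minimal sketch on a small made-up table. The DataFrame `toy` and its values are hypothetical and only stand in for the housing file; the point is that any row with at least one missing value is removed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the housing file; the values are made up.
toy = pd.DataFrame({
    'Rooms': [2, 3, 4],
    'Price': [500000.0, np.nan, 900000.0],
    'Landsize': [150.0, 200.0, np.nan],
})

# Count the missing values in each column before dropping anything.
missing_per_column = toy.isna().sum()

# axis=0 drops every row that contains at least one missing value.
cleaned = toy.dropna(axis=0)

print(missing_per_column)
print(cleaned)
```

Checking the missing-value counts first is a cheap way to see how many rows a blanket drop is likely to cost.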

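There are many ways to select a subset of your data. The pandas course covers these in more depth, but we will focus on two approaches for now.

- Dot notation, which we use to select the "prediction target"

- Selecting with a column list, which we use to select the "features"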

Selecting The Prediction Target

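You can pull out a variable with dot notation. This single column is stored in a Series, which is broadly like a DataFrame with only one column of data.
We'll use dot notation to select the column we want to predict, which is called the prediction target. By convention, the prediction target is called y. So the code we need to save the house prices in the Melbourne data is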

[3]

y = melbourne_data.Price
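As a small illustration of what dot notation returns, the sketch below uses a hypothetical two-row table (`toy` is not part of the original data) and compares dot notation with the equivalent bracket notation.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the Melbourne file.
toy = pd.DataFrame({'Rooms': [2, 3], 'Price': [1035000.0, 1465000.0]})

y_dot = toy.Price          # dot notation
y_bracket = toy['Price']   # bracket notation selects the same column

print(type(y_dot).__name__)      # Series
print(y_dot.equals(y_bracket))   # True
```

Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute; bracket notation works for any column name.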

Choosing "Features"

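The columns that are fed into our model (and later used to make predictions) are called "features." In our case, those would be the columns used to determine the home price. Sometimes you will use all columns except the target as features. Other times you'll be better off with fewer features.
For now, we'll build a model with only a few features. Later on you'll see how to iterate and compare models built with different features.
We select multiple features by providing a list of column names inside brackets. Each item in that list should be a string (with quotes).
Here is an example:

[4]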

melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']

By convention, this data is called X.

[5]

X = melbourne_data[melbourne_features]
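The sketch below, on a hypothetical toy table, shows how the result type depends on what goes inside the brackets: a list of names gives a DataFrame (even when the list has one item), while a bare string gives a Series.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
toy = pd.DataFrame({
    'Rooms': [2, 3],
    'Bathroom': [1.0, 2.0],
    'Price': [1035000.0, 1465000.0],
})

features = ['Rooms', 'Bathroom']
X_toy = toy[features]            # list of names -> DataFrame
one_col_frame = toy[['Rooms']]   # one-item list -> still a DataFrame
one_col_series = toy['Rooms']    # bare string -> Series

print(type(X_toy).__name__, list(X_toy.columns))
print(type(one_col_frame).__name__, type(one_col_series).__name__)
```

Keeping the features as a DataFrame matters later, since scikit-learn estimators expect a two-dimensional input for the features.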

Let's take a quick look at the data we'll use to predict house prices, using the describe method and the head method, which shows the first few rows.

[6]

X.describe()

       Rooms        Bathroom     Landsize      Lattitude    Longtitude
count  6196.000000  6196.000000  6196.000000   6196.000000  6196.000000
mean   2.931407     1.576340     471.006940    -37.807904   144.990201
std    0.971079     0.711362     897.449881    0.075850     0.099165
min    1.000000     1.000000     0.000000      -38.164920   144.542370
25%    2.000000     1.000000     152.000000    -37.855438   144.926198
50%    3.000000     1.000000     373.000000    -37.802250   144.995800
75%    4.000000     2.000000     628.000000    -37.758200   145.052700
max    8.000000     8.000000     37000.000000  -37.457090   145.526350

[7]

X.head()

   Rooms  Bathroom  Landsize  Lattitude  Longtitude
1  2      1.0       156.0     -37.8079   144.9934
2  3      2.0       134.0     -37.8093   144.9944
4  4      1.0       120.0     -37.8072   144.9941
6  3      2.0       245.0     -37.8024   144.9993
7  2      1.0       256.0     -37.8060   144.9954

Visually checking your data with these commands is an important part of a data scientist's job. You'll frequently find surprises in the dataset that deserve further inspection.
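One kind of surprise that describe can reveal is missing data: the count row only counts non-missing values, so a column with a lower count than the others has gaps. A minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Landsize value.
toy = pd.DataFrame({
    'Rooms': [2, 3, 4, 3],
    'Landsize': [156.0, np.nan, 120.0, 245.0],
})

summary = toy.describe()

# A lower count for Landsize than for Rooms signals a missing value.
print(summary.loc['count'])
```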

Building Your Model

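You will use the scikit-learn library to create your models. In code, this library is written as sklearn, as you will see in the sample code. Scikit-learn is the most commonly used library for modeling the kinds of data typically stored in DataFrames.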

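The steps to building and using a model are:
- Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.
- Fit: Capture patterns from the provided data. This is the heart of modeling.
- Predict: Just what it sounds like.
- Evaluate: Determine how accurate the model's predictions are.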
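The four steps above can be sketched end to end on a tiny made-up table. The data in `toy` is hypothetical, and using mean_absolute_error for the evaluation step is an assumption here, since the text does not name a metric.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; the values are made up for illustration.
toy = pd.DataFrame({
    'Rooms': [2, 3, 4, 3, 2],
    'Landsize': [156.0, 134.0, 120.0, 245.0, 256.0],
    'Price': [1035000.0, 1465000.0, 1600000.0, 1876000.0, 1636000.0],
})
X_toy = toy[['Rooms', 'Landsize']]
y_toy = toy['Price']

# 1. Define: choose the model type and its parameters.
model = DecisionTreeRegressor(random_state=1)

# 2. Fit: capture patterns from the provided data.
model.fit(X_toy, y_toy)

# 3. Predict: here on the same rows the model was trained on.
predictions = model.predict(X_toy)

# 4. Evaluate: mean absolute error (an assumed choice of metric).
mae = mean_absolute_error(y_toy, predictions)
print(mae)
```

Because the error is measured on the training rows, and an unrestricted tree can memorize them, this number is optimistic; it says little about how the model would do on houses it has not seen.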

Here is an example of defining a decision tree model with scikit-learn and fitting it with the features and target variable.

[8]

from sklearn.tree import DecisionTreeRegressor

# Define model. Specify a number for random_state to ensure same results each run
melbourne_model = DecisionTreeRegressor(random_state=1)

# Fit model
melbourne_model.fit(X, y)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort=False, random_state=1, splitter='best')

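Many machine learning models allow some randomness in model training. Specifying a number for random_state ensures you get the same results in each run. This is considered good practice. You can use any number, and model quality won't depend meaningfully on exactly which value you choose.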
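A minimal sketch of that reproducibility claim: two models trained with the same random_state on the same data give identical predictions. The random arrays and the helper fit_and_predict are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical random data, generated with a fixed seed so the example is repeatable.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = rng.normal(size=50)
X_new = rng.normal(size=(5, 3))

def fit_and_predict(seed):
    """Train a fresh tree with the given random_state and predict on X_new."""
    model = DecisionTreeRegressor(random_state=seed)
    model.fit(X_train, y_train)
    return model.predict(X_new)

first = fit_and_predict(1)
second = fit_and_predict(1)

# Same seed, same data: the two runs agree exactly.
print(np.array_equal(first, second))
```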

We now have a fitted model that we can use to make predictions.

In practice, you'll want to make predictions for new houses coming on the market rather than for houses we already have prices for. But we'll make predictions for the first few rows of the training data to see how the predict function works.

[9]

print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(melbourne_model.predict(X.head()))

Making predictions for the following 5 houses:
   Rooms  Bathroom  Landsize  Lattitude  Longtitude
1      2       1.0     156.0   -37.8079    144.9934
2      3       2.0     134.0   -37.8093    144.9944
4      4       1.0     120.0   -37.8072    144.9941
6      3       2.0     245.0   -37.8024    144.9993
7      2       1.0     256.0   -37.8060    144.9954
The predictions are
[1035000. 1465000. 1600000. 1876000. 1636000.]

Your Turn

Try it out yourself in the model-building exercise.
