Machine Learning: Observe How a Linear Model Works by Predicting the Prices of the Fiat 500
Introduction
In this article, I'd like to talk about linear models by walking you through a real project of mine. The project, which you can find on my GitHub, consists of predicting the prices of the Fiat 500.
The dataset for my model has 8 columns, listed below, and 1,538 rows.
- model: pop, lounge, sport
- engine_power: kW of the engine
- age_in_days: age of the car in days
- km: kilometres driven by the car
- previous_owners: number of previous owners
- lat: latitude of the seller (car prices in Italy vary from the north to the south of the country)
- lon: longitude of the seller
- price: selling price
In the first part of this article we will cover some concepts behind linear regression, ridge regression and lasso regression. Then I will show you the key insights I found in the dataset, and last but not least we will look at the preparation of the model and the metrics I used to evaluate its performance.
Part I: Linear Regression, Ridge Regression and Lasso Regression
Linear models are a class of models that make a prediction using a linear function of the input features.
For regression, as we know, the general formula looks as follows:

ŷ = m[0] * x[0] + m[1] * x[1] + … + m[p] * x[p] + b
As you already know, x[0] to x[p] represent the features of a single data point, m and b are the parameters of the model that are learned, and ŷ is the prediction the model makes.
There are many linear models for regression. They differ in how the model parameters m and b are learned from the training data and in how model complexity can be controlled. We will look at three models for regression.
Linear regression (ordinary least squares) → it finds the parameters m and b that minimize the mean squared error (MSE) between the predictions and the true regression targets, y, on the training set. The MSE is the mean of the squared differences between the predictions and the true values. Below is how to compute it with scikit-learn.
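As a minimal sketch of ordinary least squares with scikit-learn (using a small synthetic dataset rather than the Fiat 500 data):

```python
# A minimal OLS sketch on synthetic data (not the article's Fiat 500 dataset):
# with a noiseless linear target, the model recovers the true slope and intercept.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))      # two toy features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0    # exact linear target

lr = LinearRegression().fit(X, y)
print(lr.coef_)       # approximately [ 3. -2.]
print(lr.intercept_)  # approximately 5.0
```

Because the toy target is exactly linear, the fitted coefficients match the generating slopes; on real data like the Fiat 500 prices they would only approximate the underlying relationship.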
Ridge regression → the formula it uses to make predictions is the same as the one used for linear regression. In ridge regression, however, the coefficients (m) are chosen not only to predict well on the training data but also to satisfy an additional constraint: we want all entries of m to be close to zero. This means each feature should have as little effect on the outcome as possible (a small slope), while still predicting well. This constraint is called regularization, which means restricting a model to avoid overfitting. The particular form used by ridge regression is known as L2 regularization. Ridge regression is implemented in linear_model.Ridge, as you can see below. In particular, by increasing alpha we move the coefficients toward zero, which decreases training-set performance but might help generalization and avoid overfitting.
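To illustrate the effect of alpha, here is a small sketch on synthetic data (the alpha values are arbitrary, not tuned for the Fiat 500 dataset):

```python
# Sketch of L2 shrinkage: a larger alpha pulls the ridge coefficients
# closer to zero (toy data, arbitrary alpha values).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 3))
y = X @ np.array([4.0, -1.5, 2.0]) + rng.normal(scale=0.5, size=80)

ridge_small = Ridge(alpha=0.1).fit(X, y)
ridge_large = Ridge(alpha=100.0).fit(X, y)

# the heavily regularized model has a smaller total coefficient magnitude
print(np.abs(ridge_small.coef_).sum())
print(np.abs(ridge_large.coef_).sum())
```

In practice alpha is chosen by cross-validation; this snippet only shows the direction of the effect.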
Lasso regression → an alternative for regularization is the lasso. As with ridge regression, the lasso also restricts the coefficients to be close to zero, but in a slightly different way, called L1 regularization. A consequence of L1 regularization is that, when using the lasso, some coefficients become exactly zero. This means some features are entirely ignored by the model.
import numpy as np
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print("Training set score: {:.2f}".format(lasso.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lasso.score(X_test, y_test)))
print("Number of features used: {}".format(np.sum(lasso.coef_ != 0)))
Part II: Insights that I found
Before looking at the preparation and evaluation of the model, it is useful to take a look at the dataset itself.
In the scatter matrix below we can observe that there are some clear correlations between features such as km, age_in_days and price.
Image by author

Instead, in the following correlation matrix, we can see the result of the correlations between the features very clearly.
In particular, there is a strong correlation between age_in_days and price, and between km and price.
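As a rough sketch of how such a correlation matrix can be computed with pandas (the column names match the dataset, but the values below are made-up toy data that only mimic the relationships described above):

```python
# Toy illustration of a correlation matrix with pandas; the columns mimic
# the article's dataset but the values are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
age_in_days = rng.uniform(100, 3000, size=200)
km = age_in_days * 30 + rng.normal(scale=2000, size=200)           # mileage grows with age
price = 15000 - 3 * age_in_days + rng.normal(scale=500, size=200)  # price drops with age

df = pd.DataFrame({"age_in_days": age_in_days, "km": km, "price": price})
corr = df.corr()
print(corr.round(2))
```

With data shaped like this, price correlates strongly and negatively with both age_in_days and km, just as in the real dataset.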
This is the starting point for constructing our model and for understanding which machine learning model could fit better.
Image by author

Part III: Prepare and evaluate the performance of the model
To train and test the model, I used linear regression.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lr = LinearRegression()
lr.fit(X_train, y_train)
out:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

The following table shows the coefficient of each feature that I considered for my model.
coef_df = pd.DataFrame(lr.coef_, X.columns, columns=['Coefficient'])
coef_df
out:
Now it is time to evaluate the model. In the following graph, showing a sample of 30 data points, we can observe the comparison between the predicted values and the actual values. As we can see, our model performs quite well.
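As a hypothetical sketch of how such an actual-vs-predicted comparison can be built (synthetic data standing in for the real test set; on the Fiat 500 data you would reuse the lr model fitted above):

```python
# Sketch: tabulating actual vs. predicted values on a held-out test set
# (synthetic stand-in data, not the article's Fiat 500 prices).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(200, 2))
y = 10000 - 4000 * X[:, 0] + 1500 * X[:, 1] + rng.normal(scale=100, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
lr = LinearRegression().fit(X_train, y_train)
y_pred = lr.predict(X_test)

# side-by-side table of the first 10 test points
comparison = pd.DataFrame({"Actual": y_test, "Predicted": y_pred}).head(10)
print(comparison)
```

Plotting these two columns (e.g. with a bar chart) gives the kind of comparison graph shown above.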
Image by author

The R-squared is a good measure of how well the model inputs explain the variation of the dependent variable. In our case it is 85%.
from sklearn.metrics import r2_score

round(r2_score(y_test, y_pred), 2)

out:
0.85

Now I compute the MAE, MSE and RMSE to get a more precise overview of the model's performance.
import numpy as np
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Finally, by comparing the training-set score and the test-set score, we can see how well our model performs.
print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lr.score(X_test, y_test)))

out:

Training set score: 0.83
Test set score: 0.85

Conclusion
Linear models are a class of models that are widely used in practice and have been studied extensively in the last few years, in particular for machine learning. I hope this article has given you a good starting point for improving your skills and creating your own linear model.
Thanks for reading this. There are some other ways you can keep in touch with me and follow my work:
Subscribe to my newsletter.
You can also get in touch via my Telegram group, Data Science for Beginners.
Translated from: https://towardsdatascience.com/machine-learning-observe-how-a-linear-model-works-by-predicting-the-prices-of-the-fiat-500-fb38e0d22681