Regression Analysis
Machine learning algorithms are not the regular algorithms we may be used to, because they are often described by a combination of complex statistics and mathematics. Since it is important to understand the background of any algorithm you want to implement, this can pose a challenge to people without a mathematical background: the maths can sap your motivation by slowing you down.
In this article, we will discuss linear and logistic regression and some related regression techniques, assuming we have all heard of, or even learnt about, the linear model in high-school mathematics class. Hopefully, by the end of the article, the concepts will be clearer.
Regression analysis is a statistical process for estimating the relationships between a dependent variable (say Y) and one or more independent variables, or predictors (X). It explains changes in the dependent variable with respect to changes in selected predictors. Major uses of regression analysis include determining the strength of predictors, forecasting an effect, and trend forecasting. It finds significant relationships between variables and quantifies the impact of predictors on the dependent variable. In regression, we fit a curve/line (the regression, or best-fit, line) to the data points, such that the distances of the data points from the curve/line are minimized.
Linear Regression
It is the simplest and most widely known regression technique. Linear regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a regression line. This is done by the Ordinary Least Squares (OLS) method: OLS calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Since the deviations are squared before being added, positive and negative values cannot cancel out. It is represented by the equation:
Y = a + b*X + e, where a is the intercept, b is the slope of the line, and e is the error term.
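The closed-form OLS solution for this simple case can be sketched in a few lines of numpy. The data below is hypothetical and purely illustrative:

```python
import numpy as np

# Hypothetical data (illustrative only), e.g. some predictor X vs. response Y.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# OLS closed form for simple linear regression:
#   b = sum((X - mean_X) * (Y - mean_Y)) / sum((X - mean_X)^2)
#   a = mean_Y - b * mean_X
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

Y_pred = a + b * X   # fitted regression line
e = Y - Y_pred       # residuals (the error term)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")  # → intercept a = 0.27, slope b = 1.93
```

The squared vertical deviations are exactly what `e` captures; summing `e**2` gives the quantity OLS minimizes.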
OLS rests on several assumptions:
Linearity: the relationship between X and the mean of Y is linear.
Normality: the errors (residuals) follow a normal distribution.
Homoscedasticity: the variance of the residuals is the same for any value of X (constant error variance).
No endogeneity of regressors: the independent variables must not be correlated with the error term.
No autocorrelation: the errors are assumed to be uncorrelated and randomly spread around the regression line.
Independence/No multicollinearity: the independent variables should not be highly correlated with one another; multicollinearity is observed when two or more of them are.
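Two of these assumptions can be checked directly from the fitted residuals. A minimal numpy sketch on synthetic data (generated here only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that satisfies the assumptions: a linear relationship
# with normally distributed, homoscedastic errors.
x = rng.uniform(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 200)

# Fit by least squares and inspect the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

print(residuals.mean())                 # centred at (numerically) zero
print(np.corrcoef(x, residuals)[0, 1])  # no link between X and errors
```

With an intercept in the model, OLS guarantees that the residuals have zero mean and zero correlation with X by construction; plotting the residuals against X is the usual way to eyeball homoscedasticity and non-linearity.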
We have simple and multiple linear regression; the difference is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one.
We can evaluate the performance of this model using the R-squared (R²) metric, the proportion of the variance in Y that the model explains.
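R-squared can be computed from first principles as 1 − SS_res / SS_tot. The values below are made up for illustration:

```python
import numpy as np

# Illustrative true and predicted values (not from the article).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.2, 6.9, 9.1])

ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # → 0.995
```

An R² of 1 means the line explains all the variation in Y; 0 means it does no better than simply predicting the mean of Y.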
Logistic Regression
Using linear regression, we can predict the price a customer will pay if he or she buys. With logistic regression we can answer a more fundamental question: "will the customer buy at all?"
Here, there is a shift from numerical to categorical. Logistic regression is used to solve classification problems, where the target is a categorical variable. It can handle various types of relationships between the independent variables and Y because it applies a non-linear log transformation to the predicted odds ratio:
odds = p / (1 - p)
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1*X1 + b2*X2 + b3*X3 + … + bk*Xk
where p is the probability of event success and (1-p) is the probability of event failure.
The logit maps probabilities in (0, 1) to any real value; its inverse, the sigmoid (logistic) function, maps any real value back into (0, 1). The parameters in the equation above are chosen to maximize the likelihood of observing the sample values, rather than to minimize the sum of squared errors.
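The "will the customer buy?" example can be sketched with a maximum-likelihood fit by plain gradient ascent. The data, learning rate, and iteration count below are all hypothetical choices for illustration:

```python
import numpy as np

def sigmoid(z):
    # Inverse of the logit: maps any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data (hypothetical): feature X, did the customer buy (1) or not (0)?
X = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1])

# Maximize the log-likelihood by gradient ascent; for logistic regression the
# gradient w.r.t. each coefficient is sum((y - p) * feature).
b0, b1 = 0.0, 0.0
lr = 0.05
for _ in range(5000):
    p = sigmoid(b0 + b1 * X)
    b0 += lr * np.sum(y - p)        # gradient w.r.t. the intercept
    b1 += lr * np.sum((y - p) * X)  # gradient w.r.t. the slope

print(sigmoid(b0 + b1 * 0.5))  # low predicted probability of buying
print(sigmoid(b0 + b1 * 4.0))  # high predicted probability of buying
```

Note there is no closed-form solution as in OLS; in practice libraries use faster optimizers (e.g. Newton-type methods), but the objective being maximized is the same likelihood.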
Conclusion
I would encourage you to read further to get a more solid understanding. Several techniques are employed to increase the robustness of regression, including regularization/penalization methods (Lasso, Ridge, and ElasticNet), gradient descent, stepwise regression, and so on.
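As a taste of regularization, Ridge regression has a closed form that only adds a penalty term to the OLS normal equations. The data and penalty strength below are arbitrary illustrative choices:

```python
import numpy as np

# Ridge regression (L2 penalty) in closed form:
#   w = (XᵀX + λI)⁻¹ Xᵀy  — the λI term shrinks the coefficients toward zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])          # illustrative "true" coefficients
y = X @ true_w + rng.normal(scale=0.1, size=50)

lam = 10.0                                    # penalty strength (hypothetical)
I = np.eye(X.shape[1])
w_ols   = np.linalg.solve(X.T @ X, X.T @ y)           # λ = 0: plain OLS
w_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y) # shrunk coefficients

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

Lasso replaces the squared (L2) penalty with an absolute-value (L1) penalty, which has no closed form but can drive some coefficients exactly to zero; ElasticNet combines the two.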
Kindly note that these are not types of regression, as many articles online suggest. Below you will find links to articles I found helpful in explaining some of the concepts, and for your further reading. Happy learning!
https://medium.com/datadriveninvestor/regression-in-machine-learning-296caae933ec
https://machinelearningmastery.com/linear-regression-for-machine-learning/
https://www.geeksforgeeks.org/ml-linear-regression/
https://www.geeksforgeeks.org/types-of-regression-techniques/
https://www.vebuso.com/2020/02/linear-to-logistic-regression-explained-step-by-step/
https://www.statisticssolutions.com/what-is-logistic-regression/
https://www.listendata.com/2014/11/difference-between-linear-regression.html#:~:text=Purpose%20%3A%20Linear%20regression%20is%20used,the%20probability%20of%20an%20event.
https://www.kaggle.com/residentmario/l1-norms-versus-l2-norms
Translated from: https://medium.com/analytics-vidhya/regression-15cfaffe805a