Closed-Form and Gradient Descent Regression Explained with Python
Machine learning, Programming
Introduction
Regression is a kind of supervised learning algorithm within machine learning. It is an approach to model the relationship between the dependent variable (or target, response), y, and explanatory variables (or inputs, predictors), X. Its objective is to predict a quantity of the target variable, for example the stock price. This differs from a classification problem, where we want to predict the label of the target, for example the direction of a stock (up or down).
Moreover, regression can be used to answer whether and how several variables are related or influence each other, for example to determine if, and to what extent, work experience or age impacts salaries.
In this article, I will focus mainly on linear regression and its approaches.
Different approaches to Linear Regression
The goal of OLS (ordinary least squares) is to find the best-fitting line (hyperplane) that minimizes the vertical offsets between the target variable and the predicted output, measured by mean squared error (MSE) or another error metric (MAE, RMSE).
We can implement a linear regression model using the following approaches:
Please note that OLS regression estimates are the best linear unbiased estimator (BLUE, in short). In other forms of regression, the parameter estimates may be biased; for example, ridge regression is sometimes used to reduce the variance of estimates when there is collinearity in the data. However, the discussion of bias and variance is beyond the scope of this article (please refer to this great article related to bias and variance).
Closed-form equation
Let's assume we have an input matrix X with n observations and a target variable y; we can write the following equation to represent the linear regression model.
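The equation itself appeared as an image in the original post; a reconstruction in standard notation (assuming the usual multiple-regression setup with p predictors and an error term) is:

```latex
y_i = \beta_0 x_{i0} + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i,
\qquad x_{i0} = 1, \quad i = 1, 2, \dots, n
```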
Figure: Simple form of linear regression (where i = 1, 2, …, n)
The equation assumes we have the intercept term X0 = 1. There is also a model without an intercept, where B0 = 0, but this is based on the hypothesis that the fit always passes through the origin (there is a lot of discussion on this topic, which you can read more about here and here).
From the equation above, we can compute the regression parameters based on the below computation.
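The computation was shown as an image; in the usual matrix notation (stacking the n observations into a design matrix X whose first column is the intercept column of ones), the closed-form (normal-equation) solution is:

```latex
\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```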
Figure: Matrix formulation of the multiple regression model
Now, let's implement this in Python. There are three ways we can do this: manual matrix multiplication, the statsmodels library, and the sklearn library.
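The original code was embedded as a gist; the sketch below is a minimal, self-contained version of the three approaches on an illustrative single-feature dataset (the data, variable names, and hyper-parameters are my own assumptions, not the author's original code).

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

# Illustrative single-feature data
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 0.8 + 1.4 * X[:, 0] + rng.normal(scale=0.3, size=100)

# 1) Manual matrix multiplication: beta = (X'X)^-1 X'y
X_b = np.c_[np.ones((len(X), 1)), X]          # prepend the intercept column X0 = 1
beta_manual = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# 2) statsmodels OLS
beta_sm = sm.OLS(y, X_b).fit().params

# 3) scikit-learn LinearRegression (adds the intercept itself)
lr = LinearRegression().fit(X, y)
beta_sk = np.r_[lr.intercept_, lr.coef_]

print(beta_manual, beta_sm, beta_sk)          # all three give the same weights
```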
Figure: The model weights
Figure: Example of a single linear model, one input (left) and the model prediction (right)
You can see that all three solutions give the same results; we can then use the output to write the model equation (Y = 0.7914715 + 1.38594198X).
This approach offers a better solution for smaller data and gives an easy, quick, explainable model.
Gradient Descent
Why do we need gradient descent if the closed-form equation can solve the regression problem? There are some situations where it cannot:
- There is no closed-form solution for most nonlinear regression problems.
- Even in linear regression, there may be cases where it is impractical to use the formula, for example when X is a very large, sparse matrix; the solution would be too expensive to compute.
Gradient descent is a computationally cheaper (faster) option for finding the solution.
Gradient descent is an optimization algorithm used to minimize some cost function by repeatedly moving in the direction of steepest descent. Hence, the model weights are updated after each epoch.
Figure: Basic visualization of gradient descent — ideally gradient descent tries to converge toward the global minimum
There are three primary types of gradient descent used in machine learning algorithms:
Let us go through each type in more detail, along with its implementation.
Batch Gradient Descent
This approach is the most straightforward. It calculates the error for each observation within the training set and updates the model parameters only after all training observations have been evaluated. This process is called a training epoch.
The main advantages of this approach are computational efficiency and a stable error gradient with stable convergence. However, it requires the entire training set in memory, and the stable error gradient can sometimes result in a less-than-best model (it may converge to a local minimum instead of finding the best global minimum).
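For reference, a standard formulation of the cost being minimized and the batch update rule (assuming the MSE-style cost used throughout this post, with learning rate α and n training observations) is:

```latex
J(\boldsymbol{\theta}) = \frac{1}{2n} \sum_{i=1}^{n} \bigl(\mathbf{x}_i^{\top}\boldsymbol{\theta} - y_i\bigr)^2,
\qquad
\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \frac{\alpha}{n}\, X^{\top}\bigl(X\boldsymbol{\theta} - \mathbf{y}\bigr)
```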
Let's look at the Python implementation of the regression problem.
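The implementation lived in an embedded gist; below is a minimal sketch of batch gradient descent for linear regression under the cost above (function and variable names are assumptions, not the author's exact code). It expects the design matrix to already contain the intercept column, e.g. the X_b built in the closed-form section.

```python
import numpy as np

def batch_gd_regressor(X, y, learning_rate=0.05, n_epochs=300):
    """Vectorized batch gradient descent for linear regression.

    Returns the learned weights plus the cost and MSE recorded per epoch.
    """
    n, p = X.shape
    theta = np.zeros(p)                      # initialize the weights
    cost_history, mse_history = [], []

    for _ in range(n_epochs):
        error = X @ theta - y                # residuals over the whole training set
        gradient = X.T @ error / n           # gradient of the cost w.r.t. theta
        theta -= learning_rate * gradient    # one parameter update per epoch
        cost_history.append((error ** 2).sum() / (2 * n))
        mse_history.append((error ** 2).mean())

    return theta, cost_history, mse_history

# Example usage with the intercept-augmented matrix from the closed-form sketch
# theta, cost_, mse_ = batch_gd_regressor(X_b, y, learning_rate=0.05, n_epochs=300)
```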
Figure: The cost function of linear regression
Figure: Batch gradient descent — cost and MSE per epoch
As we can see, the cost decreases steadily and reaches a minimum at around 150–200 epochs.
During the computation, we also use vectorization for better performance. However, if the training set is very large, performance will be slower.
Stochastic Gradient Descent
Next is stochastic gradient descent, SGD (sometimes referred to as iterative or online GD). The names "stochastic" and "online GD" come from the fact that the gradient based on a single training observation is a "stochastic approximation" of the true cost gradient. Because of this, the path towards the global cost minimum is not direct and may go up and down before converging to the global cost minimum.
Hence:
- This makes SGD faster than batch GD (in most cases).
- We can view the insight and rate of improvement of the model in real time.
- The increased model update frequency can result in faster learning.
- The noisy updates of its stochastic nature can help to avoid local minima.
However, there are some disadvantages:
- Due to the update frequency, this can be more computationally expensive and can take longer to complete than the other approaches.
- The frequent updates result in a noisy gradient signal, which causes the model parameters and error to jump around, with higher variance over training epochs.
Let’s look at how we can implement this in Python.
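The original gist defines a helper that the mini-batch section below calls as _sgd_regressor(X_, y, learning_rate=..., n_epochs=..., batch_size=50); the body shown here is a reconstruction inferred from that call, not the author's exact code. With batch_size=1 it performs plain stochastic gradient descent.

```python
import numpy as np

def _sgd_regressor(X, y, learning_rate=0.05, n_epochs=50, batch_size=1):
    """(Mini-batch) stochastic gradient descent for linear regression.

    batch_size=1 gives plain SGD; 1 < batch_size < len(X) gives mini-batch GD.
    X is assumed to already include the intercept column.
    Returns the weights plus per-epoch cost and MSE histories.
    """
    n, p = X.shape
    theta = np.zeros(p)
    cost_history, mse_history = [], []

    for _ in range(n_epochs):
        order = np.random.permutation(n)         # visit observations in random order
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ theta - y[idx]
            gradient = X[idx].T @ error / len(idx)
            theta -= learning_rate * gradient    # update after every (mini-)batch

        epoch_error = X @ theta - y
        cost_history.append((epoch_error ** 2).sum() / (2 * n))
        mse_history.append((epoch_error ** 2).mean())

    return theta, cost_history, mse_history

# Plain SGD: one observation per update
# theta, cost_, mse_ = _sgd_regressor(X_b, y, learning_rate=0.05, n_epochs=50, batch_size=1)
```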
Figure: Stochastic gradient descent — performance per epoch
Mini-Batch Gradient Descent
Mini-batch gradient descent (MB-GD) is often the preferred method since it is a compromise between batch gradient descent and stochastic gradient descent. It splits the training set into small batches and feeds them to the algorithm, and the model is updated after each batch. The model will converge more quickly than with batch GD because the weights are updated more frequently.
This method combines the efficiency of batch GD and the robustness of stochastic GD. One (small) downside is that it introduces a new parameter, the batch size, which may require fine-tuning as part of model tuning/optimization.
We can imagine batch size as a slider on the learning process.
- A small value gives a learning process that converges quickly, at the cost of noise in the training process.
- A large value gives a learning process that converges slowly, with more accurate estimates of the error gradient.
We can reuse the above function, but we need to specify the batch size to satisfy len(training set) > batch_size > 1.
theta, _, mse_ = _sgd_regressor(X_, y, learning_rate=learning_rate, n_epochs=n_epochs, batch_size=50)
Figure: Mini-batch gradient descent — performance per epoch
We can see that the model converges within just the first few epochs.
SGD Regressor (scikit-learn)
In Python, we can implement a gradient descent approach to the regression problem by using sklearn.linear_model.SGDRegressor. Please refer to the documentation for more details.
Below is how we can implement a stochastic and mini-batch gradient descent method.
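The original notebook output is shown in the figures below; the sketch here uses only documented scikit-learn calls, with illustrative hyper-parameters (not the author's settings). fit() runs SGD over the full training set, while repeated partial_fit() calls on slices of the data emulate mini-batch updates. X and y are the raw (non-augmented) toy data from the closed-form section.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# Stochastic gradient descent via fit()
sgd = SGDRegressor(max_iter=1000, tol=1e-3, eta0=0.05, random_state=42)
sgd.fit(X, y)
print(sgd.intercept_, sgd.coef_, mean_squared_error(y, sgd.predict(X)))

# Mini-batch flavour: feed shuffled slices through partial_fit()
mb_sgd = SGDRegressor(eta0=0.05, random_state=42)
batch_size, n_epochs = 50, 50
for _ in range(n_epochs):
    order = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        mb_sgd.partial_fit(X[idx], y[idx])       # one gradient step per mini-batch
print(mb_sgd.intercept_, mb_sgd.coef_, mean_squared_error(y, mb_sgd.predict(X)))
```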
Figure: scikit-learn SGD model detail and performance
Figure: scikit-learn SGD mini-batch model detail and performance
EndNote
In this post, I have explained linear regression using both the closed-form equation and the optimization algorithm, gradient descent, by implementing them from scratch and by using built-in libraries.
Additional reading and Github repository:
Translated from: https://medium.com/towards-artificial-intelligence/closed-form-and-gradient-descent-regression-explained-with-python-1627c9eeb60e