
Lecture 3 ------- Logistic Regression & Regularization

Published: 2025/3/21 | Programming Q&A | 33 豆豆

This article was collected and organized by 生活随笔. It covers Lecture 3 ------- Logistic Regression Regularization; the editor found it quite good and shares it here for reference.


Contents of this lecture:

Logistic Regression

=========================

(1) Classification

(2) Hypothesis Representation

(3) Decision Boundary

(4) Cost Function

(5) Simplified Cost Function and Gradient Descent

(6) Parameter Optimization in Matlab

(7) Multiclass classification: One-vs-all

The problem of overfitting and how to solve it

=========================

(8) The problem of overfitting

(9) Cost Function

(10) Regularized Linear Regression

(11) Regularized Logistic Regression

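This lecture covers logistic regression and how regularization addresses overfitting. Both topics are very important: logistic regression is one of the most commonly used tools in machine learning (despite its name, it is a classification method). The two parts are explained below.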

Part 1: Logistic Regression

/*************(1)~(2) Classification / Hypothesis Representation***********/

Suppose we want to predict from the Tumor Size whether a patient's tumor is malignant or benign.

We are given the following 8 data points:


Suppose linear regression gives the hypothesis shown as the pink line in the figure above. We can then choose a threshold of 0.5 and predict:

y=1, if h(x)>=0.5

y=0, if h(x)<0.5

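That is, project the point where h(x)=0.5 down onto the Tumor Size axis: points to its right are predicted y=1 and points to its left y=0, which classifies this data set well.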

But what if the data set looks like this?

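In this case, suppose linear regression gives the blue line. Thresholding this line at 0.5 no longer classifies the data well: some malignant tumors now end up with h(x)<0.5, so the rule below no longer matches the true labels:

y=1 whenever h(x)>=0.5

y=0 whenever h(x)<0.5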

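We therefore introduce the logistic regression model, whose output always lies between 0 and 1 (unlike linear regression, whose h(x) can be greater than 1 or less than 0): hθ(x) = g(θ^T x), where g(z) = 1/(1+e^(-z)).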

This g(z), shown in the figure above, is called the sigmoid function or logistic function.

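When z>=0, g(z)>=0.5; when z<0, g(z)<0.5. Therefore predicting y=1 whenever hθ(x)>=0.5 is the same as predicting y=1 whenever θ^T x>=0.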

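hθ(x) is interpreted as the estimated probability that y=1 given x and parameters θ, i.e. hθ(x)=P(y=1|x;θ). As the formula in the figure below shows, given data x and parameters θ, the probabilities of y=0 and y=1 sum to 1, so P(y=0|x;θ)=1-hθ(x).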
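Below is a minimal Matlab/Octave sketch of the sigmoid and the hypothesis above. It assumes x is a column vector whose first entry is the bias feature x0 = 1. The anonymous-function names `sigmoid` and `h` and the sample values are illustrative only, not taken from the course materials.

```matlab
% Sigmoid (logistic) function g(z) = 1 / (1 + e^(-z)), applied element-wise
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothesis h_theta(x) = g(theta' * x), for a column vector x with x(1) = x0 = 1
h = @(theta, x) sigmoid(theta' * x);

theta = [-3; 1; 1];
x = [1; 2; 2];              % x0 = 1, x1 = 2, x2 = 2, so theta' * x = 1
p1 = h(theta, x);           % estimated P(y = 1 | x; theta)
p0 = 1 - p1;                % P(y = 0 | x; theta), so p0 + p1 = 1
fprintf('g(0) = %.2f, P(y=1) = %.4f, P(y=0) = %.4f\n', sigmoid(0), p1, p0);
```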

/*****************************(3) Decision Boundary**************************/

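The decision boundary is the boundary that hθ(x) draws between the points predicted as y=1 and those predicted as y=0. It is the set of points where hθ(x)=0.5, i.e. θ^T x=0. It is a property of the hypothesis and its parameters θ, not of the training set.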

As shown in the figure below, suppose the hypothesis has the form h(x)=g(θ0+θ1x1+θ2x2) with parameters θ=[-3,1,1]^T. Then

predict y=1, if -3+x1+x2>=0

predict y=0, if -3+x1+x2<0

The decision boundary is therefore the line x1+x2=3, which separates the data set shown in the figure well.
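The same rule can be checked quickly in Matlab/Octave. This assumes a small made-up design matrix X whose rows are [x0 x1 x2] with x0 = 1. Because g(z) >= 0.5 exactly when z >= 0, the sketch thresholds θ^T x directly instead of calling the sigmoid:

```matlab
% Decision boundary for h(x) = g(theta0 + theta1*x1 + theta2*x2), theta = [-3; 1; 1]
theta = [-3; 1; 1];
X = [1 1 1;                 % each row is one example [x0 x1 x2], with x0 = 1
     1 2 2;
     1 3 0;
     1 0 0.5];
% g(z) >= 0.5 exactly when z >= 0, so threshold theta' * x directly
predictions = double(X * theta >= 0);   % 1 where -3 + x1 + x2 >= 0, else 0
disp(predictions');
```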


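Besides linear boundaries there are also non-linear decision boundaries, obtained by adding higher-order polynomial features. For example:

In the figure below, the decision boundary used for classification is a circle of radius 1 centered at the origin. A boundary like this arises, for example, from hθ(x)=g(θ0+θ1x1+θ2x2+θ3x1^2+θ4x2^2) with θ=[-1,0,0,1,1]^T: predict y=1 if x1^2+x2^2>=1, and y=0 otherwise.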
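The sketch below shows how such a circular boundary could be evaluated in Matlab/Octave. It assumes the polynomial features [1, x1, x2, x1^2, x2^2] and θ=[-1;0;0;1;1]; the test points are made up:

```matlab
% Non-linear boundary: h(x) = g(theta0 + theta1*x1 + theta2*x2 + theta3*x1^2 + theta4*x2^2)
theta = [-1; 0; 0; 1; 1];       % predicts y = 1 exactly when x1^2 + x2^2 >= 1
x1 = [0; 0.5; 1; 2];
x2 = [0; 0.5; 0; 1];
% Map each point to the polynomial features [1, x1, x2, x1^2, x2^2]
features = [ones(size(x1)), x1, x2, x1 .^ 2, x2 .^ 2];
predictions = double(features * theta >= 0);   % inside the unit circle -> 0, on/outside -> 1
```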

/********************(4)~(5) Simplified cost function and gradient descent <very important>*******************/

This part explains how to implement gradient descent for logistic regression using a simplified cost function.

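Assume the labels y in our data only take the values 0 and 1. For the logistic regression model hθ(x)=1/(1+e^(-θ^T x)), the cost function is defined as follows (unlike the squared error, this choice makes J(θ) convex in θ):

$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}\left(h_\theta(x^{(i)}),y^{(i)}\right),\qquad \mathrm{Cost}(h_\theta(x),y)=\begin{cases}-\log\left(h_\theta(x)\right) & \text{if } y=1\\-\log\left(1-h_\theta(x)\right) & \text{if } y=0\end{cases}$$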

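Since y only takes the values 0 and 1, this can be written as a single expression:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$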

If in doubt, substitute y=0 and y=1 separately: each case reduces to the corresponding Cost(hθ(x),y) above (*^__^*). All that remains is to find the θ that minimizes J(θ).
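Here is a vectorized Matlab/Octave sketch of this J(θ). It assumes X is the m×(n+1) design matrix (first column all ones) and y is a column vector of 0/1 labels; the toy data is illustrative. If hθ(x) saturated to exactly 0 or 1, log(0) would give -Inf, and this sketch does not guard against that.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% J(theta) = -(1/m) * sum( y .* log(h) + (1 - y) .* log(1 - h) ), with h = g(X * theta)
logisticCost = @(theta, X, y) ...
    -mean(y .* log(sigmoid(X * theta)) + (1 - y) .* log(1 - sigmoid(X * theta)));

X = [1 1; 1 2; 1 3; 1 4];       % first column: x0 = 1
y = [0; 0; 1; 1];
J0 = logisticCost(zeros(2, 1), X, y);     % h = 0.5 everywhere, so J = log(2)
J1 = logisticCost([-5; 2], X, y);         % a theta that separates the classes
fprintf('J(zeros) = %.4f, J([-5; 2]) = %.4f\n', J0, J1);
```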

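In Chapter 1 we covered gradient descent, i.e. the Repeat loop in the figure below, which updates all components of θ simultaneously. The derivative of J(θ) works out as follows (see the handwritten derivation in the figure):

$$\frac{\partial J(\theta)}{\partial\theta_j}=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$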

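Substituting this into the Repeat loop gives:

$$\text{Repeat}\ \left\{\ \theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\quad(\text{simultaneously for all } j=0,\dots,n)\ \right\}$$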

Surprisingly, this looks exactly like the formula we derived in Chapter 1!

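In other words, as shown below, the parameter update has the same form whether h(x) is the linear hypothesis or the logistic regression model. The two algorithms are still different, because hθ(x) is θ^T x in linear regression but 1/(1+e^(-θ^T x)) in logistic regression.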

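How do we vectorize this? That is, instead of updating each θj one at a time in a for loop, we update the whole vector θ with a single matrix product. With X the m×(n+1) design matrix and y the label vector, we need to solve the following:

$$\theta:=\theta-\frac{\alpha}{m}X^T\left(g(X\theta)-y\right)$$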
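The sketch below implements this vectorized update in Matlab/Octave and runs it for a fixed number of iterations on a made-up, symmetric 1-D data set. The learning rate α=0.1 and the iteration count are illustrative choices, not tuned values:

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% One vectorized step: theta := theta - (alpha / m) * X' * (g(X * theta) - y)
gdStep = @(theta, X, y, alpha) theta - (alpha / size(X, 1)) * X' * (sigmoid(X * theta) - y);

X = [1 -2; 1 -1; 1 1; 1 2];     % made-up data, first column x0 = 1
y = [0; 0; 1; 1];
alpha = 0.1;
theta = zeros(2, 1);
for iter = 1:1000
    theta = gdStep(theta, X, y, alpha);   % updates every theta_j at once
end
predictions = double(X * theta >= 0);     % should reproduce y on this separable data
```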

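The formula above gives the update of the parameter vector θ. Next question: Lecture 2 explained how to check whether the learning rate α is suitable. How do we check it for logistic regression?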

Q: Suppose you are running gradient descent to fit a logistic regression model with parameter θ∈R^(n+1). Which of the following is a reasonable way to make sure the learning rate α is set properly and that gradient descent is running correctly?

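A: Plot the cost J(θ) defined above as a function of the number of iterations and make sure J(θ) decreases on every iteration.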
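One way to apply this check in Matlab/Octave is sketched below, using made-up data and an illustrative α. It records J(θ) after every update and tests whether the sequence ever increases. To see the curve, plot J_history against the iteration number.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));
costJ = @(theta, X, y) ...
    -mean(y .* log(sigmoid(X * theta)) + (1 - y) .* log(1 - sigmoid(X * theta)));

X = [1 -2; 1 -1; 1 1; 1 2];
y = [0; 0; 1; 1];
alpha = 0.1;                    % illustrative learning rate
numIters = 200;
theta = zeros(2, 1);
J_history = zeros(numIters, 1);
for iter = 1:numIters
    theta = theta - (alpha / size(X, 1)) * X' * (sigmoid(X * theta) - y);
    J_history(iter) = costJ(theta, X, y);   % cost after this update
end
% If J ever increases, alpha is probably too large (or the gradient is wrong)
isDecreasing = all(diff(J_history) <= 0);
% plot(1:numIters, J_history) shows J(theta) against the number of iterations
```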

/*************(6) Parameter Optimization in Matlab***********/

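This part covers some optimizations for logistic regression that find the parameters faster. It implements, in Matlab, the computation of the optimal parameters with a gradient-based optimizer.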

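First, note that gradient descent is not the only option; there are many other optimization algorithms, as shown below. The left side lists three of them (conjugate gradient, BFGS and L-BFGS). The right side lists their shared advantages and disadvantages: there is no need to pick the learning rate α manually and they are often faster, but they are more complex.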

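In other words, Matlab already implements some methods for optimizing θ. All we have to do is write the cost function and tell the optimizer (here fminunc) how to use it. For example, the 'GradObj' option tells fminunc that our cost function also returns the gradient: Use the GradObj option to specify that FUN also returns a second output argument G that is the partial derivatives of the function df/dX, at the point X.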

As shown above, given the parameters θ, we have to supply the cost function, where

jVal is the value of the cost function. For example, to fit the two points (x1,x2,y)=(1,0,5) and (0,1,5) with hθ(x)=θ1x1+θ2x2, the squared-error cost J(θ) (without the 1/(2m) factor) is:
jVal=(theta(1)-5)^2+(theta(2)-5)^2;

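A plain gradient-descent step would update each parameter as θ(i) -= α*gradient(i), where gradient(i) is the partial derivative of J(θ) with respect to θi. In this example gradient(1)=2*(theta(1)-5) and gradient(2)=2*(theta(2)-5). fminunc uses these gradients too, but chooses its own search directions and step sizes. The code is shown below: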

The function costFunction defines jVal=J(θ) and the gradient with respect to both components of θ:


function [jVal, gradient] = costFunction(theta)
%COSTFUNCTION Toy cost J(theta) = (theta(1)-5)^2 + (theta(2)-5)^2 and its gradient.
%   Save as costFunction.m so that fminunc can call it via @costFunction.
jVal = (theta(1)-5)^2 + (theta(2)-5)^2;

gradient = zeros(2,1);
% partial derivatives of J with respect to theta(1) and theta(2)
gradient(1) = 2 * (theta(1)-5);
gradient(2) = 2 * (theta(2)-5);
end


Next, write the function Gradient_descent, which performs the parameter optimization:


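function [optTheta, functionVal, exitFlag] = Gradient_descent()
%GRADIENT_DESCENT Minimize costFunction with fminunc (save as Gradient_descent.m).
%   Despite its name, this does not run plain gradient descent: fminunc picks
%   its own search directions and step sizes. 'GradObj','on' tells fminunc that
%   costFunction also returns the gradient; 'MaxIter',100 caps the iterations.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1)   % no semicolon, so MATLAB echoes initialTheta
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
end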


Calling it from the Matlab command window yields the optimized parameters (θ1,θ2)=(5,5), i.e. hθ(x)=θ1x1+θ2x2=5*x1+5*x2:


[optTheta,functionVal,exitFlag] = Gradient_descent()
initialTheta =
     0
     0
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
<stopping criteria details>
optTheta =
     5
     5
functionVal =
     0
exitFlag =
     1


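The final result shows the optimized parameters optTheta=[5,5] and functionVal = costFunction(optTheta) = 0, the cost at the optimum. exitFlag = 1 means fminunc stopped because the gradient became small enough, i.e. it converged to a local minimum.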
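For comparison, here is a sketch of plain gradient descent on the same toy cost, where the learning rate α has to be chosen by hand (α=0.1 and 100 iterations are illustrative values). It should approach the same optimum θ=(5,5) that fminunc found:

```matlab
% Toy cost J = (theta1 - 5)^2 + (theta2 - 5)^2 from costFunction above
gradJ = @(theta) 2 * (theta - 5);   % [2*(theta(1)-5); 2*(theta(2)-5)]

alpha = 0.1;                        % must be picked by hand; fminunc needs no alpha
theta = zeros(2, 1);                % same starting point as initialTheta
for iter = 1:100
    theta = theta - alpha * gradJ(theta);   % theta(i) -= alpha * gradient(i)
end
disp(theta');
```

Each step multiplies the distance to 5 by (1 - 2α) = 0.8, so after 100 steps the error is negligible. With α >= 1 the same loop would diverge.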

/*****************************(7) Multi-class Classification: One-vs-all**************************/

The one-vs-all method applies binary classification to multi-class classification.

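For example, to classify into K classes, treat one class as positive and the other K-1 classes together as negative, and optimize the parameters of one classifier hθ(x) for each of the K choices of positive class. Each resulting hθ^(k)(x) estimates the probability, given θ and x, that x belongs to the positive class k.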

With this method, a given input vector x is assigned to the class k whose hθ^(k)(x) is largest.
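The one-vs-all prediction step might look like this in Matlab/Octave. The sketch assumes the K classifiers have already been trained and their parameters are stacked row-wise in a matrix all_theta. That name is hypothetical, and the values below are made up to give three clearly separated classes:

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% all_theta: K x (n+1); row k holds the parameters of the "class k vs. rest" classifier
all_theta = [ 2 -1  0;      % class 1: likely when x1 is small
             -2  1  0;      % class 2: likely when x1 is large
             -1  0  1];     % class 3: likely when x2 is large
X = [1 0 0;                 % rows are examples [x0 x1 x2], with x0 = 1
     1 4 0;
     1 0 4];
probs = sigmoid(X * all_theta');          % m x K: h_theta^(k)(x) for every class k
[~, predictedClass] = max(probs, [], 2);  % pick the class with the largest h
```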

Part 2: The problem of overfitting and how to solve it

/************(8) The problem of overfitting***********/

The Problem of overfitting:

Overfitting is illustrated by the rightmost plot in the figures below. Both models covered so far (logistic regression and linear regression) can overfit, and the two figures below illustrate each case:

<Linear Regression>:

<logistic regression>:

How can overfitting be addressed? There are two approaches:

1. Reduce the number of features (choose manually which features to keep, or let a model selection algorithm choose them).

2. Regularization (keep all the features, but reduce the magnitude of the parameters θj so that some of them become very small).

Below we explain regularization in detail.

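For the linear regression model, our problem is to minimize

$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$

In matrix form, with the m×(n+1) design matrix X and the label vector y, the loss function can be written as

$$J(\theta)=\frac{1}{2m}(X\theta-y)^T(X\theta-y)$$

Setting its gradient to zero, we get the normal equation:

$$\theta=\left(X^TX\right)^{-1}X^Ty$$

After regularization, however, we have:

$$\theta=\left(X^TX+\lambda\,\mathrm{diag}(0,1,\dots,1)\right)^{-1}X^Ty$$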

/************(9) Cost Function***********/
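Regularization works as follows. Give θ3 and θ4 very large coefficients in the cost function; for example, for hθ(x)=θ0+θ1x+θ2x^2+θ3x^3+θ4x^4, minimize

$$\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+1000\,\theta_3^2+1000\,\theta_4^2$$

Minimizing this cost function then yields very small θ3 and θ4.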

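In general, we add a penalty term on θ1~θn to the cost function (by convention θ0 is not penalized):

$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$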
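Below is a vectorized Matlab/Octave sketch of this regularized cost. It assumes theta(1) holds θ0 (Matlab indexes from 1), so the penalty sums only theta(2:end). The data is illustrative:

```matlab
% J(theta) = (1/(2m)) * [ sum((X*theta - y).^2) + lambda * sum(theta_j^2, j = 1..n) ]
% theta(1) stores theta0, which is not penalized
regCost = @(theta, X, y, lambda) ...
    (sum((X * theta - y) .^ 2) + lambda * sum(theta(2:end) .^ 2)) / (2 * size(X, 1));

X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 1];                       % fits the data exactly
J_plain = regCost(theta, X, y, 0);    % no error and no penalty: 0
J_reg = regCost(theta, X, y, 10);     % penalty only: 10 * 1^2 / (2 * 3)
```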

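Be careful when choosing λ. Consider the following question: what happens if λ is set to an extremely large value (say λ=10^10)?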

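A: A very large λ drives θ1,…,θn to ≈0 (θ0 is not penalized), so hθ(x)≈θ0 becomes a flat line that underfits the data.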
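To see this numerically, the sketch below fits a small made-up linear-regression data set with the regularized closed-form solution θ=(X^T X+λL)^(-1) X^T y. L is the identity with its top-left entry set to 0 so that θ0 is not penalized, and λ=10^10. With such a λ the slope collapses toward 0 and θ0 approaches the mean of y:

```matlab
X = [1 1; 1 2; 1 3; 1 4];             % made-up data, first column x0 = 1
y = [2; 3.9; 6.1; 8];
L = eye(2);
L(1, 1) = 0;                          % theta0 is not penalized
thetaNoReg = X \ y;                                % lambda = 0: ordinary least squares
thetaHuge = (X' * X + 1e10 * L) \ (X' * y);        % lambda = 1e10
% The slope thetaHuge(2) is nearly 0 and thetaHuge(1) is close to mean(y):
% the hypothesis is almost a horizontal line, i.e. it underfits
fprintf('lambda = 0:    theta = [%.4f, %.4f]\n', thetaNoReg);
fprintf('lambda = 1e10: theta = [%.4f, %.4f]\n', thetaHuge);
```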

Next, we apply regularization to linear regression and to logistic regression in turn.

/************(10) Regularized Linear Regression***********/

<Linear regression>:

First, let's see how gradient descent updates the parameters under the cost function above.

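θ0 has no penalty term, so its update is the same as before:

$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$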

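For the other θj, the derivative of J(θ) gains the extra term (λ/m)*θj, as shown below:

$$\theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)},\quad j=1,\dots,n$$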
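Here is a Matlab/Octave sketch of one such update step, assuming theta(1) stores θ0. The toy data is chosen so the prediction error is zero, which isolates the shrinkage factor (1-αλ/m):

```matlab
% theta_j := theta_j - alpha * [ (1/m) * sum((h - y) .* x_j) + (lambda/m) * theta_j ],  j >= 1
% theta(1) stores theta0 and gets no penalty term (the [0; ...] below)
regStep = @(theta, X, y, alpha, lambda) theta - alpha * ...
    ((X' * (X * theta - y)) / size(X, 1) + (lambda / size(X, 1)) * [0; theta(2:end)]);

X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 1];                 % predictions equal y, so only the penalty term acts
newTheta = regStep(theta, X, y, 0.3, 1);   % theta_1 shrinks by (1 - 0.3 * 1 / 3) = 0.9
```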

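Instead of gradient descent (gradient descent + regularization), we can compute θ directly with matrix algebra (the normal equation). That is, we find the θ that minimizes J(θ) by setting all partial derivatives of J(θ) with respect to θj to 0, which gives:

$$\theta=\left(X^TX+\lambda L\right)^{-1}X^Ty,\qquad L=\mathrm{diag}(0,1,\dots,1)\in\mathbb{R}^{(n+1)\times(n+1)}$$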

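Moreover, it can be shown that for λ>0 the matrix in parentheses is always invertible, even when X^T X itself is singular (e.g. when m ≤ n).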
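Below is a Matlab/Octave sketch of the regularized normal equation on made-up data. It solves the linear system with the backslash operator rather than forming an explicit inverse, which is the usual numerically safer choice:

```matlab
X = [1 1; 1 2; 1 3; 1 4];             % made-up data, first column x0 = 1
y = [2; 3.9; 6.1; 8];
lambda = 1;
n1 = size(X, 2);                      % n + 1 parameters
L = eye(n1);
L(1, 1) = 0;                          % L = diag(0, 1, ..., 1): theta0 is not penalized
% Solve (X'X + lambda*L) * theta = X'y with backslash instead of inv(...)
theta = (X' * X + lambda * L) \ (X' * y);
residual = norm((X' * X + lambda * L) * theta - X' * y);   % ~0 if solved correctly
```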

/************(11) Regularized Logistic Regression***********/

<Logistic regression>:

We have already covered the cost function of logistic regression and how it can overfit, as shown in the figure below:

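As with linear regression, we add a penalty term on θ to J(θ) to suppress overfitting:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$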

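Running gradient descent with the partial derivatives of this J(θ) with respect to each θj gives the update:

$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)},\qquad\theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]\quad(j=1,\dots,n)$$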

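This looks the same as the update for regularized linear regression. The difference again lies in hθ(x), which here is 1/(1+e^(-θ^T x)).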

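Q: When using regularized logistic regression, which of these is the best way to monitor whether gradient descent is working correctly?

A: Plot the regularized cost J(θ), including the (λ/2m)Σθj² penalty term, as a function of the number of iterations and make sure it decreases on every iteration.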

As in the Matlab example above, we can define the cost function of regularized logistic regression as follows:

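In the figure, jVal is the cost function, and its last term is the penalty on θ. Below it is the gradient for each θj. θ0 is not part of the penalty term, so its gradient is unchanged, while θ1~θn each gain the extra term (λ/m)*θj. Matlab indexes from 1, so θ0 is stored in theta(1).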
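Because the figure is not reproduced here, the sketch below gives the same cost and gradient in Matlab/Octave, written as anonymous functions rather than the course's function file. The names costReg and gradReg and the data are illustrative. The sketch also compares the analytic gradient with central finite differences, a common sanity check. To use these with fminunc and 'GradObj','on', wrap them in a function file that returns [jVal, gradient], like costFunction above.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Regularized cost; theta(1) stores theta0 and is excluded from the penalty
costReg = @(theta, X, y, lambda) ...
    -mean(y .* log(sigmoid(X * theta)) + (1 - y) .* log(1 - sigmoid(X * theta))) ...
    + lambda / (2 * size(X, 1)) * sum(theta(2:end) .^ 2);

% Gradient: the (lambda/m)*theta_j term is added for j >= 1 only
gradReg = @(theta, X, y, lambda) ...
    X' * (sigmoid(X * theta) - y) / size(X, 1) + (lambda / size(X, 1)) * [0; theta(2:end)];

X = [1 -2 0.5; 1 -1 -1; 1 1 0.3; 1 2 1];   % made-up data, first column x0 = 1
y = [0; 0; 1; 1];
theta = [0.1; -0.2; 0.3];
lambda = 1;

% Sanity check: analytic gradient vs. central finite differences
epsilon = 1e-5;
numGrad = zeros(size(theta));
for k = 1:numel(theta)
    perturb = zeros(size(theta));
    perturb(k) = epsilon;
    numGrad(k) = (costReg(theta + perturb, X, y, lambda) ...
                  - costReg(theta - perturb, X, y, lambda)) / (2 * epsilon);
end
maxDiff = max(abs(numGrad - gradReg(theta, X, y, lambda)));   % should be tiny
```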

With this, regularization can address overfitting in both linear and logistic regression.
