Andrew Ng Machine Learning Notes
Week 3: Logistic Regression
This week, we’ll be covering logistic regression. Logistic regression is a method for classifying data into discrete outcomes. In this module, we introduce the notion of classification, the cost function for logistic regression, and the application of logistic regression to multi-class classification.
We’ll introduce regularization, which helps prevent models from over-fitting the training data.
1. Classification
Examples: classifying a tumor as malignant or benign, or an email as spam or not spam.
Classification is generally done by picking a threshold.
For example, with outputs between 0 and 1, we can use 0.5 as the threshold.
This resembles a regression problem, except that the outputs are discrete values.
For now, we will focus on the binary classification problem, in which y can take on only two values, 0 and 1. (Most of what we say here will also generalize to the multiple-class case.)
For example, in spam detection we use \(x^{(i)}\) to denote the features of an email, and y takes only the two values 1 and 0, indicating spam and non-spam respectively.
2. Hypothesis Representation
Here we discuss the hypothesis function for logistic regression.
First, we need a hypothesis whose predictions lie between 0 and 1.
Following the form of linear regression:
\[h_\theta(x) = g(\theta^T x)\]
where we define \(g(z) = \frac{1}{1+e^{-z}}\)
That is: \(h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}\)
As shown in the figure below:
The hypothesis can be interpreted as a probability:
\(h_\theta(x) = P(y = 1 \mid x; \theta)\)
i.e. the probability that y = 1 (e.g. the tumor is malignant) given input x and parameters \(\theta\), where y is always either 1 or 0.
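A minimal sketch of the sigmoid hypothesis in Python (NumPy assumed; the parameter values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output always lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# h_theta(x) = g(theta^T x), interpreted as P(y = 1 | x; theta).
theta = np.array([-1.0, 2.0])   # hypothetical parameters
x = np.array([1.0, 0.5])        # x[0] = 1 is the intercept term
print(sigmoid(theta @ x))       # 0.5, since theta^T x = 0 here
```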
3. Decision Boundary
As shown in the figure above, \(h_\theta(x) = 0.5\) marks the decision boundary.
So how do we compute the decision boundary?
The case above is linear; what about a nonlinear fit?
For that we add higher-order polynomial features, each with its own parameter \(\theta_j\).
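Since \(g(z) \ge 0.5\) exactly when \(z \ge 0\), predicting 1 whenever \(h_\theta(x) \ge 0.5\) is the same as checking \(\theta^Tx \ge 0\), so the boundary is the set where \(\theta^Tx = 0\). A sketch with made-up polynomial features that produce a circular (nonlinear) boundary:

```python
import numpy as np

def predict(theta, X):
    """Predict 1 where theta^T x >= 0 (equivalently h(x) >= 0.5), else 0."""
    return (X @ theta >= 0).astype(int)

# Hypothetical boundary x1^2 + x2^2 = 1:
# features [1, x1, x2, x1^2, x2^2] with theta = [-1, 0, 0, 1, 1]
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
inside  = np.array([[1.0, 0.1, 0.1, 0.01, 0.01]])  # point inside the circle
outside = np.array([[1.0, 2.0, 0.0, 4.00, 0.00]])  # point outside it
print(predict(theta, inside), predict(theta, outside))  # [0] [1]
```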
5. Optimization Objective: the Cost Function
Goal: how do we fit \(\theta\)?
By analogy with the linear-regression cost function, define:
\(J(\theta) = \frac{1}{m}\sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})\)
where the Cost term could be \(\frac12(h_\theta(x^{(i)}) - y^{(i)})^2\), as in linear regression.
However, in classification \(h_\theta(x)\) is nonlinear, so with the squared-error Cost the graph of \(J(\theta)\) would be non-convex, with many local optima.
Instead, based on the definition of \(h_\theta(x)\), we define the cost piecewise:
\(\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))\) if \(y = 1\), and \(-\log(1-h_\theta(x))\) if \(y = 0\).
Its graph looks like this:
With this choice, \(J(\theta)\) is convex and has no local optima.
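The logistic cost for a single example — \(-\log(h)\) when \(y = 1\), \(-\log(1-h)\) when \(y = 0\) — can be sketched directly (the probability values are made up for illustration):

```python
import numpy as np

def cost_pointwise(h, y):
    """Piecewise logistic cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# The cost approaches 0 when the prediction confidently matches the label,
# and blows up when the prediction is confidently wrong.
print(cost_pointwise(0.99, 1))  # near 0
print(cost_pointwise(0.01, 1))  # large
```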
6. Simplified Cost Function and Gradient Descent: fitting the logistic-regression parameters \(\theta\)
The cost need not be written as a piecewise function; we can equivalently use
\(\mathrm{Cost}(h_\theta(x),y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))\)
which avoids the case split.
The full cost function is therefore:
\(J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]\)
Or, in vectorized form:
\(J(\theta) = -\frac{1}{m}\left(y^T\log(g(X\theta)) + (1-y)^T\log(1-g(X\theta))\right)\)
As before, we want to find the \(\theta\) that minimizes \(J(\theta)\).
Naturally, we use gradient descent.
Apply it as shown in the figure; note that \(h_\theta\) here is not the same function as in linear regression, even though the update rule looks identical.
Next, how do we monitor gradient descent?
Of course, this can also be implemented in vectorized form (how is it derived??):
See here for a clearer view (the screenshot feature is really great!):
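The vectorized gradient-descent loop can be sketched in Python with NumPy (the toy dataset and hyperparameters are made up; the course itself uses Octave):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Vectorized update: theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= (alpha / m) * X.T @ (sigmoid(X @ theta) - y)
    return theta

# Tiny linearly separable toy set (first column is the intercept term).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))  # [0 0 1 1]
```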
7. Advanced Optimization
Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
Advantages of the last three algorithms:
- No need to manually pick \(\alpha\)
- Often faster than gradient descent
Disadvantage:
- More complex
You don't need to know their implementation details!
They can be used later through libraries such as TensorFlow.
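As a sketch of using a library optimizer instead of hand-rolled gradient descent (assuming SciPy is available; its `minimize` with `method="BFGS"` plays roughly the role of Octave's `fminunc` in the course — you supply the cost and gradient, and no learning rate is needed):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Unregularized logistic-regression cost J(theta)."""
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / len(y)

def grad(theta, X, y):
    """Gradient of J(theta): (1/m) X^T (h - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Made-up, non-separable toy data (first column is the intercept).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.8],
              [1.0, 2.2], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
res = minimize(cost, np.zeros(2), args=(X, y), jac=grad, method="BFGS")
print(res.x)  # fitted theta; no alpha had to be chosen by hand
```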
8. Multiclass Classification
For example, tagging emails with different labels.
So how do we find the decision boundaries for a multiclass problem?
With 3 classes, for instance, we use the one-vs-all method: split the task into 3 binary classification problems, each separating one class from the remaining two.
As in the figure above, this fits 3 classifiers:
\(h_\theta^{(i)}(x) = P(y = i \mid x; \theta), \quad i = 1, 2, 3\)
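To classify a new input under one-vs-all, evaluate all the fitted classifiers and pick the class whose \(h_\theta^{(i)}(x)\) is largest. A sketch with made-up parameter values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, x):
    """Pick the class whose binary classifier reports the highest probability.

    Theta: (K, n) matrix, one row of fitted parameters per classifier.
    """
    probs = sigmoid(Theta @ x)  # h^(i)(x) = P(y = i | x; theta) for each class i
    return np.argmax(probs)

# Three hypothetical classifiers over features [1, x1, x2].
Theta = np.array([
    [ 2.0, -1.0, -1.0],   # class 0 vs rest
    [-1.0,  1.0,  0.0],   # class 1 vs rest
    [-1.0,  0.0,  1.0],   # class 2 vs rest
])
x = np.array([1.0, 3.0, 0.0])
print(predict_one_vs_all(Theta, x))  # 1
```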
9. The Problem of Overfitting
With too many features and too little data to constrain them,
the model fails to generalize to new data.
Linear regression example:
The first fit is underfitting (high bias),
the second is just right,
and the third is overfitting (high variance).
The same happens with logistic regression.
How do we address it?
- Reduce the number of features
- Regularization
10. Regularization and its Cost Function
As in the previous section, when the model overfits we can shrink the influence of the higher-order parameters:
penalize the parameters \(\theta_3\) and \(\theta_4\), pushing them as close to 0 as possible.
How?
Add a regularization term \(\lambda\sum_{j=1}^n\theta_j^2\) to the original cost function.
\(\lambda\) is called the regularization parameter.
If \(\lambda\) is too large, the model will underfit,
so an appropriate regularization parameter must be chosen.
One picture explains it:
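The regularized cost \(J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]\) can be sketched for linear regression as follows (toy data made up for illustration; note \(\theta_0\) is conventionally not penalized):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) [ sum (h - y)^2 + lambda * sum_{j>=1} theta_j^2 ].

    theta[0] (the intercept) is not included in the penalty.
    """
    m = len(y)
    err = X @ theta - y                      # linear-regression hypothesis
    penalty = lam * np.sum(theta[1:] ** 2)   # skip theta_0
    return (err @ err + penalty) / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.0, 1.0])  # fits this data exactly, so the error term is 0
print(regularized_cost(theta, X, y, lam=0.0))   # 0.0 - no penalty
print(regularized_cost(theta, X, y, lam=10.0))  # 2.5 - penalty term alone: 10*1/(2*2)
```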
11. Regularized Linear Regression
Algorithm:
The update can equivalently be written as:
\(\theta_j := \theta_j\left(1- \alpha\frac{\lambda}{m}\right) - \alpha\frac1m\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)}) x_j^{(i)}\)
The factor \(1-\alpha\frac{\lambda}{m}\) is typically just slightly less than 1 when the learning rate is small and the number of examples is large.
The rest of the update is exactly the same as before!
The leading factor just shrinks \(\theta_j\) a little on every step.
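One regularized update step can be sketched as follows (toy values made up; with \(\alpha = 0.1\), \(\lambda = 1\), \(m = 2\), the shrink factor is \(1 - 0.1 \cdot 1/2 = 0.95\)):

```python
import numpy as np

def regularized_step(theta, X, y, alpha, lam):
    """One gradient-descent step with regularization:
    theta_j := theta_j * (1 - alpha*lam/m) - (alpha/m) * sum_i (h(x_i) - y_i) * x_ij
    theta_0 is excluded from the shrink factor."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m        # same gradient term as before
    shrink = np.ones_like(theta)
    shrink[1:] = 1.0 - alpha * lam / m      # slightly below 1 for j >= 1
    return shrink * theta - alpha * grad

# theta = [0, 1] fits y = x exactly, so the gradient term vanishes
# and only the shrink factor acts: theta_1 goes from 1 to 0.95.
X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
print(regularized_step(np.array([0.0, 1.0]), X, y, alpha=0.1, lam=1.0))
```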
Normal equation
With regularization, how do we obtain the matrix form? The mathematical derivation is omitted! The result can be written as:
\(\theta = \left(X^TX + \lambda M\right)^{-1}X^Ty\), where \(M\) is the identity matrix with its top-left entry (for \(\theta_0\)) set to 0.
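A sketch of the regularized normal equation \(\theta = (X^TX + \lambda M)^{-1}X^Ty\), where \(M\) is the identity with its \([0,0]\) entry zeroed so the intercept is not penalized (toy data made up):

```python
import numpy as np

def normal_equation_reg(X, y, lam):
    """Closed-form regularized solution: theta = (X^T X + lambda*M)^{-1} X^T y.

    With lambda > 0, the matrix being inverted is invertible even
    when X^T X itself is singular (e.g. m <= n or redundant features).
    """
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0  # do not penalize the intercept theta_0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

# Toy data lying exactly on y = 1 + x (first column is the intercept).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
print(normal_equation_reg(X, y, lam=0.0))  # approximately [1, 1]: exact fit
```

With `lam > 0` the fitted slope is pulled toward 0, which is exactly the shrinking effect discussed above.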
12. Regularized Logistic Regression
Much like linear regression: add the regularization term to the cost.
The update algorithm is similar, and again \(\theta_0\) is written out separately (it is not penalized).
Next, how regularization is applied in the more advanced optimization algorithms:
(this should make sense after learning Octave)
In summary:
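A sketch of the regularized logistic cost and gradient in the `(cost, grad)` form that library optimizers typically consume (toy values made up; \(\theta_0\) unpenalized as above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad_reg(theta, X, y, lam):
    """Regularized logistic-regression cost J(theta) and its gradient.

    The penalty (lam/2m) * sum_{j>=1} theta_j^2 skips theta_0,
    and the gradient gets the matching (lam/m) * theta_j term for j >= 1.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m + penalty
    grad = X.T @ (h - y) / m
    grad[1:] += lam / m * theta[1:]  # theta_0 handled separately
    return J, grad

X = np.array([[1.0, 0.0], [1.0, 2.0]])
y = np.array([0.0, 1.0])
J, grad = cost_and_grad_reg(np.zeros(2), X, y, lam=1.0)
print(J)  # log(2) ~ 0.693 at theta = 0, where the penalty is zero
```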
Quiz excerpts:
Question 3
Which of the following statements about regularization are true? Check all that apply.
Using too large a value of λ can cause your hypothesis to overfit the data; this can be avoided by reducing λ.
Using a very large value of λ cannot hurt the performance of your hypothesis; the only reason we do not set λ to be too large is to avoid numerical problems.
Consider a classification problem. Adding regularization may cause your classifier to incorrectly classify some training examples (which it had correctly classified when not using regularization, i.e. when λ=0).
Because logistic regression outputs values \(0 \le h_\theta(x) \le 1\), its range of output values can only be “shrunk” slightly by regularization anyway, so regularization is generally not helpful for it.
- Answer: 3. The regularized cost function is \(J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]\)
- Option 1: a \(\lambda\) that is too large causes underfitting, not overfitting: when \(\lambda\) is too large, \(\theta_1, \theta_2, \ldots, \theta_n \approx 0\) and only \(\theta_0\) remains, so the fit is a flat line. It is a \(\lambda\) that is too small that leads to overfitting. Incorrect.
- Option 2: same reasoning as option 1. Incorrect.
- Option 3: when \(\lambda\) is chosen poorly, training performance can be worse than with no regularization at all (\(\lambda = 0\)). Correct.
- Option 4: what regularization “shrinks” is \(\theta\); its purpose is to combat overfitting, not to shrink the output range. Incorrect.
Question 1
You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.
Introducing regularization to the model always results in equal or better performance on the training set.
Adding many new features to the model helps prevent overfitting on the training set.
Introducing regularization to the model always results in equal or better performance on examples not in the training set.
Adding a new feature to the model always results in equal or better performance on the training set.
- Answer: 4. The regularized cost function is \(J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]\)
- Option 1: adding regularization does not always improve results; if \(\lambda\) is too large the model underfits, which hurts performance on both the training set and unseen examples. Incorrect.
- Option 2: more features let the model fit the training set better, but that makes overfitting *more* likely rather than preventing it. Incorrect.
- Option 3: same as option 1; a poorly chosen \(\lambda\) hurts performance on both the training set and unseen examples. Incorrect.
- Option 4: a newly added feature improves (or at least does not hurt) the fit on the training set, though not necessarily on unseen examples. Correct.
Reposted from: https://www.cnblogs.com/orangestar/p/11178192.html