Andrew Ng Machine Learning Notes
Week 3: Logistic Regression
This week, we’ll be covering logistic regression. Logistic regression is a method for classifying data into discrete outcomes. In this module, we introduce the notion of classification, the cost function for logistic regression, and the application of logistic regression to multi-class classification.
We’ll introduce regularization, which helps prevent models from over-fitting the training data.
1. Classification
Examples: classifying a tumor as malignant or benign, or an email as spam or not spam.
Classification is generally done by picking a threshold.
For example, with outputs between 0 and 1, we can use 0.5 as the threshold.
This resembles a regression problem, except that the outputs are discrete values.
For now, we will focus on the binary classification problem, in which y can take on only two values, 0 and 1. (Most of what we say here will also generalize to the multiple-class case.)
For example, in spam detection we use \(x^{(i)}\) to denote the features of an email, and y takes only the two values 1 and 0, indicating spam and non-spam respectively.
2. Hypothesis Representation
Here we discuss the hypothesis function for logistic regression.
First, we need a hypothesis whose predictions lie between 0 and 1.
Following the form of linear regression:
\[h_\theta(x) = g(\theta^T x)\]
where we define \(g(z) = \frac{1}{1+e^{-z}}\)
That is: \(h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}\)
As shown in the figure below:
The hypothesis can be interpreted as a probability:
\(h_\theta(x) = P(y = 1 \mid x; \theta)\)
i.e. the probability that y = 1 (e.g. the tumor is malignant) given input x and parameters \(\theta\), where y is always either 1 or 0.
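A minimal sketch of the sigmoid hypothesis in Python (NumPy assumed; the parameter values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output always lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# h_theta(x) = g(theta^T x), interpreted as P(y = 1 | x; theta).
theta = np.array([-1.0, 2.0])   # hypothetical parameters
x = np.array([1.0, 0.5])        # x[0] = 1 is the intercept term
print(sigmoid(theta @ x))       # 0.5, since theta^T x = 0 here
```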
3. Decision Boundary
As shown in the figure above, \(h_\theta(x) = 0.5\) marks the decision boundary.
So how do we compute the decision boundary?
The case above is linear; what about a nonlinear fit?
For that we add higher-order polynomial features, each with its own parameter \(\theta_j\).
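Since \(g(z) \ge 0.5\) exactly when \(z \ge 0\), predicting 1 whenever \(h_\theta(x) \ge 0.5\) is the same as checking \(\theta^Tx \ge 0\), so the boundary is the set where \(\theta^Tx = 0\). A sketch with made-up polynomial features that produce a circular (nonlinear) boundary:

```python
import numpy as np

def predict(theta, X):
    """Predict 1 where theta^T x >= 0 (equivalently h(x) >= 0.5), else 0."""
    return (X @ theta >= 0).astype(int)

# Hypothetical boundary x1^2 + x2^2 = 1:
# features [1, x1, x2, x1^2, x2^2] with theta = [-1, 0, 0, 1, 1]
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
inside  = np.array([[1.0, 0.1, 0.1, 0.01, 0.01]])  # point inside the circle
outside = np.array([[1.0, 2.0, 0.0, 4.00, 0.00]])  # point outside it
print(predict(theta, inside), predict(theta, outside))  # [0] [1]
```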
5. Optimization Objective: the Cost Function
Goal: how do we fit \(\theta\)?
By analogy with the linear-regression cost function, define:
\(J(\theta) = \frac{1}{m}\sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})\)
where the Cost term could be \(\frac12(h_\theta(x^{(i)}) - y^{(i)})^2\), as in linear regression.
However, in classification \(h_\theta(x)\) is nonlinear, so with the squared-error Cost the graph of \(J(\theta)\) would be non-convex, with many local optima.
Instead, based on the definition of \(h_\theta(x)\), we define the cost piecewise:
\(\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))\) if \(y = 1\), and \(-\log(1-h_\theta(x))\) if \(y = 0\).
Its graph looks like this:
With this choice, \(J(\theta)\) is convex and has no local optima.
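The logistic cost for a single example — \(-\log(h)\) when \(y = 1\), \(-\log(1-h)\) when \(y = 0\) — can be sketched directly (the probability values are made up for illustration):

```python
import numpy as np

def cost_pointwise(h, y):
    """Piecewise logistic cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# The cost approaches 0 when the prediction confidently matches the label,
# and blows up when the prediction is confidently wrong.
print(cost_pointwise(0.99, 1))  # near 0
print(cost_pointwise(0.01, 1))  # large
```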
6. Simplified Cost Function and Gradient Descent: fitting the logistic-regression parameters \(\theta\)
The cost need not be written as a piecewise function; we can equivalently use
\(\mathrm{Cost}(h_\theta(x),y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))\)
which avoids the case split.
The full cost function is therefore:
\(J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]\)
Or, in vectorized form:
\(J(\theta) = -\frac{1}{m}\left(y^T\log(g(X\theta)) + (1-y)^T\log(1-g(X\theta))\right)\)
As before, we want to find the \(\theta\) that minimizes \(J(\theta)\).
Naturally, we use gradient descent.
Apply it as shown in the figure; note that \(h_\theta\) here is not the same function as in linear regression, even though the update rule looks identical.
Next, how do we monitor gradient descent?
Of course, this can also be implemented in vectorized form (how is it derived??):
See here for a clearer view (the screenshot feature is really great!):
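The vectorized gradient-descent loop can be sketched in Python with NumPy (the toy dataset and hyperparameters are made up; the course itself uses Octave):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Vectorized update: theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= (alpha / m) * X.T @ (sigmoid(X @ theta) - y)
    return theta

# Tiny linearly separable toy set (first column is the intercept term).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))  # [0 0 1 1]
```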
7. Advanced Optimization
Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
Advantages of the last three algorithms:
- No need to manually pick \(\alpha\)
- Often faster than gradient descent
Disadvantage:
- More complex
You don't need to know their implementation details!
They can be used later through libraries such as TensorFlow.
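As a sketch of using a library optimizer instead of hand-rolled gradient descent (assuming SciPy is available; its `minimize` with `method="BFGS"` plays roughly the role of Octave's `fminunc` in the course — you supply the cost and gradient, and no learning rate is needed):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Unregularized logistic-regression cost J(theta)."""
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / len(y)

def grad(theta, X, y):
    """Gradient of J(theta): (1/m) X^T (h - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Made-up, non-separable toy data (first column is the intercept).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.8],
              [1.0, 2.2], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
res = minimize(cost, np.zeros(2), args=(X, y), jac=grad, method="BFGS")
print(res.x)  # fitted theta; no alpha had to be chosen by hand
```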
8. Multiclass Classification
For example, tagging emails with different labels.
So how do we find the decision boundaries for a multiclass problem?
With 3 classes, for instance, we use the one-vs-all method: split the task into 3 binary classification problems, each separating one class from the remaining two.
As in the figure above, this fits 3 classifiers:
\(h_\theta^{(i)}(x) = P(y = i \mid x; \theta), \quad i = 1, 2, 3\)
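To classify a new input under one-vs-all, evaluate all the fitted classifiers and pick the class whose \(h_\theta^{(i)}(x)\) is largest. A sketch with made-up parameter values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, x):
    """Pick the class whose binary classifier reports the highest probability.

    Theta: (K, n) matrix, one row of fitted parameters per classifier.
    """
    probs = sigmoid(Theta @ x)  # h^(i)(x) = P(y = i | x; theta) for each class i
    return np.argmax(probs)

# Three hypothetical classifiers over features [1, x1, x2].
Theta = np.array([
    [ 2.0, -1.0, -1.0],   # class 0 vs rest
    [-1.0,  1.0,  0.0],   # class 1 vs rest
    [-1.0,  0.0,  1.0],   # class 2 vs rest
])
x = np.array([1.0, 3.0, 0.0])
print(predict_one_vs_all(Theta, x))  # 1
```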
9. The Problem of Overfitting
With too many features and too little data to constrain them,
the model fails to generalize to new data.
Linear regression example:
The first fit is underfitting (high bias),
the second is just right,
and the third is overfitting (high variance).
The same happens with logistic regression.
How do we address it?
- Reduce the number of features
- Regularization
10. Regularization and its Cost Function
As in the previous section, when the model overfits we can shrink the influence of the higher-order parameters:
penalize the parameters \(\theta_3\) and \(\theta_4\), pushing them as close to 0 as possible.
How?
Add a regularization term \(\lambda\sum_{j=1}^n\theta_j^2\) to the original cost function.
\(\lambda\) is called the regularization parameter.
If \(\lambda\) is too large, the model will underfit,
so an appropriate regularization parameter must be chosen.
One picture explains it:
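The regularized cost \(J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]\) can be sketched for linear regression as follows (toy data made up for illustration; note \(\theta_0\) is conventionally not penalized):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) [ sum (h - y)^2 + lambda * sum_{j>=1} theta_j^2 ].

    theta[0] (the intercept) is not included in the penalty.
    """
    m = len(y)
    err = X @ theta - y                      # linear-regression hypothesis
    penalty = lam * np.sum(theta[1:] ** 2)   # skip theta_0
    return (err @ err + penalty) / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.0, 1.0])  # fits this data exactly, so the error term is 0
print(regularized_cost(theta, X, y, lam=0.0))   # 0.0 - no penalty
print(regularized_cost(theta, X, y, lam=10.0))  # 2.5 - penalty term alone: 10*1/(2*2)
```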
11. Regularized Linear Regression
Algorithm:
The update can equivalently be written as:
\(\theta_j := \theta_j\left(1- \alpha\frac{\lambda}{m}\right) - \alpha\frac1m\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)}) x_j^{(i)}\)
The factor \(1-\alpha\frac{\lambda}{m}\) is typically just slightly less than 1 when the learning rate is small and the number of examples is large.
The rest of the update is exactly the same as before!
The leading factor just shrinks \(\theta_j\) a little on every step.
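One regularized update step can be sketched as follows (toy values made up; with \(\alpha = 0.1\), \(\lambda = 1\), \(m = 2\), the shrink factor is \(1 - 0.1 \cdot 1/2 = 0.95\)):

```python
import numpy as np

def regularized_step(theta, X, y, alpha, lam):
    """One gradient-descent step with regularization:
    theta_j := theta_j * (1 - alpha*lam/m) - (alpha/m) * sum_i (h(x_i) - y_i) * x_ij
    theta_0 is excluded from the shrink factor."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m        # same gradient term as before
    shrink = np.ones_like(theta)
    shrink[1:] = 1.0 - alpha * lam / m      # slightly below 1 for j >= 1
    return shrink * theta - alpha * grad

# theta = [0, 1] fits y = x exactly, so the gradient term vanishes
# and only the shrink factor acts: theta_1 goes from 1 to 0.95.
X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
print(regularized_step(np.array([0.0, 1.0]), X, y, alpha=0.1, lam=1.0))
```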
Normal equation
With regularization, how do we obtain the matrix form? The mathematical derivation is omitted! The result can be written as:
\(\theta = \left(X^TX + \lambda M\right)^{-1}X^Ty\), where \(M\) is the identity matrix with its top-left entry (for \(\theta_0\)) set to 0.
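A sketch of the regularized normal equation \(\theta = (X^TX + \lambda M)^{-1}X^Ty\), where \(M\) is the identity with its \([0,0]\) entry zeroed so the intercept is not penalized (toy data made up):

```python
import numpy as np

def normal_equation_reg(X, y, lam):
    """Closed-form regularized solution: theta = (X^T X + lambda*M)^{-1} X^T y.

    With lambda > 0, the matrix being inverted is invertible even
    when X^T X itself is singular (e.g. m <= n or redundant features).
    """
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0  # do not penalize the intercept theta_0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

# Toy data lying exactly on y = 1 + x (first column is the intercept).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
print(normal_equation_reg(X, y, lam=0.0))  # approximately [1, 1]: exact fit
```

With `lam > 0` the fitted slope is pulled toward 0, which is exactly the shrinking effect discussed above.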
12. Regularized Logistic Regression
Much like linear regression: add the regularization term to the cost.
The update algorithm is similar, and again \(\theta_0\) is written out separately (it is not penalized).
Next, how regularization is applied in the more advanced optimization algorithms:
(this should make sense after learning Octave)
In summary:
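A sketch of the regularized logistic cost and gradient in the `(cost, grad)` form that library optimizers typically consume (toy values made up; \(\theta_0\) unpenalized as above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad_reg(theta, X, y, lam):
    """Regularized logistic-regression cost J(theta) and its gradient.

    The penalty (lam/2m) * sum_{j>=1} theta_j^2 skips theta_0,
    and the gradient gets the matching (lam/m) * theta_j term for j >= 1.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m + penalty
    grad = X.T @ (h - y) / m
    grad[1:] += lam / m * theta[1:]  # theta_0 handled separately
    return J, grad

X = np.array([[1.0, 0.0], [1.0, 2.0]])
y = np.array([0.0, 1.0])
J, grad = cost_and_grad_reg(np.zeros(2), X, y, lam=1.0)
print(J)  # log(2) ~ 0.693 at theta = 0, where the penalty is zero
```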
Quiz excerpts:
Question 3
Which of the following statements about regularization are true? Check all that apply.
Using too large a value of λ can cause your hypothesis to overfit the data; this can be avoided by reducing λ.
Using a very large value of λ cannot hurt the performance of your hypothesis; the only reason we do not set λ to be too large is to avoid numerical problems.
Consider a classification problem. Adding regularization may cause your classifier to incorrectly classify some training examples (which it had correctly classified when not using regularization, i.e. when λ=0).
Because logistic regression outputs values \(0 \le h_\theta(x) \le 1\), its range of output values can only be “shrunk” slightly by regularization anyway, so regularization is generally not helpful for it.
- Answer: 3. The regularized cost function is \(J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]\)
- Option 1: a \(\lambda\) that is too large causes underfitting, not overfitting: when \(\lambda\) is too large, \(\theta_1, \theta_2, \ldots, \theta_n \approx 0\) and only \(\theta_0\) remains, so the fit is a flat line. It is a \(\lambda\) that is too small that leads to overfitting. Incorrect.
- Option 2: same reasoning as option 1. Incorrect.
- Option 3: when \(\lambda\) is chosen poorly, training performance can be worse than with no regularization at all (\(\lambda = 0\)). Correct.
- Option 4: what regularization “shrinks” is \(\theta\); its purpose is to combat overfitting, not to shrink the output range. Incorrect.
Question 1
You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.
Introducing regularization to the model always results in equal or better performance on the training set.
Adding many new features to the model helps prevent overfitting on the training set.
Introducing regularization to the model always results in equal or better performance on examples not in the training set.
Adding a new feature to the model always results in equal or better performance on the training set.
- Answer: 4. The regularized cost function is \(J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]\)
- Option 1: adding regularization does not always improve results; if \(\lambda\) is too large the model underfits, which hurts performance on both the training set and unseen examples. Incorrect.
- Option 2: more features let the model fit the training set better, but that makes overfitting *more* likely rather than preventing it. Incorrect.
- Option 3: same as option 1; a poorly chosen \(\lambda\) hurts performance on both the training set and unseen examples. Incorrect.
- Option 4: a newly added feature improves (or at least does not hurt) the fit on the training set, though not necessarily on unseen examples. Correct.
Reposted from: https://www.cnblogs.com/orangestar/p/11178192.html