Cost Function
First, a question that bothered me for a long time: the logistic regression part of Andrew Ng's machine learning course uses the cross-entropy loss function, but when working through the derivative, every slide and tutorial I googled just takes the derivative of the natural logarithm directly. This confused me for a long time, until I learned that in English-language materials log is taken to mean ln by default. So whenever you see log in machine learning, mentally read it as ln and understand it that way.
How do we fit the parameters theta for logistic regression? In particular, I'd like to define the optimization objective, or cost function, that we'll use to fit the parameters.
Here's the supervised learning problem of fitting a logistic regression model.
Here x is an (n+1)-dimensional feature vector, h_θ(x) = 1/(1 + e^{-θᵀx}) is the hypothesis, and the parameters of the hypothesis are the vector θ.
Because this is a classification problem, our training set has the property that every label y is either 0 or 1.
Back when we were developing the linear regression model, we used the following cost function: J(θ) = (1/m) Σᵢ ½(h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)².
Now, this cost function worked fine for linear regression, but here we're interested in logistic regression.
For logistic regression, if we plug our hypothesis into this squared-error cost, J(θ) would be a non-convex function of the parameters θ. Here is what I mean by non-convex. We have some cost function J(θ), and for logistic regression the function h has a nonlinearity (it is the sigmoid function), so it's a pretty complicated nonlinear function. If you take the sigmoid function and plug it in here, J(θ) looks like:
J(θ) can look like a function with many local optima; the formal term for this is a non-convex function. If you run gradient descent on this sort of function, it is not guaranteed to converge to the global minimum. In contrast, what we would like is a cost function J(θ) that is convex, a single bow-shaped function, so that if we run gradient descent we are guaranteed it converges to the global minimum. The problem with using the squared cost function is that, because of the very nonlinear sigmoid function that appears in the middle, J(θ) ends up being non-convex if it is defined as the squared cost. So what we would like to do instead is come up with a different cost function that is convex, so that we can apply an algorithm like gradient descent and be guaranteed to find the global minimum.
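To make the convexity point concrete, here is a minimal sketch (not from the original notes; the toy dataset and variable names are made up for illustration) that evaluates both costs over a grid of θ values for a one-parameter model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy, non-separable 1-D data (made up): feature x and label y
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 0, 1, 1])

thetas = np.linspace(-10, 10, 401)
squared_cost, cross_entropy_cost = [], []
for theta in thetas:
    h = sigmoid(theta * x)
    squared_cost.append(np.mean(0.5 * (h - y) ** 2))
    h = np.clip(h, 1e-12, 1 - 1e-12)            # avoid log(0)
    cross_entropy_cost.append(np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h)))

# Plotting thetas against each list shows the squared-error cost flattening
# toward both tails (non-convex in theta), while the cross-entropy cost is
# a single convex bowl with one global minimum.
```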
------------------------------ Logistic Regression ------------------
Here is the cost function that we're going to use for logistic regression:
$$
\mathrm{Cost}\big(h_\theta(x), y\big) =
\begin{cases}
-\log\big(h_\theta(x)\big) & \text{if } y = 1 \\
-\log\big(1 - h_\theta(x)\big) & \text{if } y = 0
\end{cases}
$$
This is the cost, or the penalty, that the algorithm pays if it outputs h_θ(x) when the actual label turns out to be y.
When y = 1, the curve of −log(h_θ(x)) looks like this: it is 0 at h_θ(x) = 1 and rises toward infinity as h_θ(x) approaches 0.
Now, this cost function has a few interesting and desirable properties. First, notice that:
If the prediction is exactly right, the cost is 0; if the prediction is confidently wrong, the cost blows up toward infinity.
First, notice that if y = 1 and h_θ(x) = 1, in other words the hypothesis predicts exactly 1 and y is indeed equal to 1, then the cost is 0. That is what we want: if we correctly predict the output y, the cost is 0.
But notice also that as h_θ(x) approaches 0, that is, as the output of the hypothesis approaches 0, the cost blows up and goes to infinity. This captures the intuition: if the hypothesis outputs 0, it is saying the probability that y = 1 is 0, which is like telling a patient that the probability their tumor is malignant (y = 1) is 0, i.e. that it is completely impossible for the tumor to be malignant. If the tumor then turns out to be malignant (y = 1) after we told the patient, with complete certainty, that it could not happen, then we penalize the learning algorithm with a very, very large cost: the cost goes to infinity when y = 1 but h_θ(x) = 0.
------------------ The above is the case y = 1
Next, what does the cost function look like when y = 0? In that case the cost is −log(1 − h_θ(x)): it is 0 at h_θ(x) = 0 and blows up as h_θ(x) approaches 1.
If y turns out to be 0, but we predicted y = 1 with almost complete certainty (probability 1), then we end up paying a very large cost.
Conversely, if h_θ(x) = 0 and y = 0, then the hypothesis nailed it: it predicted y = 0 and y is indeed 0, so at that point (the origin of the curve described above) the cost is 0.
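As a quick illustration, the two curves discussed above can be reproduced with a short script (my own sketch; matplotlib is just one way to draw them):

```python
import numpy as np
import matplotlib.pyplot as plt

def cost(h, y):
    """Per-example logistic regression cost; log means natural log (ln)."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

h = np.linspace(0.001, 0.999, 500)
plt.plot(h, [cost(v, 1) for v in h], label="y = 1:  -log(h(x))")
plt.plot(h, [cost(v, 0) for v in h], label="y = 0:  -log(1 - h(x))")
plt.xlabel("h(x)")
plt.ylabel("cost")
plt.legend()
plt.show()

# Corner cases from the discussion above:
#   y = 1, h(x) -> 1  => cost -> 0         (correct, confident prediction)
#   y = 1, h(x) -> 0  => cost -> infinity  (confidently wrong)
#   y = 0, h(x) -> 0  => cost -> 0
#   y = 0, h(x) -> 1  => cost -> infinity
```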
------------------------- The above defines the cost function for a single training example. The cost function we chose gives us a convex optimization problem: the overall cost function J(θ) will be convex and free of local optima.
We take this cost function for a single training example and develop it further to define the cost function for the entire training set.
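Since y is always either 0 or 1, the two cases can be compressed into a single expression, giving the cost over all m training examples:

$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log h_\theta\big(x^{(i)}\big) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big]
$$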
Conjugate gradient, BFGS, and L-BFGS are examples of more sophisticated optimization algorithms. They need a way to compute J(θ) and a way to compute the derivatives, and can then use more sophisticated strategies than gradient descent to minimize the cost function.
The details of exactly what these three algorithms do are well beyond the scope of this course.
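As a rough sketch of how such optimizers are used in practice: you supply a function that computes J(θ) and its gradient and hand it to an off-the-shelf routine. Here SciPy's BFGS implementation is used as an example; the library choice, helper names, and toy data below are my own assumptions, not part of the course material.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return the cross-entropy cost J(theta) and its gradient."""
    h = sigmoid(X @ theta)
    h = np.clip(h, 1e-12, 1 - 1e-12)               # avoid log(0)
    J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / len(y)
    return J, grad

# toy data (assumed): 4 examples, intercept column plus 2 features
X = np.array([[1, 0.5, 1.5], [1, 2.0, 1.0], [1, -1.0, -0.5], [1, -2.0, -1.5]])
y = np.array([1, 1, 0, 0])

res = minimize(cost_and_grad, x0=np.zeros(3), args=(X, y),
               method="BFGS", jac=True)            # jac=True: fun returns (J, grad)
print(res.x)                                       # fitted parameters theta
```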
-------- How to get logistic regression to work for multi-class classification problems --- an algorithm called one-versus-all classification.
What is a multi-class classification problem? It is one where the label y can take on more than two discrete values (for example y ∈ {1, 2, 3, 4}) rather than just 0 or 1.
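A small sketch of the one-versus-all idea, implemented with plain NumPy and simple gradient descent (the dataset and hyperparameters are made up for illustration; the course's own implementation may differ): train one binary classifier per class, then predict the class whose classifier is most confident.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y_binary, lr=0.1, iters=5000):
    """Fit one logistic regression classifier with plain gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y_binary) / len(y_binary)
        theta -= lr * grad
    return theta

def one_vs_all(X, y, num_classes):
    # One classifier per class: class k is treated as "1", everything else as "0".
    return np.array([train_binary(X, (y == k).astype(float))
                     for k in range(num_classes)])

def predict(all_theta, X):
    # Pick the class whose classifier outputs the highest probability.
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)

# toy 3-class data (assumed): intercept column plus two features
X = np.array([[1, 0.0, 0.0], [1, 0.2, 0.1],    # class 0 near the origin
              [1, 3.0, 0.0], [1, 3.2, 0.2],    # class 1 to the right
              [1, 0.0, 3.0], [1, 0.1, 3.1]])   # class 2 up top
y = np.array([0, 0, 1, 1, 2, 2])
all_theta = one_vs_all(X, y, num_classes=3)
print(predict(all_theta, X))                   # should print [0 0 1 1 2 2]
```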
The derivation of the logistic regression loss function, and of its derivative, can be found at:
Keep a few formulas in mind:
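Most likely these are the standard sigmoid and logarithm identities that the derivation relies on (stated here from standard calculus, as an assumption about what was intended):

$$
g(z) = \frac{1}{1+e^{-z}}, \qquad g'(z) = g(z)\big(1-g(z)\big), \qquad \frac{d}{dx}\ln x = \frac{1}{x}
$$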
With the formulas and properties above understood, the derivative can be worked out.
As mentioned at the beginning, log here means ln by default, so there is nothing puzzling about the derivation.
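Carrying the derivation through gives the gradient, which has the same form as the linear regression gradient (only the hypothesis h_θ differs):

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}
$$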