
Paper on DL and BP: "Understanding the difficulty of training deep feedforward neural networks"

Published: 2025/3/21

Contents

Reading the original paper

Content and key points

Conclusions



Reading the original paper

Original paper: Understanding the difficulty of training deep feedforward neural networks


Content and key points

Limitations of sigmoid with four layers


With the sigmoid function, the test loss and training loss stay at about 0.5 for many epochs, and only then show a barely satisfactory drop to about 0.1.


We hypothesize that this behavior is due to the combination of random initialization and the fact that an hidden unit output of 0 corresponds to a saturated sigmoid. Note that deep networks with sigmoids but initialized from unsupervised pre-training (e.g. from RBMs) do not suffer from this saturation behavior.
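The link between an output of 0 and saturation can be checked numerically. The following is a minimal sketch, not code from the paper: the function names and the sample pre-activation values are chosen here for illustration. It shows that as the sigmoid output approaches 0, its derivative s(x)(1 - s(x)) also approaches 0, so very little gradient flows back through such a unit.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Illustrative pre-activations (an assumption, not values from the paper):
# from the linear regime (0) to strongly negative inputs, where the
# output is pushed towards 0.
pre_activations = np.array([0.0, -2.0, -5.0, -10.0])
outputs = sigmoid(pre_activations)
grads = sigmoid_grad(pre_activations)

for z, a, g in zip(pre_activations, outputs, grads):
    print(f"z={z:6.1f}  sigmoid={a:.5f}  gradient={g:.5f}")
```

The gradient is largest (0.25) at a pre-activation of 0, where the output is 0.5, and shrinks towards 0 as the output moves towards 0.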


Limitations of tanh and softsign with five layers



After switching to the tanh function, training converges well and quickly.
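For readers unfamiliar with softsign, here is a small sketch comparing it with tanh. It assumes the usual definition softsign(x) = x / (1 + |x|), which comes from the paper rather than from this post; the function names and sample points are illustrative. Both functions are zero-centered, but the softsign derivative decays polynomially while the tanh derivative decays exponentially, which is one way to read the "gentler non-linearity" of softsign.

```python
import numpy as np

def softsign(x):
    """Softsign activation: x / (1 + |x|)."""
    return x / (1.0 + np.abs(x))

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2

def softsign_grad(x):
    """Derivative of softsign: 1 / (1 + |x|)^2."""
    return 1.0 / (1.0 + np.abs(x)) ** 2

# Illustrative inputs (an assumption): the farther from 0,
# the more saturated both units are.
xs = np.array([0.0, 1.0, 3.0, 6.0])
for x in xs:
    print(f"x={x:4.1f}  tanh'={tanh_grad(x):.6f}  softsign'={softsign_grad(x):.6f}")
```

At 0 both derivatives equal 1; away from 0 the tanh derivative falls off much faster than the softsign derivative.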


Conclusions

1. The normalization factor may therefore be important when initializing deep networks because of the multiplicative effect through layers, and we suggest the following initialization procedure to approximately satisfy our objectives of maintaining activation variances and back-propagated gradients variance as one moves up or down the network. We call it the normalized initialization.


2. The results show that the distribution of activation values is more uniform.

Activation values normalized histograms with hyperbolic tangent activation, with standard (top) vs normalized initialization (bottom). Top: 0-peak increases for higher layers.
Several conclusions can be drawn from these error curves:
(1) The more classical neural networks with sigmoid or hyperbolic tangent units and standard initialization fare rather poorly, converging more slowly and apparently towards ultimately poorer local minima.
(2) The softsign networks seem to be more robust to the initialization procedure than the tanh networks, presumably because of their gentler non-linearity.
(3) For tanh networks, the proposed normalized initialization can be quite helpful, presumably because the layer-to-layer transformations maintain magnitudes of activations (flowing upward) and gradients (flowing backward).
3. "Sigmoid 5" means the network has 5 layers, and "N" means normalized initialization (not regularization); the results indicate that pre-training yields a smaller error.
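The normalized initialization from conclusion 1 and the tanh observation in (3) can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code. The two formulas follow the paper: the standard heuristic draws weights from U[-1/sqrt(n_in), 1/sqrt(n_in)], and the normalized initialization draws them from U[-sqrt(6/(n_in + n_out)), sqrt(6/(n_in + n_out))]. Everything else is an assumption made for the demo: the function names, the layer width of 500, the batch of 1000 standard normal inputs, the absence of biases, and the fixed seed.

```python
import numpy as np

def standard_init(rng, n_in, n_out):
    """Standard heuristic: W ~ U[-1/sqrt(n_in), 1/sqrt(n_in)]."""
    limit = 1.0 / np.sqrt(n_in)
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def normalized_init(rng, n_in, n_out):
    """Normalized initialization: W ~ U[-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))]."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def activation_stds(init_fn, n_layers=5, width=500, batch=1000, seed=0):
    """Push random inputs through tanh layers and record each layer's activation std.

    Demo assumptions: equal-width layers, no biases, standard normal inputs.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    stds = []
    for _ in range(n_layers):
        W = init_fn(rng, width, width)
        h = np.tanh(h @ W)
        stds.append(float(h.std()))
    return stds

if __name__ == "__main__":
    for name, fn in [("standard", standard_init), ("normalized", normalized_init)]:
        stds = activation_stds(fn)
        print(name, " ".join(f"{s:.3f}" for s in stds))
```

Under these assumptions, the activation standard deviation shrinks quickly from layer to layer with the standard heuristic and much more slowly with the normalized initialization, which is consistent with the histogram caption above ("0-peak increases for higher layers").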




Related articles
Understanding the difficulty of training deep feedforward neural networks. The authors of the paper are Xavier Glorot and Yoshua Bengio.
