

How to compute the Hessian matrix of a neural network trained with momentum (a survey of papers)

Published: 2023/12/20

According to [4] — "Though results on the Hessian of individual layers were not included in this study" — it appears that each layer has its own corresponding Hessian matrix.

According to [5], the Hessian of the last layer is easy to compute, but it becomes much harder for the layers below it.

The following theoretical treatment of the Hessian may be helpful, so I record it here first:
------------------------------------------
[7] gives a very clear account of how transposing the denominator changes the layout of the derivative. For x=\left(x_{1} \dots x_{N}\right)^{T}:

\frac{\partial f(x)}{\partial x}=\left(\begin{array}{c}{\frac{\partial f(x)}{\partial x_{1}}} \\ {\frac{\partial f(x)}{\partial x_{2}}} \\ {\vdots} \\ {\frac{\partial f(x)}{\partial x_{N}}}\end{array}\right)

\left(\frac{\partial f(x)}{\partial x}\right)^{T}=\frac{\partial f(x)}{\partial x^{T}}=\left(\frac{\partial f(x)}{\partial x_{1}} \quad \frac{\partial f(x)}{\partial x_{2}} \quad \ldots \quad \frac{\partial f(x)}{\partial x_{N}}\right)

\frac{\partial^{2} f(x)}{\partial x \partial x^{T}}=\left(\begin{array}{ccc}{\frac{\partial^{2} f(x)}{\partial x_{1}^{2}}} & {\cdots} & {\frac{\partial^{2} f(x)}{\partial x_{1} \partial x_{N}}} \\ {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial^{2} f(x)}{\partial x_{N} \partial x_{1}}} & {\cdots} & {\frac{\partial^{2} f(x)}{\partial x_{N}^{2}}}\end{array}\right)
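The layout above can be sanity-checked numerically. Below is a minimal sketch (the quadratic test function and all names are my own choices, not from any cited paper) that recovers the Hessian by central finite differences; for f(x) = ½ xᵀAx with symmetric A, the result should be A itself:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = (A + A.T) / 2                 # symmetric, so the Hessian of f is exactly A

def f(x):
    return 0.5 * x @ A @ x

def num_hessian(f, x, eps=1e-4):
    """Central-difference estimate of the Hessian, entry by entry."""
    n = x.size
    H = np.empty((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            ei, ej = eps * I[i], eps * I[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

x0 = rng.normal(size=4)
H = num_hessian(f, x0)
print(np.allclose(H, A, atol=1e-4))
```

This also confirms the symmetry H = Hᵀ that the matrix layout above implies.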

------------------------------------------

The paste tool was the Mathpix Snipping Tool; this is the first time I have found its screenshot-to-LaTeX conversion inaccurate, sigh…
------------------------------------------
Equations (2.16)~(2.18) in [1] could not be verified.
In (3.1)~(3.3) a strange symbol δ appears whose meaning is never explained.
The definition of b_{ni} in (2.8) is also odd. Comparing (2.15) with (2.12) shows that [1] is discussing a neural network for binary classification. The author could not be reached, so I eventually gave up on reading the paper.

[3] uses a spring oscillator to model the persistent oscillation of neural-network training, and argues from both the differential-equation and the difference-equation viewpoints why the momentum optimizer accelerates convergence.
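As a hedged illustration of the update that [3] analyzes (the classic heavy-ball form; the ill-conditioned quadratic objective and the hyperparameters are my own choices):

```python
import numpy as np

# Heavy-ball momentum:
#   v_{t+1} = mu * v_t - lr * grad f(x_t)
#   x_{t+1} = x_t + v_{t+1}
A = np.diag([1.0, 25.0])          # ill-conditioned quadratic f(x) = 0.5 x^T A x

def grad_f(x):
    return A @ x

def run(lr, mu, steps=200):
    x = np.array([1.0, 1.0])
    v = np.zeros(2)
    for _ in range(steps):
        v = mu * v - lr * grad_f(x)
        x = x + v
    return np.linalg.norm(x)      # distance from the minimum at 0

plain = run(lr=0.03, mu=0.0)      # plain gradient descent
heavy = run(lr=0.03, mu=0.9)      # momentum
print(heavy < plain)
```

On this quadratic the momentum run ends closer to the minimum than plain gradient descent with the same learning rate, which matches the paper's point about accelerated convergence along ill-conditioned directions.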

I contacted the author of [4]; the reply was that reproducing the results requires a large amount of Google's hardware plus dedicated scripts, that it cannot be done at home, and that even the author himself no longer has the code.
------------------------------------------

"Hessian-free" means computing the product Hv rather than H itself, which avoids the enormous cost of forming H explicitly.
The point of computing H^{-1}v is that it appears in the update step of second-order Newton methods when training neural networks.
------------------------------------------

################## The following GitHub links relate to hessian-free ##################
[7]這個(gè)作者不回復(fù)了,棄坑

https://github.com/drasmuss/hessianfree
這個(gè)里面的代碼主要是共軛梯度法,直接舍棄了和Jacobian和Hessian相關(guān)的操作

https://github.com/NithinTangellamudi/HessianFreeImplementation
The code is full of syntax errors; abandoned.

☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
The following are still under investigation:
#------------------------------------------------------------------------------------------

The code in [8] accompanies paper [9].
Its hessian-free part is as follows:

def gauss_vect_mult(v):
    """Multiply a vector by the Gauss-Newton matrix JHJ'
    where J is the Jacobian between output and params
    and H is the Hessian between costs and output.
    H should be diagonal and positive.
    Also add the ridge."""
    Jv = T.Rop(output, params, v)
    HJv = T.Rop(T.grad(opt_cost, output), output, Jv)
    JHJv = T.Lop(output, params, HJv)
    if not isinstance(JHJv, list):
        JHJv = [JHJv]
    JHJv = [a + ridge * b for a, b in zip(JHJv, v)]
    return JHJv

I emailed the author asking for the reasoning behind this, but got no reply.
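For intuition about what a J H J' product routine like the one above computes, here is a hedged numpy re-derivation for a model small enough that G = JᵀHJ + ridge·I can be formed explicitly: a linear model s(w) = Xw with squared-error cost, so J = X and H (the Hessian of the cost with respect to the outputs) is the identity. All names here are my own illustration, not the repo's API:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))        # J: Jacobian of outputs w.r.t. params
H = np.eye(8)                      # Hessian of cost w.r.t. outputs (diagonal, PSD)
ridge = 0.1
v = rng.normal(size=3)

# Three-step product, mirroring the Theano code: Jv, then H(Jv),
# then J^T (H Jv), and finally add the ridge term.
Jv = X @ v
HJv = H @ Jv
Gv = X.T @ HJv + ridge * v

# Explicit Gauss-Newton matrix for comparison.
G = X.T @ H @ X + ridge * np.eye(3)
print(np.allclose(Gv, G @ v))
```

The point of the three-step form is that for a real network J and H are never materialized; only their actions on vectors are needed.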
#------------------------------------------------------------------------------------------
The code in [10] is part of the paper [11] below.

Its hessian-free part is as follows:

def gauss_newton_product(cost, p, v, s):
    # this computes the product Gv = J'HJv (G is the Gauss-Newton matrix)
    Jv = T.Rop(s, p, v)
    HJv = T.grad(T.sum(T.grad(cost, s) * Jv), s,
                 consider_constant=[Jv], disconnected_inputs='ignore')
    Gv = T.grad(T.sum(HJv * s), p,
                consider_constant=[HJv, Jv], disconnected_inputs='ignore')
    Gv = map(T.as_tensor_variable, Gv)  # for CudaNdarray
    return Gv

I emailed the author asking for the reasoning behind this, but got no reply.
#------------------------------------------------------------------------------------------
[12] involves meta-learning.

References:
[1] Exact Calculation of the Hessian Matrix for the Multilayer Perceptron
[2] A Fast Procedure for Re-training the Multilayer Perceptron
[3] On the Momentum Term in Gradient Descent Learning Algorithms
[4] Negative Eigenvalues of the Hessian in Deep Neural Networks
[5] Most efficient way to calculate hessian of cost function in neural network
[6] https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470173862.app4
[7] https://github.com/moonl1ght/HessianFreeOptimization/issues/1
[8] https://github.com/doomie/HessianFree
[9] Improved Preconditioner for Hessian Free Optimization
[10] https://github.com/boulanni/theano-hf
[11] Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
[12] https://github.com/ozzzp/MLHF
