

How to compute the Hessian matrix of a neural network trained with momentum (a survey of papers)

Published: 2023/12/20

According to [4] — "Though results on the Hessian of individual layers were not included in this study" — it appears that each layer has its own corresponding Hessian matrix.

According to [5], the Hessian of the last layer is easy to compute, but it becomes much harder for the layers below it.

The following theoretical treatment of the Hessian may be helpful, so I record it here first:
------------------------------------------
[7] gives a very clear account of how transposing the denominator changes the layout of the derivative. For x=\left(x_{1} \dots x_{N}\right)^{T}:

\frac{\partial f(x)}{\partial x}=\left(\begin{array}{c}{\frac{\partial f(x)}{\partial x_{1}}} \\ {\frac{\partial f(x)}{\partial x_{2}}} \\ {\vdots} \\ {\frac{\partial f(x)}{\partial x_{N}}}\end{array}\right)

\left(\frac{\partial f(x)}{\partial x}\right)^{T}=\frac{\partial f(x)}{\partial x^{T}}=\left(\frac{\partial f(x)}{\partial x_{1}} \quad \frac{\partial f(x)}{\partial x_{2}} \quad \ldots \quad \frac{\partial f(x)}{\partial x_{N}}\right)

\frac{\partial^{2} f(x)}{\partial x \partial x^{T}}=\left(\begin{array}{ccc}{\frac{\partial^{2} f(x)}{\partial x_{1}^{2}}} & {\cdots} & {\frac{\partial^{2} f(x)}{\partial x_{1} \partial x_{N}}} \\ {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial^{2} f(x)}{\partial x_{N} \partial x_{1}}} & {\cdots} & {\frac{\partial^{2} f(x)}{\partial x_{N}^{2}}}\end{array}\right)
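The layout above can be sanity-checked numerically. Below is a minimal sketch (the quadratic test function and all names are my own choices, not from any cited paper) that recovers the Hessian by central finite differences; for f(x) = ½ xᵀAx with symmetric A, the result should be A itself:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = (A + A.T) / 2                 # symmetric, so the Hessian of f is exactly A

def f(x):
    return 0.5 * x @ A @ x

def num_hessian(f, x, eps=1e-4):
    """Central-difference estimate of the Hessian, entry by entry."""
    n = x.size
    H = np.empty((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            ei, ej = eps * I[i], eps * I[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

x0 = rng.normal(size=4)
H = num_hessian(f, x0)
print(np.allclose(H, A, atol=1e-4))
```

This also confirms the symmetry H = Hᵀ that the matrix layout above implies.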

------------------------------------------

The paste tool was the Mathpix Snipping Tool; this is the first time I have found its screenshot-to-LaTeX conversion inaccurate, sigh…
------------------------------------------
Equations (2.16)~(2.18) in [1] could not be verified.
In (3.1)~(3.3) a strange symbol δ appears whose meaning is never explained.
The definition of b_{ni} in (2.8) is also odd. Comparing (2.15) with (2.12) shows that [1] is discussing a neural network for binary classification. The author could not be reached, so I eventually gave up on reading the paper.

[3] uses a spring oscillator to model the persistent oscillation of neural-network training, and argues from both the differential-equation and the difference-equation viewpoints why the momentum optimizer accelerates convergence.
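As a hedged illustration of the update that [3] analyzes (the classic heavy-ball form; the ill-conditioned quadratic objective and the hyperparameters are my own choices):

```python
import numpy as np

# Heavy-ball momentum:
#   v_{t+1} = mu * v_t - lr * grad f(x_t)
#   x_{t+1} = x_t + v_{t+1}
A = np.diag([1.0, 25.0])          # ill-conditioned quadratic f(x) = 0.5 x^T A x

def grad_f(x):
    return A @ x

def run(lr, mu, steps=200):
    x = np.array([1.0, 1.0])
    v = np.zeros(2)
    for _ in range(steps):
        v = mu * v - lr * grad_f(x)
        x = x + v
    return np.linalg.norm(x)      # distance from the minimum at 0

plain = run(lr=0.03, mu=0.0)      # plain gradient descent
heavy = run(lr=0.03, mu=0.9)      # momentum
print(heavy < plain)
```

On this quadratic the momentum run ends closer to the minimum than plain gradient descent with the same learning rate, which matches the paper's point about accelerated convergence along ill-conditioned directions.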

I contacted the author of [4]; the reply was that reproducing the results requires a large amount of Google's hardware plus dedicated scripts, that it cannot be done at home, and that even the author himself no longer has the code.
------------------------------------------

"Hessian-free" means computing the product Hv rather than H itself, which avoids the enormous cost of forming H explicitly.
The point of computing H^{-1}v is that it appears in the update step of second-order Newton methods when training neural networks.
------------------------------------------

################## The following GitHub links relate to hessian-free ##################
[7]這個(gè)作者不回復(fù)了,棄坑

https://github.com/drasmuss/hessianfree
這個(gè)里面的代碼主要是共軛梯度法,直接舍棄了和Jacobian和Hessian相關(guān)的操作

https://github.com/NithinTangellamudi/HessianFreeImplementation
The code is full of syntax errors; abandoned.

☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
The following are still under investigation:
#------------------------------------------------------------------------------------------

The code in [8] accompanies paper [9].
Its hessian-free part is as follows:

def gauss_vect_mult(v):
    """Multiply a vector by the Gauss-Newton matrix JHJ'
    where J is the Jacobian between output and params
    and H is the Hessian between costs and output.
    H should be diagonal and positive.
    Also add the ridge."""
    Jv = T.Rop(output, params, v)
    HJv = T.Rop(T.grad(opt_cost, output), output, Jv)
    JHJv = T.Lop(output, params, HJv)
    if not isinstance(JHJv, list):
        JHJv = [JHJv]
    JHJv = [a + ridge * b for a, b in zip(JHJv, v)]
    return JHJv

I emailed the author asking for the reasoning behind this, but got no reply.
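For intuition about what a J H J' product routine like the one above computes, here is a hedged numpy re-derivation for a model small enough that G = JᵀHJ + ridge·I can be formed explicitly: a linear model s(w) = Xw with squared-error cost, so J = X and H (the Hessian of the cost with respect to the outputs) is the identity. All names here are my own illustration, not the repo's API:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))        # J: Jacobian of outputs w.r.t. params
H = np.eye(8)                      # Hessian of cost w.r.t. outputs (diagonal, PSD)
ridge = 0.1
v = rng.normal(size=3)

# Three-step product, mirroring the Theano code: Jv, then H(Jv),
# then J^T (H Jv), and finally add the ridge term.
Jv = X @ v
HJv = H @ Jv
Gv = X.T @ HJv + ridge * v

# Explicit Gauss-Newton matrix for comparison.
G = X.T @ H @ X + ridge * np.eye(3)
print(np.allclose(Gv, G @ v))
```

The point of the three-step form is that for a real network J and H are never materialized; only their actions on vectors are needed.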
#------------------------------------------------------------------------------------------
The code in [10] is part of the paper [11] below.

Its hessian-free part is as follows:

def gauss_newton_product(cost, p, v, s):
    # this computes the product Gv = J'HJv (G is the Gauss-Newton matrix)
    Jv = T.Rop(s, p, v)
    HJv = T.grad(T.sum(T.grad(cost, s) * Jv), s,
                 consider_constant=[Jv], disconnected_inputs='ignore')
    Gv = T.grad(T.sum(HJv * s), p,
                consider_constant=[HJv, Jv], disconnected_inputs='ignore')
    Gv = map(T.as_tensor_variable, Gv)  # for CudaNdarray
    return Gv

I emailed the author asking for the reasoning behind this, but got no reply.
#------------------------------------------------------------------------------------------
[12] involves meta-learning.

References:
[1] Exact Calculation of the Hessian Matrix for the Multilayer Perceptron
[2] A Fast Procedure for Re-training the Multilayer Perceptron
[3] On the Momentum Term in Gradient Descent Learning Algorithms
[4] Negative Eigenvalues of the Hessian in Deep Neural Networks
[5] Most efficient way to calculate hessian of cost function in neural network
[6] https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470173862.app4
[7] https://github.com/moonl1ght/HessianFreeOptimization/issues/1
[8] https://github.com/doomie/HessianFree
[9] Improved Preconditioner for Hessian Free Optimization
[10] https://github.com/boulanni/theano-hf
[11] Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
[12] https://github.com/ozzzp/MLHF
