
Caffe Solver Parameter Settings

Published: 2025/3/21 · 豆豆

Reference: http://caffe.berkeleyvision.org/tutorial/solver.html
The solver controls parameter optimization by coordinating the forward-backward passes and the resulting parameter updates. Learning a model is carried out by the Solver, which supervises optimization and parameter updates, and by the Net, which produces the loss and the gradients.
Caffe provides the following optimization methods:

  • Stochastic Gradient Descent (type: “SGD”),
  • AdaDelta (type: “AdaDelta”),
  • Adaptive Gradient (type: “AdaGrad”),
  • Adam (type: “Adam”),
  • Nesterov’s Accelerated Gradient (type: “Nesterov”),
  • RMSprop (type: “RMSProp”)
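In recent Caffe versions the method is selected with the `type` field of solver.prototxt. A minimal sketch (the file path and values below are illustrative, not from the original article):

```
net: "train_val.prototxt"   # illustrative path to the network definition
type: "Adam"                # any of the six method names listed above
base_lr: 0.001              # base learning rate
max_iter: 10000
snapshot_prefix: "snapshots/example"
```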

The solver

  • scaffolds the optimization bookkeeping and creates the training network for learning and test network(s) for evaluation,
  • iteratively optimizes by calling forward / backward and updating parameters,
  • (periodically) evaluates the test networks,
  • snapshots the model and solver state throughout the optimization,

where each iteration

  • calls network forward to compute the output and loss,
  • calls network backward to compute the gradients,
  • incorporates the gradients into parameter updates according to the solver method,
  • updates the solver state according to learning rate, history, and method,

to take the weights all the way from initialization to learned model.
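The iteration loop above can be sketched on a toy problem. This is an illustration of the loop's structure, not the actual Caffe API: here "forward" returns the loss L(W) = ||W||²/2 and "backward" returns its gradient, and the update step is plain SGD.

```python
import numpy as np

def forward(W):
    """'Network forward': compute the output and loss (here L(W) = ||W||^2 / 2)."""
    return 0.5 * float(W @ W)

def backward(W):
    """'Network backward': compute the gradient dL/dW (here simply W)."""
    return W.copy()

W = np.array([4.0, -2.0])
lr = 0.1
for it in range(100):
    loss = forward(W)   # 1. forward: output and loss
    grad = backward(W)  # 2. backward: gradients
    W -= lr * grad      # 3. incorporate gradients into a parameter update (plain SGD)
    # 4. a real solver would also update its state here (lr schedule, momentum history)
```

After 100 iterations the weights have been taken from their initialization close to the minimizer W = 0.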

Like Caffe models, Caffe solvers run in CPU / GPU modes.

SGD

Stochastic gradient descent (type: “SGD”) updates the weights W by a linear combination of the negative gradient ∇L(W) and the previous weight update Vt. The learning rate α is the weight of the negative gradient. The momentum μ is the weight of the previous update.

Formally, we have the following formulas to compute the update value Vt+1 and the updated weights Wt+1 at iteration t+1, given the previous weight update Vt and current weights Wt:

Vt+1 = μVt − α∇L(Wt)
Wt+1 = Wt + Vt+1
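A minimal sketch of these two formulas in NumPy (the function name and the example values are illustrative; this mirrors the update rule, not Caffe's internal implementation):

```python
import numpy as np

def sgd_momentum_step(W, V, grad, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update:
    V_{t+1} = mu * V_t - alpha * grad L(W_t)
    W_{t+1} = W_t + V_{t+1}
    """
    V_new = momentum * V - lr * grad
    W_new = W + V_new
    return W_new, V_new

# One step from zero momentum history: the update is just -lr * grad.
W = np.array([1.0, -2.0])
V = np.zeros(2)
grad = np.array([0.5, -0.5])
W, V = sgd_momentum_step(W, V, grad)
```

On the next step, the previous update V (scaled by μ) is folded in, which is what smooths the optimization trajectory.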
The learning “hyperparameters” (α and μ) might require a bit of tuning for best results. If you’re not sure where to start, take a look at the “Rules of thumb” below, and for further information you might refer to Leon Bottou’s Stochastic Gradient Descent Tricks [1].

[1] L. Bottou. Stochastic Gradient Descent Tricks. Neural Networks: Tricks of the Trade: Springer, 2012.

Summary of the parameters in the solver file

iteration: one forward-backward training pass over one batch of data.
batch_size: the number of images used in each training iteration.
epoch: one epoch means every training image has passed through the network once.
For example: with 1,280,000 images and batch_size = 256, one epoch takes 1,280,000 / 256 = 5000 iterations.
With max_iter = 450,000, training runs for 450,000 / 5000 = 90 epochs.
When the learning rate decays is governed by stepsize, and by how much by gamma. For example, with stepsize = 500, base_lr = 0.01, and gamma = 0.1, the lr decays for the first time at iteration 500, to lr = lr × gamma = 0.01 × 0.1 = 0.001, and the process then repeats every 500 iterations. In short: stepsize is the decay step of the learning rate, and gamma is its decay factor.
During training, the network is tested at regular intervals set by test_interval; e.g. with test_interval = 1000, the network is tested once every 1000 training iterations.
The test batch size, test_iter, and the number of test images together determine how testing works: the test batch size sets how many images are fed per test iteration, and test_iter is the number of test iterations needed to cover all test images. For example, with 500 test images and test_iter = 100, the test batch size is 5. In the solver file you only need to set test_iter according to the total number of test images, and test_interval as desired.
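The arithmetic above can be checked directly. A small sketch using the numbers from the text (the "step" lr policy formula lr = base_lr × gamma^(iter // stepsize) matches the repeated-decay behavior described):

```python
# Training setup from the text: 1,280,000 images, batch_size = 256.
num_train = 1_280_000
batch_size = 256
iters_per_epoch = num_train // batch_size   # 5000 iterations per epoch

max_iter = 450_000
num_epochs = max_iter // iters_per_epoch    # 90 epochs in total

# Learning rate under the "step" policy: decay by gamma every stepsize iterations.
base_lr, gamma, stepsize = 0.01, 0.1, 500
def lr_at(it):
    return base_lr * gamma ** (it // stepsize)

# Testing: test_iter batches of the test batch size must cover all test images.
num_test, test_batch_size = 500, 5
test_iter = num_test // test_batch_size     # 100 test iterations
```

So lr_at(499) is still 0.01, while lr_at(500) has decayed to 0.001, and 100 test iterations of 5 images each cover all 500 test images exactly once.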
