
caffe framework: translation and notes (repost)

Published: 2023/12/13


Reposted from: http://dirlt.com/caffe.html

http://blog.csdn.net/songyu0120/article/details/46817085

1 caffe

http://caffe.berkeleyvision.org/

1.1 setup

Installation requires the following components, all of which are available through apt-get.

libgoogle-glog-dev # glog
libgflags-dev # gflags
libhdf5-dev # hdf5
liblmdb-dev # lmdb
libleveldb-dev # leveldb
libsnappy-dev # snappy
libopencv-dev # opencv
liblapack-dev libblas-dev libatlas-dev libatlas-base-dev libopenblas-dev # blas

1.2 arch

caffe is very modular, perhaps because neural networks are themselves fairly modular. The home page lists the design philosophy of the system:

Expression: models and optimizations are defined as plaintext schemas instead of code. # Uses Google protocol buffers to describe the network structure and parameters. protobuf can even load files in TextFormat, which I had not known about before; it is well suited to describing large, structured, human-readable data.
Speed: for research and industry alike speed is crucial for state-of-the-art models and massive data. # Tensors (called blobs in caffe) have both GPU and CPU implementations.
Modularity: new tasks and settings require flexibility and extension. # The modules are covered below: Solver, Net, Layer, Blob.
Openness: scientific and applied progress call for common code, reference models, and reproducibility. # Trained model parameters can be saved and distributed; the storage format is binary protocol buffers.
Community: academic research, startup prototypes, and industrial applications all share strength by joint discussion and development in a BSD-2 project.

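First, a rough overview of the modules:

Blob: caffe's data representation. It can hold input and output data as well as parameter data.
Layer: can represent not only a neural network layer but also a data input or output layer. Blobs flow through Layers (forward & backward).
Net: the network structure, which stacks and connects the Layers.
Solver: coordinates training and testing of the network, for example which gradient-descent method to use and with what parameters. It also supports saving and restoring training state and storing the network parameters.

#note: Most fields in a prototxt description file are easy to understand. For fields that are not, or when you do not know which parameters exist, see src/caffe/proto/caffe.proto, where every field is documented in some detail.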

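1.2.1 Blob

A Blob is a 4-D contiguous array (type = float32). Written as (n, c, h, w), the dimensions mean:

n: number. The amount of input data, for example the mini-batch size when running sgd.
c: channel. For image data this can be read as the number of channels.
h,w: height, width. For image data these can be read as the height and width of the image.

Of course a Blob need not hold image input data. The most important point about these dimensions is that the index w varies fastest. The home page gives a few examples: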

the shape of blob holding 1000 vectors of 16 feature dimensions is 1000 x 16 x 1 x 1.
For a convolution layer with 96 filters of 11 x 11 spatial dimension and 3 inputs the blob is 96 x 3 x 11 x 11.
For an inner product / fully-connected layer with 1000 output channels and 1024 input channels the parameter blob is 1 x 1 x 1000 x 1024.
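To make "w varies fastest" concrete, here is a minimal sketch of the row-major index arithmetic for a 4-D blob. The function name blob_offset is made up for illustration and is not part of caffe's API.

```python
def blob_offset(shape, n, c, h, w):
    """Flat index of element (n, c, h, w) in a row-major 4-D array."""
    N, C, H, W = shape
    return ((n * C + c) * H + h) * W + w


shape = (64, 1, 28, 28)  # a mini-batch of 64 single-channel 28x28 images
origin = blob_offset(shape, 0, 0, 0, 0)

# Moving one step along w moves one element; one step along n skips a whole image.
step_w = blob_offset(shape, 0, 0, 0, 1) - origin
step_h = blob_offset(shape, 0, 0, 1, 0) - origin
step_n = blob_offset(shape, 1, 0, 0, 0) - origin
print(step_w, step_h, step_n)  # 1 28 784
```

The last valid index for this shape is 64 * 1 * 28 * 28 - 1 = 50175, so the blob holds 50176 elements.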

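Internally a Blob has two fields, data and diff. data holds the values flowing through the net (layer outputs), while diff stores the gradients from backpropagation. data/diff can live on the CPU or on the GPU. If a layer has no GPU implementation, the GPU data has to be copied to the CPU, which costs performance. python/numpy users can use reshape to put an array into blob layout: data = data.reshape((-1, c, h, w))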
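A small sketch of the reshape call above, using made-up data: five flattened 28x28 grayscale images, one per row. The array contents and the count of five are assumptions for illustration only.

```python
import numpy as np

# Five flattened 28x28 images, one per row (synthetic values).
flat = np.arange(5 * 28 * 28, dtype=np.float32).reshape(5, 28 * 28)

c, h, w = 1, 28, 28
data = flat.reshape((-1, c, h, w))  # -1 lets numpy infer n from the data size

print(data.shape)  # (5, 1, 28, 28)

# The element order is unchanged: pixel (h=2, w=3) of image 1 is still
# the value at position 2 * 28 + 3 of row 1.
same = bool(data[1, 0, 2, 3] == flat[1, 2 * 28 + 3])
```

reshape only reinterprets the existing row-major buffer, so it works as long as each input row is already stored with w varying fastest.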

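1.2.2 Layer

caffe provides many built-in layers, such as convolution, pooling, dropout and nonlinearity layers. Descriptions and parameters of these layers can be found in the layer documentation (the documentation lags the code somewhat: it does not mention dropout support, although dropout is already provided). Each layer takes some 'bottom' blobs as input and produces some 'top' blobs as output. The input layer produces the "data" and "label" blobs.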

Each layer type defines three critical computations: setup, forward, and backward.

Setup: initialize the layer and its connections once at model initialization. # initialization
Forward: given input from bottom compute the output and send to the top. # forward propagation
Backward: given the gradient w.r.t. the top output compute the gradient w.r.t. the input and send to the bottom. A layer with parameters computes the gradient w.r.t. its parameters and stores it internally. # backward propagation / gradient computation
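The three computations can be sketched with a toy layer in plain Python. ScaleLayer is a hypothetical class written for these notes; it is not caffe's Layer interface, only an illustration of the setup/forward/backward contract.

```python
class ScaleLayer:
    """Toy layer: top = weight * bottom, with one learnable scalar weight."""

    def setup(self, weight=1.0):
        # Called once: create the parameter and its gradient slot.
        self.weight = weight
        self.weight_diff = 0.0

    def forward(self, bottom):
        self.bottom = list(bottom)  # remember the input for backward
        return [self.weight * x for x in self.bottom]

    def backward(self, top_diff):
        # Gradient w.r.t. the parameter, stored inside the layer.
        self.weight_diff = sum(g * x for g, x in zip(top_diff, self.bottom))
        # Gradient w.r.t. the input, returned to be sent to the bottom.
        return [self.weight * g for g in top_diff]


layer = ScaleLayer()
layer.setup(weight=2.0)
top = layer.forward([1.0, 2.0, 3.0])
bottom_diff = layer.backward([1.0, 1.0, 1.0])
print(top, bottom_diff, layer.weight_diff)
```

With a top gradient of all ones, the input gradient is the weight repeated and the weight gradient is the sum of the inputs.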

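The full list of layers caffe supports is at http://caffe.berkeleyvision.org/tutorial/layers.html. Some data layers also support preprocessing operations.

#note: Names in the documentation may not match the actual code. If so, read src/caffe/layers/*_layer.cpp and look for REGISTER_LAYER_CLASS(name); name is the registered string.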

1.2.3 Net

A net is a DAG of layers and can be described in text format (protocol-buffers TextFormat). For example, the following text generates a logistic regression:

name: "LogReg"
layers {
  name: "mnist"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "input_leveldb"
    batch_size: 64
  }
}
layers {
  name: "ip"
  type: INNER_PRODUCT
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 2
  }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip"
  bottom: "label"
  top: "loss"
}

Net has an initialization function, Init(). It does two things: 1. creates the blobs and layers; 2. calls each layer's SetUp function to initialize it. Log messages are printed during this process. Note that this stage does not say whether training runs on the GPU or the CPU; that is decided at the solver level, which keeps the model separate from the implementation. Net also has Forward and Backward functions, which call forward/backward on each of the Layers. Finally, to make a prediction we fill the input blobs, call forward, and read the output blobs as the prediction.

I0902 22:52:17.931977 2079114000 net.cpp:39] Initializing net from parameters:
name: "LogReg"
[...model prototxt printout...]
# construct the network layer-by-layer
I0902 22:52:17.932152 2079114000 net.cpp:67] Creating Layer mnist
I0902 22:52:17.932165 2079114000 net.cpp:356] mnist -> data
I0902 22:52:17.932188 2079114000 net.cpp:356] mnist -> label
I0902 22:52:17.932200 2079114000 net.cpp:96] Setting up mnist
I0902 22:52:17.935807 2079114000 data_layer.cpp:135] Opening leveldb input_leveldb
I0902 22:52:17.937155 2079114000 data_layer.cpp:195] output data size: 64,1,28,28
I0902 22:52:17.938570 2079114000 net.cpp:103] Top shape: 64 1 28 28 (50176)
I0902 22:52:17.938593 2079114000 net.cpp:103] Top shape: 64 1 1 1 (64)
I0902 22:52:17.938611 2079114000 net.cpp:67] Creating Layer ip
I0902 22:52:17.938617 2079114000 net.cpp:394] ip <- data
I0902 22:52:17.939177 2079114000 net.cpp:356] ip -> ip
I0902 22:52:17.939196 2079114000 net.cpp:96] Setting up ip
I0902 22:52:17.940289 2079114000 net.cpp:103] Top shape: 64 2 1 1 (128)
I0902 22:52:17.941270 2079114000 net.cpp:67] Creating Layer loss
I0902 22:52:17.941305 2079114000 net.cpp:394] loss <- ip
I0902 22:52:17.941314 2079114000 net.cpp:394] loss <- label
I0902 22:52:17.941323 2079114000 net.cpp:356] loss -> loss
# set up the loss and configure the backward pass
I0902 22:52:17.941328 2079114000 net.cpp:96] Setting up loss
I0902 22:52:17.941328 2079114000 net.cpp:103] Top shape: 1 1 1 1 (1)
I0902 22:52:17.941329 2079114000 net.cpp:109] with loss weight 1
I0902 22:52:17.941779 2079114000 net.cpp:170] loss needs backward computation.
I0902 22:52:17.941787 2079114000 net.cpp:170] ip needs backward computation.
I0902 22:52:17.941794 2079114000 net.cpp:172] mnist does not need backward computation.
# determine outputs
I0902 22:52:17.941800 2079114000 net.cpp:208] This network produces output loss
# finish initialization and report memory usage
I0902 22:52:17.941810 2079114000 net.cpp:467] Collecting Learning Rate and Weight Decay.
I0902 22:52:17.941818 2079114000 net.cpp:219] Network initialization done.
I0902 22:52:17.941824 2079114000 net.cpp:220] Memory required for data: 201476
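The way the LogReg net above wires layers together through named blobs can be imitated in a few lines of plain Python. This is only a sketch under assumptions: the mini-batch, the weights and every function name here are invented, and the real Net does far more (shape checks, memory allocation, backward bookkeeping).

```python
import math


def data_layer(_inputs):
    # Hypothetical fixed mini-batch: two samples with two features each.
    return {"data": [[1.0, 2.0], [3.0, -1.0]], "label": [0, 1]}


def ip_layer(inputs, weights=((0.5, -0.5), (0.1, 0.2))):
    # Inner product: one output per weight row (num_output = 2).
    return {"ip": [[sum(w * x for w, x in zip(row, sample)) for row in weights]
                   for sample in inputs["data"]]}


def softmax_loss_layer(inputs):
    losses = []
    for scores, label in zip(inputs["ip"], inputs["label"]):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        losses.append(log_z - scores[label])
    return {"loss": sum(losses) / len(losses)}


# (name, bottoms, tops, function), listed in topological order of the DAG.
LAYERS = [
    ("mnist", [], ["data", "label"], data_layer),
    ("ip", ["data"], ["ip"], ip_layer),
    ("loss", ["ip", "label"], ["loss"], softmax_loss_layer),
]


def net_forward(layers):
    blobs = {}
    for name, bottoms, tops, fn in layers:
        inputs = {b: blobs[b] for b in bottoms}  # KeyError if a bottom is missing
        outputs = fn(inputs)
        for t in tops:
            blobs[t] = outputs[t]
    return blobs


blobs = net_forward(LAYERS)
print(sorted(blobs), blobs["loss"])
```

Each layer only sees the blobs it names as bottoms, which is why the same description can be run on different devices without changing the model.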

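Browsing caffe/models shows that the examples there contain a train.prototxt and also a deploy.prototxt. The only difference is that deploy.prototxt has no data layer and instead specifies the shape of the input:

input: "data"
input_dim: 10
input_dim: 1
input_dim: 28
input_dim: 28

As the names suggest, train.prototxt is used to train the model, while deploy.prototxt is used for prediction. Below is Python code for prediction:

#note: I did not use the classifier.py that ships with caffe, because I found that Classifier applies some processing to the input. In my experiments, the results obtained through Classifier were much worse than those from calling the Net::forward_all interface directly.

import caffe

caffe.set_mode_cpu()
net = caffe.Net('caffe-conf/test.prototxt', 'uv_iter_10000.caffemodel', caffe.TEST)
# data: a numpy array of input images prepared by the caller
data = data.reshape((-1, 1, 28, 28))
out = net.forward_all(**{'data': data})
rs = out['prob']  # the softmax output
print_timer("predict")  # the author's own timing helper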

1.2.4 Solver

The solver does the following:

scaffolds the optimization bookkeeping and creates the training network for learning and test network(s) for evaluation.
iteratively optimizes by calling forward / backward and updating parameters # Solver::ComputeUpdateValue()
(periodically) evaluates the test networks
snapshots the model and solver state throughout the optimization
- Solver::Snapshot() / Solver::Restore() # save and restore the network parameters, suffix .caffemodel
- Solver::SnapshotSolverState() / Solver::RestoreSolverState() # save and restore the run state, suffix .solverstate
- File names have the form <prefix>_iter_<N>, where prefix is the configured prefix and N is the iteration count.
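The naming rule above is easy to reproduce when a script needs to locate a snapshot. The helper below is hypothetical (it is not a caffe function) and simply assumes the <prefix>_iter_<N> pattern plus the two suffixes mentioned in these notes.

```python
def snapshot_paths(prefix, iteration):
    """Return (model_path, solver_state_path) for one snapshot."""
    base = "{}_iter_{}".format(prefix, iteration)
    return base + ".caffemodel", base + ".solverstate"


model_path, state_path = snapshot_paths("uv", 10000)
print(model_path)  # uv_iter_10000.caffemodel
print(state_path)  # uv_iter_10000.solverstate
```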

In each iteration the solver does the following:

calls network forward to compute the output and loss
calls network backward to compute the gradients
incorporates the gradients into parameter updates according to the solver method
- Stochastic Gradient Descent (SGD),
- Adaptive Gradient (ADAGRAD),
- and Nesterov's Accelerated Gradient (NESTEROV).
- For how to choose and set the parameters, see the solver documentation.
updates the solver state according to learning rate, history, and method
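As a rough illustration of one iteration, here is SGD with momentum on a one-dimensional toy problem. It is a sketch, not caffe's solver code: the loss f(w) = 0.5 * (w - 3)^2 is made up, weight decay is treated as an L2 term added to the gradient, and the default values are simply the ones from the solver.prototxt in these notes.

```python
def sgd_step(w, grad, v, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One momentum-SGD update; returns the new weight and the new history."""
    g = grad + weight_decay * w   # fold L2 weight decay into the gradient
    v = momentum * v - lr * g     # update the history (the solver state)
    return w + v, v               # apply the update to the parameter


# Toy loss f(w) = 0.5 * (w - 3)^2, whose gradient is (w - 3).
w, v = 0.0, 0.0
for _ in range(2000):
    grad = w - 3.0                # stands in for forward + backward
    w, v = sgd_step(w, grad, v)

print(w)
```

Because of the weight-decay term, w settles slightly below 3, at 3 / 1.0005.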

Below is an example solver.prototxt (adapted from examples/mnist/):

# The train/test net protocol buffer definition
net: "caffe-conf/train.prototxt"
# If the test set has 10000 cases and batch_size = 100, test_iter should be set to 100,
# so that each test pass uses all of the cases.
test_iter: 90
# Carry out testing every 500 training iterations.
test_interval: 500
# The parameters below are used for training.
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 500 iterations
display: 500
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
# Take a snapshot every 500 iterations.
# Each iteration consumes batch_size samples.
snapshot: 500
snapshot_prefix: "uv"
snapshot_after_train: true
# solver mode: CPU or GPU
# Train on the CPU.
solver_mode: CPU

"net" means that train and test use the same net. In net.prototxt the include syntax can declare whether a given layer should be included in the train/test phase.

If you do not want to run tests during training, replace "net" above with "train_net". You can also use "test_net" (a repeated field) to specify several test nets.

1.3 python

http://caffe.berkeleyvision.org/tutorial/interfaces.html

caffe has three interfaces: 1. command line 2. python binding 3. matlab binding. Only the python binding is covered here. caffe/examples contains some ipynb files that can be viewed with ipython-notebook.

caffe's python binding is quite complete:

caffe.Net is the central interface for loading, configuring, and running models. caffe.Classifier and caffe.Detector provide convenience interfaces for common tasks.
caffe.SGDSolver exposes the solving interface.
caffe.io handles input / output with preprocessing and protocol buffers.
caffe.draw visualizes network architectures.
Caffe blobs are exposed as numpy ndarrays for ease-of-use and efficiency.

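I wrote an example that tackles the handwritten digit recognition problem on Kaggle; the prototxt is a lightly modified version of examples/mnist (with one dropout added).

#note: The 0.99586 on the LB is not a real score: it comes from a model trained on the data that ships with mnist rather than the data Kaggle provides. With Kaggle's data the best score was 0.99071. To improve on this, caffe-prepare.py could probably apply more data transformations to enlarge the training set (currently it only rotates).

After training, feeding in a single case lets you plot the output images of conv1, pool1, conv2 and pool2.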
