
Caffe common: code walkthrough, defining a class inside a class

Published: 2023/12/20

Caffe contains common.hpp and common.cpp.

// The Caffe class is the main singleton. It encapsulates the boost and CUDA random number
// generation functions, providing a unified interface.

Caffe's singleton class wraps operations from Boost, CUDA, and so on, and exposes them through one unified interface. This is a common design pattern.

(1) Setting up CUDA random numbers

In the actual implementation, a class is also defined inside the class, for example:

class Caffe {
 public:
  ~Caffe();
  // Lazily create the single instance on first use.
  inline static Caffe& Get() {
    if (!singleton_.get()) {
      singleton_.reset(new Caffe());
    }
    return *singleton_;
  }
  enum Brew { CPU, GPU };

  // This random number generator facade hides boost and CUDA rng
  // implementation from one another (for cross-platform compatibility).
  class RNG {
   public:
    RNG();
    explicit RNG(unsigned int seed);
    explicit RNG(const RNG&);
    RNG& operator=(const RNG&);
    void* generator();
   private:
    class Generator;
    shared_ptr<Generator> generator_;
  };

  // ... (remaining members, including singleton_, are omitted in this excerpt)
};

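Defining a class inside a class is allowed, but it is best avoided where possible because it hurts readability. A class should be an abstraction that can stand on its own.

This technique is mainly used for encapsulation. To use the RNG class from outside, refer to it as Caffe::RNG.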
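The access rule above can be sketched as follows. This is a minimal illustration, not Caffe's code: `Engine`, `Rng`, and `DrawWithSeed` are hypothetical names, the generator is a toy linear congruential step, and the singleton uses a function-local static instead of the `shared_ptr` reset shown in the excerpt.

```cpp
#include <cstdint>

// Hypothetical outer class, loosely modeled on the layout of the excerpt.
class Engine {
 public:
  // Nested class: outside Engine its full name is Engine::Rng.
  class Rng {
   public:
    explicit Rng(std::uint32_t seed) : state_(seed) {}
    // Toy linear congruential step, for illustration only.
    std::uint32_t Next() {
      state_ = state_ * 1664525u + 1013904223u;
      return state_;
    }
   private:
    std::uint32_t state_;
  };

  // Single instance, created on first call (function-local static).
  static Engine& Get() {
    static Engine instance;
    return instance;
  }

  Rng& rng() { return rng_; }

 private:
  Engine() : rng_(1u) {}
  Rng rng_;
};

// Code outside the class has to qualify the nested type with the outer name.
inline std::uint32_t DrawWithSeed(std::uint32_t seed) {
  Engine::Rng rng(seed);
  return rng.Next();
}
```

Inside `Engine` the nested type can be written as plain `Rng`; everywhere else it has to be spelled `Engine::Rng`, which is what ties it visibly to its owner.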

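The same technique can be used to nest a struct inside a class. In C++, though, a struct and a class are really the same thing. The only difference is the default access: members (and base classes) of a class are private by default, while those of a struct are public.

By long-standing convention, however, a struct is usually used only as a data structure that stores data and has no behavior of its own. This can also be seen as a difference from a class, since a class has behavior (member functions).

A struct may be defined either inside or outside a class, but for readability it is usually defined outside.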
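The default-access difference can be shown with a short sketch. `PointS`, `PointC`, `SumStruct`, and `ReadClass` are hypothetical names made up for this example.

```cpp
// Hypothetical types for illustration only.
struct PointS {
  int x;  // public by default
  int y;
};

class PointC {
  int x_ = 0;  // private by default
 public:
  int x() const { return x_; }
  void set_x(int v) { x_ = v; }
};

inline int SumStruct() {
  PointS p{3, 4};  // members are directly accessible
  return p.x + p.y;
}

inline int ReadClass() {
  PointC p;
  // p.x_ = 5;     // would not compile: x_ is private
  p.set_x(5);      // access goes through member functions instead
  return p.x();
}
```

Swapping the keywords `struct` and `class` here would only change which members need an explicit `public:` or `private:` label; the rest is convention.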

----------------------------------------------------------------------------------------------------------------------------

The code also uses a macro, CUDA_KERNEL_LOOP, which is defined in common.hpp:

#define CUDA_KERNEL_LOOP(i, n) \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
       i < (n); \
       i += blockDim.x * gridDim.x)

First, look at the grid and block dimensions that Caffe uses. These can also be found in common.hpp:

CAFFE_CUDA_NUM_THREADS

CAFFE_GET_BLOCKS(const int N)

Both are clearly one-dimensional.
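How the two fit together can be sketched as below. This is an assumption-labeled sketch: the names match the ones listed above, but the value 512 and the ceiling-division body are assumed here rather than quoted from the article, and the constant may differ between Caffe versions.

```cpp
// Assumed threads per block; the real value may differ by Caffe version.
const int CAFFE_CUDA_NUM_THREADS = 512;

// Ceiling division: the smallest number of 1-D blocks whose threads
// together cover N elements.
inline int CAFFE_GET_BLOCKS(const int N) {
  return (N + CAFFE_CUDA_NUM_THREADS - 1) / CAFFE_CUDA_NUM_THREADS;
}
```

With these assumed values, 513 elements need 2 blocks, so the last block has many idle threads; the `i < (n)` test in the loop is what keeps them from touching memory out of range.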

Here is the CUDA_KERNEL_LOOP loop again, laid out so it is easier to read:

for (int i = blockIdx.x * blockDim.x + threadIdx.x;
     i < (n);
     i += blockDim.x * gridDim.x)

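blockDim.x * gridDim.x is the total number of threads in the grid.

n is the total number of elements the kernel has to process.

Sometimes n is larger than blockDim.x * gridDim.x, so it is not possible for each thread to handle exactly one element.

With the loop above, each thread instead processes several elements serially (the for loop), stepping forward by the total thread count each time.

This is a widely used trick, commonly called a grid-stride loop, and it is worth learning.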
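The striding can be checked on the CPU. The sketch below emulates a one-dimensional launch by looping over every (block, thread) pair and running the same strided loop in each; `CountVisits` and `EachVisitedOnce` are hypothetical helper names, and real GPU threads would of course run concurrently rather than one after another.

```cpp
#include <vector>

// CPU emulation of a 1-D launch: every (block, thread) pair runs the same
// strided loop that CUDA_KERNEL_LOOP expands to, and we count index visits.
inline std::vector<int> CountVisits(int n, int grid_dim, int block_dim) {
  std::vector<int> visits(n, 0);
  for (int block = 0; block < grid_dim; ++block) {
    for (int thread = 0; thread < block_dim; ++thread) {
      for (int i = block * block_dim + thread; i < n;
           i += block_dim * grid_dim) {
        ++visits[i];
      }
    }
  }
  return visits;
}

// True if every index in [0, n) is handled by exactly one thread iteration.
inline bool EachVisitedOnce(int n, int grid_dim, int block_dim) {
  for (int v : CountVisits(n, grid_dim, block_dim)) {
    if (v != 1) return false;
  }
  return true;
}
```

For example, with n = 10, 2 blocks, and 3 threads per block, thread 0 of block 0 handles indices 0 and 6, while thread 1 of block 1 handles only index 4. The same loop also works when there are more threads than elements: the extra threads simply fail the `i < n` test on the first check.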

Now look at the implementation of a kernel that uses this macro.

template <typename Dtype>
__global__ void mul_kernel(const int n, const Dtype* a,
    const Dtype* b, Dtype* y) {
  // Each thread handles indices index, index + stride, ... below n.
  CUDA_KERNEL_LOOP(index, n) {
    y[index] = a[index] * b[index];
  }
}

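This clearly computes the element-wise product of two vectors. (It is not a dot product: each y[index] keeps its own value and nothing is summed.)

The length of the vectors may exceed the total number of threads in the kernel's grid.

So some threads may have to process several elements serially.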
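To make the result of the kernel concrete: the loop body `y[index] = a[index] * b[index]` writes one output per index, which is an element-wise product, whereas a dot product would additionally sum those values into a single scalar. A CPU-side sketch of both follows; `ElementwiseMul` and `Dot` are hypothetical helper names, and both assume the two inputs have the same length.

```cpp
#include <cstddef>
#include <vector>

// Element-wise product: the CPU equivalent of what mul_kernel writes to y.
// Assumes a and b have the same length.
inline std::vector<float> ElementwiseMul(const std::vector<float>& a,
                                         const std::vector<float>& b) {
  std::vector<float> y(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    y[i] = a[i] * b[i];
  }
  return y;
}

// Dot product: the sum of the element-wise products, a single scalar.
// Assumes a and b have the same length.
inline float Dot(const std::vector<float>& a, const std::vector<float>& b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

For a = (1, 2, 3) and b = (4, 5, 6), the element-wise product is (4, 10, 18) and the dot product is 32. Because each output of the element-wise version depends only on its own index, the grid-stride loop needs no synchronization between threads; a dot product on the GPU would need a reduction step on top.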