TensorRT (3) - Using the C++ API: MNIST Handwritten Digit Recognition

Published: 2024/9/27

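This section describes how to build a network model with the TensorRT C++ API.

1 Building a TensorRT model with the C++ API

Once again we learn from one of the official TensorRT samples.

It is again the MNIST handwritten digit recognition example. The previous section mainly used NvCaffeParser, provided by TensorRT, to convert a Caffe model into TensorRT's own model structure. NvCaffeParser is a ready-made, higher-level tool in TensorRT for parsing Caffe models; likewise, NvUffParser is the tool for parsing TensorFlow models.

Besides these two packaged tools, you can also build the model directly in TensorRT with its lower-level C++ API. In that case TensorRT acts as a standalone deep learning framework which, like other frameworks (Caffe, TensorFlow, MXNet, and so on), can assemble a network model, although it only performs the forward pass and has no backpropagation.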

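The differences are:

This framework cannot be used for training; the weights of the model must be supplied by hand.
It can apply a series of optimizations to speed up inference, either to a network you define yourself with the API or to a given model imported with NvCaffeParser or NvUffParser from another framework that trained it.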

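Deploying a network with the C++ API involves four main steps:

1. Create the network.
2. Add the input to the network.
3. Add the layers.
4. Mark the network output.

Steps 1, 2, and 4 are also present when using NvCaffeParser. Only step 3 is specific to the approach in this section; NvCaffeParser simply wraps step 3 for you.

For comparison, here is how NvCaffeParser is used. Only the key parts of the code are listed; see the previous section for the complete code.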

// build phase
INetworkDefinition* network = builder->createNetwork(); // 1. create the network
ICaffeParser* parser = createCaffeParser();
// 3. add the layers: NvCaffeParser wraps this step inside parse()
const IBlobNameToTensor* blobNameToTensor =
    parser->parse(locateFile(deployFile).c_str(),
                  locateFile(modelFile).c_str(),
                  *network,
                  DataType::kFLOAT);
for (auto& s : outputs)
    network->markOutput(*blobNameToTensor->find(s.c_str())); // 4. mark the network outputs
ICudaEngine* engine = builder->buildCudaEngine(*network); // build the engine
// ... some content omitted ...
// execution phase
IExecutionContext* context = engine->createExecutionContext(); // create the context
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME),
    outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME); // 2. the network input (and output), looked up by name
// ... some content omitted ...
context->enqueue(batchSize, buffers, stream, nullptr); // launch the CUDA kernels
cudaStreamSynchronize(stream); // synchronize the CUDA stream

The four steps are marked in the comments above. The core of NvCaffeParser is the parse function, which takes the network definition file (deploy.prototxt) and the weights file (net.caffemodel) as arguments; these are Caffe's model definition file and trained-parameter file. parse reads both files and generates the corresponding TensorRT model structure.

A workflow based on NvCaffeParser needs three files:

The network definition file (for example, Caffe's deploy.prototxt)
The trained weights file (for example, Caffe's net.caffemodel)
The label file (this maps the numeric class indices produced by the model to their real names)

The four steps are described one by one below.

1.1 Creating the network

First create a TensorRT network. At this point the network is only an empty shell, so this step is simple:

INetworkDefinition* network = builder->createNetwork();

1.2 Adding the input to the network

Every network must state explicitly which blob is its input, because that is the entry point for the data.

// Create input of shape { 1, 1, 28, 28 } with name referenced by INPUT_BLOB_NAME
auto data = network->addInput(INPUT_BLOB_NAME, dt, DimsCHW{ 1, INPUT_H, INPUT_W});

INPUT_BLOB_NAME is the name given to the input blob.

dt is the data type; the options are kFLOAT (32-bit float), kHALF (16-bit float), kINT8 (8-bit integer), and so on:

// Defined in NvInfer.h
enum class DataType : int
{
    kFLOAT = 0, //!< FP32 format.
    kHALF = 1,  //!< FP16 format.
    kINT8 = 2,  //!< INT8 format.
    kINT32 = 3  //!< INT32 format. Added in a newer TensorRT release.
};
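
The data type also determines how many bytes each element occupies once buffers are allocated. The following is a minimal sketch of that mapping. It uses a local copy of the enum (named DataTypeSketch here, a hypothetical name) so that it compiles without NvInfer.h, and elementSize is an illustrative helper, not part of the TensorRT API.

```cpp
#include <cassert>
#include <cstddef>

// Local mirror of the enum shown above so the sketch compiles without NvInfer.h.
// In real code the enum from NvInfer.h would be used instead.
enum class DataTypeSketch : int { kFLOAT = 0, kHALF = 1, kINT8 = 2, kINT32 = 3 };

// Hypothetical helper: bytes occupied by one element of the given type.
inline std::size_t elementSize(DataTypeSketch t)
{
    switch (t)
    {
    case DataTypeSketch::kFLOAT: return 4; // 32-bit float
    case DataTypeSketch::kHALF:  return 2; // 16-bit float
    case DataTypeSketch::kINT8:  return 1; // 8-bit integer
    case DataTypeSketch::kINT32: return 4; // 32-bit integer
    }
    assert(false && "unknown data type");
    return 0;
}
```

For example, elementSize(DataTypeSketch::kHALF) yields 2, so an FP16 tensor needs half the memory of the same tensor in FP32.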

DimsCHW{ 1, INPUT_H, INPUT_W} describes a blob with batch size 1 (implicit, not written out), 1 channel, and an input height and width of INPUT_H and INPUT_W.
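
To make the CHW description concrete, here is a small sketch of how a (channel, row, column) position maps to an index in a flat buffer. It assumes a densely packed, row-major CHW layout; ChwShape, volume, and offsetOf are hypothetical names for illustration and are not TensorRT types or functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a CHW dimension triple; not the TensorRT type.
struct ChwShape
{
    int c; // channels
    int h; // height
    int w; // width
};

// Number of elements in one sample of this shape.
inline std::size_t volume(const ChwShape& s)
{
    assert(s.c > 0 && s.h > 0 && s.w > 0);
    return static_cast<std::size_t>(s.c) * s.h * s.w;
}

// Linear offset of element (c, h, w), assuming a densely packed row-major CHW buffer.
inline std::size_t offsetOf(const ChwShape& s, int c, int h, int w)
{
    assert(c >= 0 && c < s.c && h >= 0 && h < s.h && w >= 0 && w < s.w);
    return (static_cast<std::size_t>(c) * s.h + h) * s.w + w;
}
```

With the 1 x 28 x 28 input mentioned in the code comment above, one sample holds 784 values, and the first pixel of the second row sits at offset 28.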

1.3 Adding the layers

The following example adds a scale layer:

// Create a scale layer with default power/shift and specified scale parameter.
float scale_param = 0.0125f;
Weights power{DataType::kFLOAT, nullptr, 0};
Weights shift{DataType::kFLOAT, nullptr, 0};
Weights scale{DataType::kFLOAT, &scale_param, 1};
auto scale_1 = network->addScale(*data, ScaleMode::kUNIFORM, shift, scale, power);

The key call is addScale; the arguments after the input tensor are the parameters this layer needs.

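The scale layer applies a shift, a scale, and then a power to every input value:

f(x) = (shift + scale * x) ^ power

Layer type: Power (the name of the equivalent layer in Caffe)

Optional parameters:

　　power: defaults to 1

　　scale: defaults to 1

　　shift: defaults to 0

In effect it is a kind of activation layer.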
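
The formula can be checked on the CPU with a few lines of plain C++. This is only a sketch of the arithmetic under the defaults listed above, not the TensorRT implementation; UniformScaleParams and applyUniformScale are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical parameter bundle; the defaults match the ones listed above.
struct UniformScaleParams
{
    float shift;
    float scale;
    float power;
    UniformScaleParams() : shift(0.0f), scale(1.0f), power(1.0f) {}
};

// Applies f(x) = (shift + scale * x) ^ power to every element.
inline std::vector<float> applyUniformScale(const std::vector<float>& x, const UniformScaleParams& p)
{
    std::vector<float> y;
    y.reserve(x.size());
    for (float v : x)
    {
        const float r = std::pow(p.shift + p.scale * v, p.power);
        // A negative base with a fractional power has no real result.
        assert(!std::isnan(r));
        y.push_back(r);
    }
    return y;
}
```

With scale set to 0.0125 and the other two parameters left at their defaults, as in the sample, an input value of 80 maps to roughly 1.0 and 160 to roughly 2.0.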

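The Weights class is defined as follows:

// NvInfer.h
class Weights
{
public:
    DataType type;      //!< the type of the weights
    const void* values; //!< the weight values, in a contiguous array
    int64_t count;      //!< the number of weights in the array
};

The layer above has no trained parameters; the same holds for ReLU layers, pooling layers, and so on.

Layers that do have trained parameters, such as convolution and fully connected layers, need the weights file to be loaded first.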

The following example adds a convolution layer:

// Add convolution layer with 20 outputs and a 5x5 filter.
// Load the weights file; this only needs to be done once
std::map<std::string, Weights> weightMap = loadWeights(locateFile("mnistapi.wts"));
// Add the convolution layer
IConvolutionLayer* conv1 = network->addConvolution(*scale_1->getOutput(0), 20, DimsHW{5, 5}, weightMap["conv1filter"], weightMap["conv1bias"]);
// Set the stride
conv1->setStride(DimsHW{1, 1});

The addConvolution call is what adds the convolution layer:

IConvolutionLayer* conv1 = network->addConvolution(*scale_1->getOutput(0), 20, DimsHW{5, 5}, weightMap["conv1filter"], weightMap["conv1bias"]);

Its arguments are:

*scale_1->getOutput(0): the output of the previous layer, the scale layer
20: the number of convolution kernels, i.e. the number of output feature maps
DimsHW{5, 5}: the kernel size
weightMap["conv1filter"], weightMap["conv1bias"]: the weight and bias coefficients

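The mnistapi.wts file above stores the weight coefficients of each layer in the network. It is located in the /usr/src/tensorrt/data folder.

You can open it in a text editor such as Notepad to take a look.

Each line holds the parameters of one layer. For example, conv1bias is the bias of the first convolution layer; the 0 that follows is the kFLOAT type, i.e. 32-bit float; the 20 after that is the number of coefficients, since there are 20 outputs and therefore 20 biases. The next line holds the convolution kernel coefficients: 20 kernels of size 5×5 give 20×5×5=500 parameters. The other layers follow the same pattern.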
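
A line-oriented file like this is straightforward to parse. The sketch below is not the sample's loadWeights function. It assumes a layout that the text above only partly confirms: a first line with the number of entries, then one line per entry of the form "name type count" followed by count values, each written as the hexadecimal bit pattern of a 32-bit word. WtsEntry, parseWts, parseWtsText, and wordToFloat are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one entry: the type code and the raw 32-bit words.
struct WtsEntry
{
    int type;
    std::vector<std::uint32_t> words;
};

// Parses the assumed layout: an entry count, then "<name> <type> <count> <hex word> ..." per entry.
inline std::map<std::string, WtsEntry> parseWts(std::istream& in)
{
    std::map<std::string, WtsEntry> result;
    int entries = 0;
    in >> entries;
    assert(!in.fail() && entries > 0);
    for (int i = 0; i < entries; ++i)
    {
        std::string name;
        WtsEntry entry;
        std::size_t count = 0;
        in >> name >> std::dec >> entry.type >> count; // header fields are decimal
        assert(!in.fail());
        entry.words.resize(count);
        for (std::size_t k = 0; k < count; ++k)
        {
            in >> std::hex >> entry.words[k]; // values are hexadecimal bit patterns
        }
        assert(!in.fail());
        in >> std::dec; // restore decimal parsing for the next entry
        result[name] = entry;
    }
    return result;
}

// Convenience wrapper for parsing text that is already in memory.
inline std::map<std::string, WtsEntry> parseWtsText(const std::string& text)
{
    std::istringstream in(text);
    return parseWts(in);
}

// Reinterprets one stored word as a float (meaningful when type is 0, i.e. kFLOAT).
inline float wordToFloat(std::uint32_t word)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    float value;
    std::memcpy(&value, &word, sizeof value);
    return value;
}
```

Under these assumptions, the text "1" followed by the line "conv1bias 0 2 3f800000 3f000000" parses into one entry whose two words decode to 1.0 and 0.5.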

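This file ships with the sample. It looks as if the weights were extracted from a caffemodel after training with a tool such as Caffe. Reading the caffemodel directly should also be possible with a small change to the interface: parse the caffemodel file and store the layer-name/weight pairs in a map. A quick search turns up examples of this (see the caffemodel parsing links in the references).

In a parsed caffemodel, conv1 has a blobs structure at the bottom, which holds the weight coefficients; every layer with parameters (convolution, fully connected, and so on; activation and pooling layers have none) has such a blobs structure. You only need to extract these parameters and save them into a map.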
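
Once the parameters are in a map, writing them out as text takes only a few lines. The sketch below assumes the same text layout as above, which is an assumption rather than something the sample documents here: an entry count, then "name 0 count" followed by the hexadecimal bit pattern of each float. toWtsText is a hypothetical helper, and the code that extracts blobs from a caffemodel is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Serializes a name -> float-array map into the assumed text layout:
// first line is the entry count, then "<name> 0 <count> <hex word> ..." per entry.
inline std::string toWtsText(const std::map<std::string, std::vector<float>>& blobs)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    assert(!blobs.empty());
    std::ostringstream out;
    out << blobs.size() << "\n";
    for (const auto& entry : blobs)
    {
        // Type code 0 stands for kFLOAT.
        out << entry.first << " 0 " << entry.second.size();
        for (float value : entry.second)
        {
            std::uint32_t word;
            std::memcpy(&word, &value, sizeof word); // copy the bit pattern, no numeric conversion
            out << " " << std::hex << word;
        }
        out << std::dec << "\n"; // counts on the next line are decimal again
    }
    return out.str();
}
```

Storing bit patterns instead of decimal text keeps every float exact across the round trip, which is presumably why a format like this would be chosen.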

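Many other layers can be added as well, such as deconvolution, pooling, and fully connected layers; see the official NVIDIA API documentation for details.

Adding layers by hand corresponds to what the parse function of NvCaffeParser does when it parses the deploy.prototxt file.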

1.4 Marking the network output

The network must know which blob is its output.

The code below adds a softmax layer at the end of the network, names its output tensor OUTPUT_BLOB_NAME, and then marks that tensor as the network output.

// Add a softmax layer to determine the probability.
auto prob = network->addSoftMax(*ip2->getOutput(0));
prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);
network->markOutput(*prob->getOutput(0));
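
For readers who want to see what the softmax layer computes, here is a plain CPU sketch of the function. It illustrates the math only and says nothing about how TensorRT implements the layer; softmax here is a free function written for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts raw scores into probabilities that sum to 1.
inline std::vector<float> softmax(const std::vector<float>& scores)
{
    assert(!scores.empty());
    // Subtracting the maximum keeps exp() from overflowing and does not change the result.
    const float maxScore = *std::max_element(scores.begin(), scores.end());
    std::vector<float> probs(scores.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < scores.size(); ++i)
    {
        probs[i] = std::exp(scores[i] - maxScore);
        sum += probs[i];
    }
    for (float& p : probs)
    {
        p /= sum;
    }
    return probs;
}
```

The ordering of the scores is preserved, so the class with the largest score before softmax is also the most probable class after it.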

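So what is the benefit of using the low-level API directly? See the table below.

Feature	C++	Python	NvCaffeParser	NvUffParser
CNNs	yes	yes	yes	yes
RNNs	yes	yes	no	no
INT8 Calibration	yes	yes	NA	NA
Asymmetric Padding	yes	yes	no	no

The table lists which TensorRT features each API supports. NvCaffeParser and NvUffParser do not support RNNs or asymmetric padding, and INT8 calibration (converting 32-bit float to INT8) is marked as not applicable for them; only the C++ API and the Python API cover all of these.

So for very complex network structures it is better to use the low-level C++ API or Python API directly. The low-level C++ API can also be used to build models that come from frameworks such as darknet, because all it needs is a map from layer names to weight parameters.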

2 The official sample

The sample is located at /usr/src/tensorrt/samples/sampleMNISTAPI

2.1 build phase

// Snippet from the main function
// create a model using the API directly and serialize it to a stream
IHostMemory* trtModelStream{nullptr};
// Call APIToModel to build the network model by hand
APIToModel(1, &trtModelStream);

The APIToModel function:

void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream)
{
    // Create builder
    IBuilder* builder = createInferBuilder(gLogger);

    // createMNISTEngine below is where the network is actually built by hand
    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = createMNISTEngine(maxBatchSize, builder, DataType::kFLOAT);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
}

The createMNISTEngine function is as follows:

// Create the engine using only the API and not any parser.
ICudaEngine* createMNISTEngine(unsigned int maxBatchSize, IBuilder* builder, DataType dt)
{
    INetworkDefinition* network = builder->createNetwork();

    // Create input tensor of shape { 1, 1, 28, 28 } with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{1, INPUT_H, INPUT_W});
    assert(data);

    // Create scale layer with default power/shift and specified scale parameter.
    const float scaleParam = 0.0125f;
    const Weights power{DataType::kFLOAT, nullptr, 0};
    const Weights shift{DataType::kFLOAT, nullptr, 0};
    const Weights scale{DataType::kFLOAT, &scaleParam, 1};
    IScaleLayer* scale_1 = network->addScale(*data, ScaleMode::kUNIFORM, shift, scale, power);
    assert(scale_1);

    // Add convolution layer with 20 outputs and a 5x5 filter.
    // Load the weights file; this only needs to be done once
    std::map<std::string, Weights> weightMap = loadWeights(locateFile("mnistapi.wts"));
    // Add the convolution layer
    IConvolutionLayer* conv1 = network->addConvolution(*scale_1->getOutput(0), 20, DimsHW{5, 5}, weightMap["conv1filter"], weightMap["conv1bias"]);
    assert(conv1);
    // Set the stride
    conv1->setStride(DimsHW{1, 1});

    // Add max pooling layer with stride of 2x2 and kernel size of 2x2.
    IPoolingLayer* pool1 = network->addPooling(*conv1->getOutput(0), PoolingType::kMAX, DimsHW{2, 2});
    assert(pool1);
    pool1->setStride(DimsHW{2, 2});

    // Add second convolution layer with 50 outputs and a 5x5 filter.
    IConvolutionLayer* conv2 = network->addConvolution(*pool1->getOutput(0), 50, DimsHW{5, 5}, weightMap["conv2filter"], weightMap["conv2bias"]);
    assert(conv2);
    conv2->setStride(DimsHW{1, 1});

    // Add second max pooling layer with stride of 2x2 and kernel size of 2x2.
    IPoolingLayer* pool2 = network->addPooling(*conv2->getOutput(0), PoolingType::kMAX, DimsHW{2, 2});
    assert(pool2);
    pool2->setStride(DimsHW{2, 2});

    // Add fully connected layer with 500 outputs.
    IFullyConnectedLayer* ip1 = network->addFullyConnected(*pool2->getOutput(0), 500, weightMap["ip1filter"], weightMap["ip1bias"]);
    assert(ip1);

    // Add activation layer using the ReLU algorithm.
    IActivationLayer* relu1 = network->addActivation(*ip1->getOutput(0), ActivationType::kRELU);
    assert(relu1);

    // Add second fully connected layer with OUTPUT_SIZE outputs.
    IFullyConnectedLayer* ip2 = network->addFullyConnected(*relu1->getOutput(0), OUTPUT_SIZE, weightMap["ip2filter"], weightMap["ip2bias"]);
    assert(ip2);

    // Add softmax layer to determine the probability.
    ISoftMaxLayer* prob = network->addSoftMax(*ip2->getOutput(0));
    assert(prob);
    prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*prob->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(1 << 20);
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*) (mem.second.values));
    }

    return engine;
}

As you can see, the function contains many add* calls, each of which adds one kind of layer. See the official NVIDIA API documentation for reference.
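
When wiring layers by hand it helps to check the tensor shapes and weight counts against the weights file. The sketch below does that bookkeeping in plain C++, assuming no padding and the kernel sizes and strides used in the function above; Shape, windowOutput, convolve, pool, convWeightCount, and fullyConnectedWeightCount are hypothetical helpers, not TensorRT calls.

```cpp
#include <cassert>

// Hypothetical CHW shape used only for this bookkeeping sketch.
struct Shape
{
    int c;
    int h;
    int w;
};

// Output extent along one axis for a sliding window without padding.
inline int windowOutput(int in, int kernel, int stride)
{
    assert(kernel > 0 && stride > 0 && in >= kernel);
    return (in - kernel) / stride + 1;
}

// Shape after a square convolution with the given number of output channels.
inline Shape convolve(const Shape& in, int outChannels, int kernel, int stride)
{
    return Shape{outChannels, windowOutput(in.h, kernel, stride), windowOutput(in.w, kernel, stride)};
}

// Shape after square pooling; the channel count is unchanged.
inline Shape pool(const Shape& in, int kernel, int stride)
{
    return Shape{in.c, windowOutput(in.h, kernel, stride), windowOutput(in.w, kernel, stride)};
}

// Number of filter weights (biases excluded) in a square convolution.
inline long convWeightCount(const Shape& in, int outChannels, int kernel)
{
    return static_cast<long>(outChannels) * in.c * kernel * kernel;
}

// Number of weights (biases excluded) in a fully connected layer fed by this shape.
inline long fullyConnectedWeightCount(const Shape& in, int outputs)
{
    return static_cast<long>(outputs) * in.c * in.h * in.w;
}
```

Starting from a 1 x 28 x 28 input, the spatial size goes 28, 24, 12, 8, 4 through conv1, pool1, conv2, and pool2. The filter counts come out as 500 for conv1, 25000 for conv2, and 400000 for ip1, which are the sizes one would expect to find next to conv1filter, conv2filter, and ip1filter in the weights file.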

2.2 deploy phase

The deploy phase is essentially the same as before.

int main(int argc, char** argv)
{
    // ...

    // Deserialize engine we serialized earlier
    // Create the IRuntime object; gLogger is passed in for log output
    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream->data(), trtModelStream->size(), nullptr);
    assert(engine != nullptr);
    trtModelStream->destroy();

    // Create the execution context, used inside the inference function to launch the CUDA kernels
    IExecutionContext* context = engine->createExecutionContext();
    assert(context != nullptr);

    // 2. deploy phase: call the inference function to run inference
    // Run inference on input data
    float prob[OUTPUT_SIZE];
    doInference(*context, data, prob, 1);

    // ...
}
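
After doInference returns, prob holds one probability per class, and the predicted digit is the index of the largest entry. The part of main that reads the result is elided above, so the following is only a sketch of that last step; argmax is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the largest value; on ties the first such index wins.
inline std::size_t argmax(const float* values, std::size_t count)
{
    assert(values != nullptr && count > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i)
    {
        if (values[i] > values[best])
        {
            best = i;
        }
    }
    return best;
}
```

In the sample's terms this would be called as argmax(prob, OUTPUT_SIZE), and the returned index is the recognized digit.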

The doInference function is as follows:

void doInference(IExecutionContext& context, float* input, float* output, int batchSize)
{
    const ICudaEngine& engine = context.getEngine();

    // Pointers to input and output device buffers to pass to engine.
    // Engine requires exactly IEngine::getNbBindings() number of buffers.
    assert(engine.getNbBindings() == 2);
    void* buffers[2];

    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);

    // Create GPU buffers on device
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_H * INPUT_W * sizeof(float)));
    CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));

    // Create stream
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);

    // Release stream and buffers
    cudaStreamDestroy(stream);
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex]));
}
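
The sizes passed to cudaMalloc and cudaMemcpyAsync above all follow one pattern: batch size times elements per sample times sizeof(float). The sketch below isolates that arithmetic so it can be checked without a GPU. BindingBytes and bindingBytes are hypothetical names, and the values 28, 28, and 10 used in the note afterwards are assumptions for INPUT_H, INPUT_W, and OUTPUT_SIZE (the 28 x 28 input appears in the sample's comments; 10 corresponds to the ten digit classes).

```cpp
#include <cassert>
#include <cstddef>

// Byte sizes of the two device buffers used by the inference function.
struct BindingBytes
{
    std::size_t input;
    std::size_t output;
};

// Computes the buffer sizes for a single-channel float input and a float output vector.
inline BindingBytes bindingBytes(int batchSize, int inputH, int inputW, int outputSize)
{
    assert(batchSize > 0 && inputH > 0 && inputW > 0 && outputSize > 0);
    const std::size_t batch = static_cast<std::size_t>(batchSize);
    BindingBytes bytes;
    bytes.input = batch * inputH * inputW * sizeof(float);
    bytes.output = batch * outputSize * sizeof(float);
    return bytes;
}
```

With a batch size of 1 and the assumed dimensions, the input buffer holds 784 floats and the output buffer 10 floats; both grow linearly with the batch size.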

References

Activation functions in Caffe: Caffe Learning Series (4): Activation Layers and Their Parameters - denny402 - cnblogs

caffemodel parsing: Reading a caffemodel file with Python - ChrisZZ - cnblogs

caffemodel parsing: http://www.cnblogs.com/zzq1989/p/4439429.html

TensorRT C++ API: TensorRT: TensorRT

TensorRT Python API: TensorRT - NVIDIA TensorRT Standard Python API Documentation 8.4.0 documentation

TensorRT developer guide: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

NVIDIA Deep Learning SDK: NVIDIA Documentation Center | NVIDIA Developer
