tensorRT-lenet C++ Code Analysis [Code Included]
An earlier post presented a simple TensorRT demo, LeNet inference [tensorRT-lenet]: it converted a torch model to wts (a format that also exposes the detailed layer information of the network), converted that to an engine, and ran inference. This post builds on it by analyzing the C++ implementation.
We start from the main function and analyze what each function, or each block, does. The complete code is attached at the end.
My C++ is fairly weak and the write-up is rough; please bear with me.
Contents
1. Passing in arguments
2. Creating the model with the API -- wts to engine
IHostMemory
APIToModel
IBuilder
IBuilderConfig
ICudaEngine
Engine serialization
3. Engine inference
Deserialization
Running inference
Launching forward inference
The doInference function
Complete code
To summarize, the main steps covered in this post are:
wts --> engine:
        1. Use IHostMemory to create a modelStream that the API will later write the engine into.
        2. Create the model with the API:
                1) Create an IBuilder;
                2) Use the builder from step 1) to create an IBuilderConfig;
                3) Use the custom createLenetEngine to build the network, write the weights from the wts file into it, and return the engine;
                4) Serialize the engine and write it into the modelStream from step 1.
        3. Write the modelStream from step 2 into the engine file.
engine --> inference:
        1. Get the size of the engine file;
        2. Read the engine contents into a heap buffer trtModelStream (in effect this gives us the serialized TRT model);
        3. Deserialize the model from step 2 with deserializeCudaEngine to obtain the engine;
        4. Create an executable context with createExecutionContext();
        5. Run inference.
1. Passing in arguments
The following lines check whether the command-line arguments are valid: -s means convert the wts file to an engine file, and -d means run forward inference with the engine.
int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./lenet -s   // serialize model to plan file" << std::endl;
        std::cerr << "./lenet -d   // deserialize plan file and run inference" << std::endl;
        return -1;
    }

2. Creating the model with the API -- wts to engine
The core C++ API lives in NvInfer.h, so that header must be included.
We create the model through the API and serialize it into a data stream.
    // create a model using the API directly and serialize it to a stream
    char *trtModelStream{nullptr};
    size_t size{0};

When the argument is "-s", the model is serialized.
IHostMemory
First, a word about the IHostMemory class.
It is a handle to host memory allocated by the TensorRT library (here, the serialized engine), and it is not meant to be subclassed by application code.
It exposes a few member functions: data (a pointer to the start of the data), size (the size of the data in bytes), type (the data type), and destroy (to release the memory).
    if (std::string(argv[1]) == "-s") {
        IHostMemory* modelStream{nullptr}; // binary model data stream
        APIToModel(1, &modelStream);
        assert(modelStream != nullptr);
        std::ofstream p("lenet5.engine", std::ios::binary);
        if (!p)
        {
            std::cerr << "could not open plan output file" << std::endl;
            return -1;
        }
        p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size());
        modelStream->destroy();
        return 1;
    }

APIToModel
APIToModel creates the model through the API. Its complete code is below. It takes two arguments: the first is the batch size, and the second is the IHostMemory object we created above, a binary model stream that starts out as a null pointer.
The function works with objects of several types: IBuilder, IBuilderConfig, and ICudaEngine. Each is introduced in turn below.
void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream)
{
    // Create builder
    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = createLenetEngine(maxBatchSize, builder, config, DataType::kFLOAT);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
}

IBuilder
Create the builder.
In the line below, gLogger is boilerplate code used to report what happens during execution (these are the messages printed later while the engine is being built).
gLogger is the previously defined static Logger gLogger (this requires the logging.h header).
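logging.h ships with the sample code, not with TensorRT itself. At its core it just subclasses nvinfer1::ILogger; below is a minimal sketch of the idea (the real header is more elaborate, and the severity filter shown here is my assumption):

#include "NvInfer.h"
#include <iostream>

class Logger : public nvinfer1::ILogger
{
    // Called by TensorRT for every diagnostic message; keep warnings and errors, drop info/verbose
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};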
IBuilder* builder = createInferBuilder(gLogger);

IBuilderConfig
The builder configuration holds the details used to create the engine. As the line below shows, config points to the IBuilderConfig object returned by the builder's member function createBuilderConfig().
IBuilderConfig* config = builder->createBuilderConfig();

ICudaEngine
This class is declared in the NvInferRuntime.h header. It is an engine for executing inference on a built network, with functionally unsafe features, and it is not meant to be subclassed.
In the code below, the createLenetEngine function builds the engine used for network inference.
    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = createLenetEngine(maxBatchSize, builder, config, DataType::kFLOAT);

The full code of createLenetEngine is below; this is the core of the program.
The main steps to create the engine are:
1. Define an empty network;
2. Create an input tensor;
3. Load the wts weights;
4. Build the convolution (and remaining) layers;
5. Set the output node's name and mark the network output.
// Create the engine using only the API and not any parser.
ICudaEngine* createLenetEngine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt)
{
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape { 1, 32, 32 } with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{1, INPUT_H, INPUT_W});
    assert(data);

    // Add convolution layer with 6 outputs and a 5x5 filter.
    std::map<std::string, Weights> weightMap = loadWeights("../lenet5.wts");
    IConvolutionLayer* conv1 = network->addConvolutionNd(*data, 6, DimsHW{5, 5}, weightMap["conv1.weight"], weightMap["conv1.bias"]);
    assert(conv1);
    conv1->setStrideNd(DimsHW{1, 1});

    // Add activation layer using the ReLU algorithm.
    IActivationLayer* relu1 = network->addActivation(*conv1->getOutput(0), ActivationType::kRELU);
    assert(relu1);

    // Add average pooling layer with stride of 2x2 and kernel size of 2x2.
    IPoolingLayer* pool1 = network->addPoolingNd(*relu1->getOutput(0), PoolingType::kAVERAGE, DimsHW{2, 2});
    assert(pool1);
    pool1->setStrideNd(DimsHW{2, 2});

    // Add second convolution layer with 16 outputs and a 5x5 filter.
    IConvolutionLayer* conv2 = network->addConvolutionNd(*pool1->getOutput(0), 16, DimsHW{5, 5}, weightMap["conv2.weight"], weightMap["conv2.bias"]);
    assert(conv2);
    conv2->setStrideNd(DimsHW{1, 1});

    // Add activation layer using the ReLU algorithm.
    IActivationLayer* relu2 = network->addActivation(*conv2->getOutput(0), ActivationType::kRELU);
    assert(relu2);

    // Add second average pooling layer with stride of 2x2 and kernel size of 2x2.
    IPoolingLayer* pool2 = network->addPoolingNd(*relu2->getOutput(0), PoolingType::kAVERAGE, DimsHW{2, 2});
    assert(pool2);
    pool2->setStrideNd(DimsHW{2, 2});

    // Add fully connected layer
    IFullyConnectedLayer* fc1 = network->addFullyConnected(*pool2->getOutput(0), 120, weightMap["fc1.weight"], weightMap["fc1.bias"]);
    assert(fc1);

    // Add activation layer using the ReLU algorithm.
    IActivationLayer* relu3 = network->addActivation(*fc1->getOutput(0), ActivationType::kRELU);
    assert(relu3);

    // Add second fully connected layer
    IFullyConnectedLayer* fc2 = network->addFullyConnected(*relu3->getOutput(0), 84, weightMap["fc2.weight"], weightMap["fc2.bias"]);
    assert(fc2);

    // Add activation layer using the ReLU algorithm.
    IActivationLayer* relu4 = network->addActivation(*fc2->getOutput(0), ActivationType::kRELU);
    assert(relu4);

    // Add third fully connected layer
    IFullyConnectedLayer* fc3 = network->addFullyConnected(*relu4->getOutput(0), OUTPUT_SIZE, weightMap["fc3.weight"], weightMap["fc3.bias"]);
    assert(fc3);

    // Add softmax layer to determine the probability.
    ISoftMaxLayer* prob = network->addSoftMax(*fc3->getOutput(0));
    assert(prob);
    prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*prob->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(1 << 20);
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*) (mem.second.values));
    }

    return engine;
}

The function returns an ICudaEngine. It takes four arguments: the batch size, the builder, the builder config, and the DataType. (Note: the original sample's comments said "max pooling", but the code uses PoolingType::kAVERAGE, matching classic LeNet; the comments above have been corrected.)
There are two ways to create a network: build it layer by layer with the TRT API, or use a parser to convert an existing model into a Network. Here we take the former approach.
1. Defining the network
INetworkDefinition is used to define the network. We call the builder's member function createNetworkV2; createNetworkV2(0U) first creates an empty Network (the 0U flag selects the implicit-batch mode this demo relies on).
INetworkDefinition* network = builder->createNetworkV2(0U);
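For reference, here is a hedged sketch of the explicit-batch variant that newer TensorRT code favors. This demo does not use it, since the setMaxBatchSize and enqueue calls below rely on implicit batch:

// Explicit-batch variant (for reference only; this demo keeps implicit batch)
const uint32_t explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
INetworkDefinition* network = builder->createNetworkV2(explicitBatch);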
2. Creating the input tensor
addInput creates a tensor. It takes three arguments: a const char*, a DataType, and a Dims. Here the call is addInput("data", dt, Dims3{1, 32, 32}).
    // Create input tensor of shape { 1, 32, 32 } with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{1, INPUT_H, INPUT_W});
    assert(data);

3. Loading the wts weights
    // Add convolution layer with 6 outputs and a 5x5 filter.
    std::map<std::string, Weights> weightMap = loadWeights("../lenet5.wts");

The full code of loadWeights is below; file is the path to the weight file:
std::map<std::string, Weights> loadWeights(const std::string file)
{
    std::cout << "Loading weights: " << file << std::endl;
    std::map<std::string, Weights> weightMap;

    // Open weights file
    std::ifstream input(file);
    assert(input.is_open() && "Unable to load weight file.");

    // Read number of weight blobs
    int32_t count;
    input >> count;
    assert(count > 0 && "Invalid weight map file.");

    while (count--)
    {
        Weights wt{DataType::kFLOAT, nullptr, 0};
        uint32_t size;

        // Read name and size of blob
        std::string name;
        input >> name >> std::dec >> size;
        wt.type = DataType::kFLOAT;

        // Load blob (note: the original used sizeof(val), i.e. the pointer size, which over-allocates)
        uint32_t* val = reinterpret_cast<uint32_t*>(malloc(sizeof(uint32_t) * size));
        for (uint32_t x = 0, y = size; x < y; ++x)
        {
            input >> std::hex >> val[x];
        }
        wt.values = val;
        wt.count = size;
        weightMap[name] = wt;
    }

    return weightMap;
}
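For orientation, the file this parses is plain text: the first line is the number of weight blobs, and each following line is a blob name, its element count in decimal, then that many hex-encoded 32-bit float words. A hypothetical excerpt (the blob names and counts match LeNet; the hex values are made-up placeholders):

10
conv1.weight 150 3d8a2f1c bd01c4e6 3e12bb09 ... (150 hex words in total)
conv1.bias 6 3c9d70a4 bb84c2f0 ... (6 hex words in total)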
4. Building the convolution layer
Call addConvolutionNd: the first argument is the input tensor, the second is the number of output channels, the third is the kernel size (5 x 5 here), the fourth is the convolution's weights, and the fifth is its bias. The stride is then set to 1 with setStrideNd, which sets the stride in both height and width.
IConvolutionLayer* conv1 = network->addConvolutionNd(*data, 6, DimsHW{5, 5}, weightMap["conv1.weight"], weightMap["conv1.bias"]);
assert(conv1);
conv1->setStrideNd(DimsHW{1, 1});
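LeNet needs neither padding nor dilation, but if your network did, IConvolutionLayer exposes setters in the same style as setStrideNd. A hedged sketch (the values here are illustrative, not part of this demo):

// Hypothetical extras, not needed for LeNet
conv1->setPaddingNd(DimsHW{0, 0});   // zero padding on height and width
conv1->setDilationNd(DimsHW{1, 1});  // no dilation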
4.1 Adding the activation
After defining the convolution we add an activation function; LeNet uses ReLU. addActivation() takes two arguments: the first is the input tensor, conv1->getOutput(0), which is the convolution layer's first (and only) output tensor; the second is the activation type, where kRELU means ReLU.
    // Add activation layer using the ReLU algorithm.
    IActivationLayer* relu1 = network->addActivation(*conv1->getOutput(0), ActivationType::kRELU);
    assert(relu1);

4.2 Adding the average pooling layer
Similarly, addPoolingNd adds a pooling layer: the first argument is the activation's output, used as this layer's input; the second is the pooling type; the third is the kernel size (2 x 2 here). The stride is likewise set to 2.
    // Add average pooling layer with stride of 2x2 and kernel size of 2x2.
    IPoolingLayer* pool1 = network->addPoolingNd(*relu1->getOutput(0), PoolingType::kAVERAGE, DimsHW{2, 2});
    assert(pool1);
    pool1->setStrideNd(DimsHW{2, 2});

4.3 Adding the fully connected layer
Call addFullyConnected to add a fully connected layer: the first argument is the input, the second is the number of output channels, and the third and fourth are the weight and bias.
    // Add fully connected layer
    IFullyConnectedLayer* fc1 = network->addFullyConnected(*pool2->getOutput(0), 120, weightMap["fc1.weight"], weightMap["fc1.bias"]);
    assert(fc1);

4.4 Adding softmax
Softmax is added with addSoftMax.
    // Add softmax layer to determine the probability.
    ISoftMaxLayer* prob = network->addSoftMax(*fc3->getOutput(0));
    assert(prob);

5. Setting the output node's name and marking the output
    prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*prob->getOutput(0));

The steps above define the network structure by hand (you can simply follow the forward method of the PyTorch LeNet). All we are doing is assigning each layer's weights from the wts file to the corresponding layer of the network written in C++.
Building the engine:
builder is the builder we constructed earlier; setMaxBatchSize passes in the batch size.
config is the configuration from earlier; setMaxWorkspaceSize caps the GPU scratch memory the engine may use at execution time.
Build the engine by calling buildEngineWithConfig with the network defined above and the configuration.
Once built, the network itself can be destroyed. The engine is the built network model; it is serialized in the next step.
    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(1 << 20);
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);

    // Don't need the network any more
    network->destroy();
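Before buildEngineWithConfig, the config can also request reduced precision. A hedged sketch of the common FP16 switch (not part of this demo; whether it helps depends on your GPU):

// Optional: build FP16 kernels when the hardware has fast FP16 support
if (builder->platformHasFastFp16())
{
    config->setFlag(BuilderFlag::kFP16);
}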
After the above, release the host memory:

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*) (mem.second.values));
    }
Engine serialization
    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
APIToModel above completes the engine build; main then writes the result out, leaving the engine file saved in the working directory. Next comes the inference stage.
3. Engine inference
Passing the -d argument selects inference mode.
Read the lenet5.engine file.
Get the size of the file contents.
Allocate a heap buffer trtModelStream of the same size and read the data stream into it with file.read.
    else if (std::string(argv[1]) == "-d") {
        std::ifstream file("lenet5.engine", std::ios::binary);
        if (file.good()) {
            file.seekg(0, file.end);
            size = file.tellg();
            file.seekg(0, file.beg);
            trtModelStream = new char[size];
            assert(trtModelStream);
            file.read(trtModelStream, size);
            file.close();
        }
    }
Create an all-ones image as a sample input (the original comment said "subtract mean", but the code simply fills the buffer with ones):

    // Create a dummy all-ones input image
    float data[INPUT_H * INPUT_W];
    for (int i = 0; i < INPUT_H * INPUT_W; i++)
        data[i] = 1.0;
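In a real application you would feed an actual image instead. A hedged sketch assuming OpenCV is available, replacing the all-ones block above (the file name and the simple divide-by-255 scaling are my assumptions; use whatever preprocessing the model was trained with):

#include <opencv2/opencv.hpp>

// Load a grayscale image, resize to the 32x32 network input, scale to [0, 1]
cv::Mat img = cv::imread("digit.png", cv::IMREAD_GRAYSCALE);
cv::resize(img, img, cv::Size(INPUT_W, INPUT_H));
float data[INPUT_H * INPUT_W];
for (int i = 0; i < INPUT_H * INPUT_W; i++)
    data[i] = img.data[i] / 255.0f;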
Create the runtime, passing in the logger to record messages:

    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);

Deserialization
deserializeCudaEngine() performs the deserialization. It takes three arguments: the serialized model we placed on the heap, the model's size, and an IPluginFactory (nullptr here, since no plugins are used).
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size, nullptr);

Running inference
This step creates an execution context from the deserialized engine:
    IExecutionContext* context = engine->createExecutionContext();

Launching forward inference
OUTPUT_SIZE is 10 here, because LeNet's final output covers 10 classes.
    // Run inference
    float prob[OUTPUT_SIZE];
    for (int i = 0; i < 1000; i++) {
        auto start = std::chrono::system_clock::now();
        doInference(*context, data, prob, 1);
        auto end = std::chrono::system_clock::now();
        //std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
    }

    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
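After the loop, prob holds the 10 softmax scores from the last run. A small sketch for turning them into a prediction (the original sample only prints the raw values):

#include <algorithm>

// The index of the largest probability is the predicted digit class
int pred = std::max_element(prob, prob + OUTPUT_SIZE) - prob;
std::cout << "predicted class: " << pred << std::endl;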
The doInference function
The key function is doInference.
It takes four arguments: the execution context (created from the deserialized model), the input, the output, and the batch size.
Pointers to the input and output buffers are passed to the engine.
To bind the buffers we need the names of the input and output tensors, which we defined earlier: "data" for the input and "prob" for the output.
const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);
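The buffer sizes below are hard-coded from INPUT_H, INPUT_W, and OUTPUT_SIZE. For a network whose shapes you don't know by heart, you can ask the engine instead; here is a sketch using the binding API this post already relies on:

// Element count of a binding, queried from the engine rather than hard-coded
size_t bindingVolume(const nvinfer1::ICudaEngine& engine, int bindingIndex)
{
    nvinfer1::Dims dims = engine.getBindingDimensions(bindingIndex);
    size_t volume = 1;
    for (int i = 0; i < dims.nbDims; i++)
        volume *= dims.d[i];
    return volume;
}
// e.g. CHECK(cudaMalloc(&buffers[inputIndex], batchSize * bindingVolume(engine, inputIndex) * sizeof(float)));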
Creating the buffers
Create two buffers whose sizes match the input and output.
    // Create GPU buffers on device
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_H * INPUT_W * sizeof(float)));
    CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));

Creating the stream
    // Create stream
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

Inference
DMA the batch input to the device, run inference on the batch asynchronously, and DMA the output back to the host.
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);
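For comparison, the same round trip can be written synchronously without a stream. A sketch of the blocking variant using IExecutionContext::execute (implicit-batch API assumed, as in the rest of this post):

// Synchronous variant: blocking copies plus execute()
CHECK(cudaMemcpy(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice));
context.execute(batchSize, buffers);
CHECK(cudaMemcpy(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost));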
Releasing after inference

    // Release stream and buffers
    cudaStreamDestroy(stream);
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex]));
The complete doInference function:
void doInference(IExecutionContext& context, float* input, float* output, int batchSize)
{
    const ICudaEngine& engine = context.getEngine();

    // Pointers to input and output device buffers to pass to engine.
    // Engine requires exactly IEngine::getNbBindings() number of buffers.
    assert(engine.getNbBindings() == 2);
    void* buffers[2];

    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);

    // Create GPU buffers on device
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_H * INPUT_W * sizeof(float)));
    CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));

    // Create stream
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);

    // Release stream and buffers
    cudaStreamDestroy(stream);
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex]));
}
Complete code
#include "NvInfer.h" #include "cuda_runtime_api.h" #include "logging.h" #include <fstream> #include <map> #include <chrono>#define CHECK(status) \do\{\auto ret = (status);\if (ret != 0)\{\std::cerr << "Cuda failure: " << ret << std::endl;\abort();\}\} while (0)// stuff we know about the network and the input/output blobs static const int INPUT_H = 32; static const int INPUT_W = 32; static const int OUTPUT_SIZE = 10;const char* INPUT_BLOB_NAME = "data"; const char* OUTPUT_BLOB_NAME = "prob";using namespace nvinfer1;static Logger gLogger;// Load weights from files shared with TensorRT samples. // TensorRT weight files have a simple space delimited format: // [type] [size] <data x size in hex> std::map<std::string, Weights> loadWeights(const std::string file) {std::cout << "Loading weights: " << file << std::endl;std::map<std::string, Weights> weightMap;// Open weights filestd::ifstream input(file);assert(input.is_open() && "Unable to load weight file.");// Read number of weight blobsint32_t count;input >> count;assert(count > 0 && "Invalid weight map file.");while (count--){Weights wt{DataType::kFLOAT, nullptr, 0};uint32_t size;// Read name and type of blobstd::string name;input >> name >> std::dec >> size;wt.type = DataType::kFLOAT;// Load blobuint32_t* val = reinterpret_cast<uint32_t*>(malloc(sizeof(val) * size));for (uint32_t x = 0, y = size; x < y; ++x){input >> std::hex >> val[x];}wt.values = val;wt.count = size;weightMap[name] = wt;}return weightMap; }// Creat the engine using only the API and not any parser. ICudaEngine* createLenetEngine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt) {INetworkDefinition* network = builder->createNetworkV2(0U);// Create input tensor of shape { 1, 32, 32 } with name INPUT_BLOB_NAMEITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{1, INPUT_H, INPUT_W});assert(data);// Add convolution layer with 6 outputs and a 5x5 filter.std::map<std::string, Weights> weightMap = loadWeights("../lenet5.wts");IConvolutionLayer* conv1 = network->addConvolutionNd(*data, 6, DimsHW{5, 5}, weightMap["conv1.weight"], weightMap["conv1.bias"]);assert(conv1);conv1->setStrideNd(DimsHW{1, 1});// Add activation layer using the ReLU algorithm.IActivationLayer* relu1 = network->addActivation(*conv1->getOutput(0), ActivationType::kRELU);assert(relu1);// Add max pooling layer with stride of 2x2 and kernel size of 2x2.IPoolingLayer* pool1 = network->addPoolingNd(*relu1->getOutput(0), PoolingType::kAVERAGE, DimsHW{2, 2});assert(pool1);pool1->setStrideNd(DimsHW{2, 2});// Add second convolution layer with 16 outputs and a 5x5 filter.IConvolutionLayer* conv2 = network->addConvolutionNd(*pool1->getOutput(0), 16, DimsHW{5, 5}, weightMap["conv2.weight"], weightMap["conv2.bias"]);assert(conv2);conv2->setStrideNd(DimsHW{1, 1});// Add activation layer using the ReLU algorithm.IActivationLayer* relu2 = network->addActivation(*conv2->getOutput(0), ActivationType::kRELU);assert(relu2);// Add second max pooling layer with stride of 2x2 and kernel size of 2x2>IPoolingLayer* pool2 = network->addPoolingNd(*relu2->getOutput(0), PoolingType::kAVERAGE, DimsHW{2, 2});assert(pool2);pool2->setStrideNd(DimsHW{2, 2});// Add fully connected layerIFullyConnectedLayer* fc1 = network->addFullyConnected(*pool2->getOutput(0), 120, weightMap["fc1.weight"], weightMap["fc1.bias"]);assert(fc1);// Add activation layer using the ReLU algorithm.IActivationLayer* relu3 = network->addActivation(*fc1->getOutput(0), ActivationType::kRELU);assert(relu3);// Add second fully connected 
layerIFullyConnectedLayer* fc2 = network->addFullyConnected(*relu3->getOutput(0), 84, weightMap["fc2.weight"], weightMap["fc2.bias"]);assert(fc2);// Add activation layer using the ReLU algorithm.IActivationLayer* relu4 = network->addActivation(*fc2->getOutput(0), ActivationType::kRELU);assert(relu4);// Add third fully connected layerIFullyConnectedLayer* fc3 = network->addFullyConnected(*relu4->getOutput(0), OUTPUT_SIZE, weightMap["fc3.weight"], weightMap["fc3.bias"]);assert(fc3);// Add softmax layer to determine the probability.ISoftMaxLayer* prob = network->addSoftMax(*fc3->getOutput(0));assert(prob);prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);network->markOutput(*prob->getOutput(0));// Build enginebuilder->setMaxBatchSize(maxBatchSize);config->setMaxWorkspaceSize(1 << 20);ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);// Don't need the network any morenetwork->destroy();// Release host memoryfor (auto& mem : weightMap){free((void*) (mem.second.values));}return engine; }void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream) {// Create builderIBuilder* builder = createInferBuilder(gLogger);IBuilderConfig* config = builder->createBuilderConfig();// Create model to populate the network, then set the outputs and create an engineICudaEngine* engine = createLenetEngine(maxBatchSize, builder, config, DataType::kFLOAT);assert(engine != nullptr);// Serialize the engine(*modelStream) = engine->serialize();// Close everything downengine->destroy();builder->destroy(); }void doInference(IExecutionContext& context, float* input, float* output, int batchSize) {const ICudaEngine& engine = context.getEngine();// Pointers to input and output device buffers to pass to engine.// Engine requires exactly IEngine::getNbBindings() number of buffers.assert(engine.getNbBindings() == 2);void* buffers[2];// In order to bind the buffers, we need to know the names of the input and output tensors.// Note that indices are guaranteed to be less than IEngine::getNbBindings()const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);// Create GPU buffers on deviceCHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_H * INPUT_W * sizeof(float)));CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));// Create streamcudaStream_t stream;CHECK(cudaStreamCreate(&stream));// DMA input batch data to device, infer on the batch asynchronously, and DMA output back to hostCHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));context.enqueue(batchSize, buffers, stream, nullptr);CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));cudaStreamSynchronize(stream);// Release stream and bufferscudaStreamDestroy(stream);CHECK(cudaFree(buffers[inputIndex]));CHECK(cudaFree(buffers[outputIndex])); }int main(int argc, char** argv) {if (argc != 2) {std::cerr << "arguments not right!" 
<< std::endl;std::cerr << "./lenet -s // serialize model to plan file" << std::endl;std::cerr << "./lenet -d // deserialize plan file and run inference" << std::endl;return -1;}// create a model using the API directly and serialize it to a streamchar *trtModelStream{nullptr};size_t size{0};if (std::string(argv[1]) == "-s") {IHostMemory* modelStream{nullptr}; //模型二進制數據流APIToModel(1, &modelStream);assert(modelStream != nullptr);std::ofstream p("lenet5.engine", std::ios::binary);if (!p){std::cerr << "could not open plan output file" << std::endl;return -1;}p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size());modelStream->destroy();return 1;} else if (std::string(argv[1]) == "-d") {std::ifstream file("lenet5.engine", std::ios::binary);if (file.good()) {file.seekg(0, file.end);size = file.tellg();file.seekg(0, file.beg);trtModelStream = new char[size];assert(trtModelStream);file.read(trtModelStream, size);file.close();}} else {return -1;}// Subtract mean from imagefloat data[INPUT_H * INPUT_W];for (int i = 0; i < INPUT_H * INPUT_W; i++)data[i] = 1.0;IRuntime* runtime = createInferRuntime(gLogger);assert(runtime != nullptr);ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size, nullptr);assert(engine != nullptr);IExecutionContext* context = engine->createExecutionContext();assert(context != nullptr);// Run inferencefloat prob[OUTPUT_SIZE];for (int i = 0; i < 1000; i++) {auto start = std::chrono::system_clock::now();doInference(*context, data, prob, 1);auto end = std::chrono::system_clock::now();std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;}// Destroy the enginecontext->destroy();engine->destroy();runtime->destroy();// Print histogram of the output distributionstd::cout << "\nOutput:\n\n";for (unsigned int i = 0; i < 10; i++){std::cout << prob[i] << ", ";}std::cout << std::endl;return 0; }總結