日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 编程资源 > 编程问答 >内容正文

编程问答

CUDA ---- device管理

發布時間:2025/3/20 编程问答 30 豆豆
生活随笔 收集整理的這篇文章主要介紹了 CUDA ---- device管理 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

device管理

NVIDIA提供了集中凡是來查詢和管理GPU device,掌握GPU信息查詢很重要,因為這可以幫助你設置kernel的執行配置。

本博文將主要介紹下面兩方面內容:

  • CUDA runtime API function
  • NVIDIA系統管理命令行

使用runtime API來查詢GPU信息

你可以使用下面的function來查詢所有關于GPU device 的信息:

cudaError_t cudaGetDeviceProperties(cudaDeviceProp *prop, int device);

GPU的信息放在cudaDeviceProp這個結構體中。

代碼

#include <cuda_runtime.h>
#include?<stdio.h>
int?main(int?argc,?char?**argv) {
  printf(
"%s Starting...\n", argv[0]);int deviceCount = 0;cudaError_t error_id = cudaGetDeviceCount(&deviceCount);if (error_id != cudaSuccess) {printf("cudaGetDeviceCount returned %d\n-> %s\n",(int)error_id, cudaGetErrorString(error_id));printf("Result = FAIL\n");exit(EXIT_FAILURE);}if (deviceCount == 0) {printf("There are no available device(s) that support CUDA\n");} else {printf("Detected %d CUDA Capable device(s)\n", deviceCount);}
int dev, driverVersion = 0, runtimeVersion = 0;dev =0;cudaSetDevice(dev);cudaDeviceProp deviceProp;cudaGetDeviceProperties(&deviceProp, dev);printf("Device %d: \"%s\"\n", dev, deviceProp.name);cudaDriverGetVersion(&driverVersion);cudaRuntimeGetVersion(&runtimeVersion);printf(" CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",driverVersion/1000, (driverVersion%100)/10,runtimeVersion/1000, (runtimeVersion%100)/10);printf(" CUDA Capability Major/Minor version number: %d.%d\n",deviceProp.major, deviceProp.minor);printf(" Total amount of global memory: %.2f MBytes (%llu bytes)\n",(float)deviceProp.totalGlobalMem/(pow(1024.0,3)),(unsigned long long) deviceProp.totalGlobalMem);printf(" GPU Clock rate: %.0f MHz (%0.2f GHz)\n",deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);printf(" Memory Clock rate: %.0f Mhz\n",deviceProp.memoryClockRate * 1e-3f);printf(" Memory Bus Width: %d-bit\n",deviceProp.memoryBusWidth);if (deviceProp.l2CacheSize) {printf(" L2 Cache Size: %d bytes\n",deviceProp.l2CacheSize);}
printf(
" Max Texture Dimension Size (x,y,z) 1D=(%d), 2D=(%d,%d), 3D=(%d,%d,%d)\n",deviceProp.maxTexture1D , deviceProp.maxTexture2D[0],deviceProp.maxTexture2D[1],deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1],deviceProp.maxTexture3D[2]);
printf(
" Max Layered Texture Size (dim) x layers 1D=(%d) x %d, 2D=(%d,%d) x %d\n",deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1],deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1],deviceProp.maxTexture2DLayered[2]);
printf(
" Total amount of constant memory: %lu bytes\n",deviceProp.totalConstMem);printf(" Total amount of shared memory per block: %lu bytes\n",deviceProp.sharedMemPerBlock);printf(" Total number of registers available per block: %d\n",deviceProp.regsPerBlock);printf(" Warp size: %d\n", deviceProp.warpSize);printf(" Maximum number of threads per multiprocessor: %d\n",deviceProp.maxThreadsPerMultiProcessor);printf(" Maximum number of threads per block: %d\n",deviceProp.maxThreadsPerBlock);
printf(
" Maximum sizes of each dimension of a block: %d x %d x %d\n",deviceProp.maxThreadsDim[0],deviceProp.maxThreadsDim[1],deviceProp.maxThreadsDim[2]);
printf(
" Maximum sizes of each dimension of a grid: %d x %d x %d\n",deviceProp.maxGridSize[0],deviceProp.maxGridSize[1],deviceProp.maxGridSize[2]);
printf(
" Maximum memory pitch: %lu bytes\n", deviceProp.memPitch);
exit(EXIT_SUCCESS); }

?

編譯運行:

$ nvcc checkDeviceInfor.cu -o checkDeviceInfor $ ./checkDeviceInfor

輸出:

./checkDeviceInfor Starting... Detected 2 CUDA Capable device(s) Device 0: "Tesla M2070" CUDA Driver Version / Runtime Version 5.5 / 5.5 CUDA Capability Major/Minor version number: 2.0 Total amount of global memory: 5.25 MBytes (5636554752 bytes) GPU Clock rate: 1147 MHz (1.15 GHz) Memory Clock rate: 1566 Mhz Memory Bus Width: 384-bit L2 Cache Size: 786432 bytes Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048) Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048 Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 32768 Warp size: 32 Maximum number of threads per multiprocessor: 1536 Maximum number of threads per block: 1024 Maximum sizes of each dimension of a block: 1024 x 1024 x 64 Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535 Maximum memory pitch: 2147483647 bytes

決定最佳GPU

對于支持多GPU的系統,是需要從中選擇一個來作為我們的device的,抉擇出最佳計算性能GPU的一種方法就是由其擁有的處理器數量決定,可以用下面的代碼來選擇最佳GPU。

int numDevices = 0; cudaGetDeviceCount(&numDevices); if (numDevices > 1) {int maxMultiprocessors = 0, maxDevice = 0;for (int device=0; device<numDevices; device++) {cudaDeviceProp props;cudaGetDeviceProperties(&props, device);if (maxMultiprocessors < props.multiProcessorCount) {maxMultiprocessors = props.multiProcessorCount;maxDevice = device;}}cudaSetDevice(maxDevice); }

使用nvidia-smi來查詢GPU信息

nvidia-smi是一個命令行工具,可以幫助你管理操作GPU device,并且允許你查詢和更改device狀態。

nvidia-smi用處很多,比如,下面的指令:

$ nvidia-smi -L GPU 0: Tesla M2070 (UUID: GPU-68df8aec-e85c-9934-2b81-0c9e689a43a7) GPU 1: Tesla M2070 (UUID: GPU-382f23c1-5160-01e2-3291-ff9628930b70)

然后可以使用下面的命令來查詢GPU 0 的詳細信息:

$nvidia-smi –q –i 0

下面是該命令的一些參數,可以精簡nvidia-smi的顯示信息:

MEMORY

UTILIZATION

ECC

TEMPERATURE

POWER

CLOCK

COMPUTE

PIDS

PERFORMANCE

SUPPORTED_CLOCKS

PAGE_RETIREMENT

ACCOUNTING

比如,顯示只device memory的信息:

$nvidia-smi –q –i 0 –d MEMORY | tail –n 5 Memory Usage Total : 5375 MB Used : 9 MB Free : 5366 MB

設置device

對于多GPU系統,使用nvidia-smi可以查看各GPU屬性,每個GPU從0開始依次標注,使用環境變量CUDA_VISIBLE_DEVICES可以指定GPU而不用修改application。

可以設置環境變量CUDA_VISIBLE_DEVICES-2來屏蔽其他GPU,這樣只有GPU2能被使用。當然也可以使用CUDA_VISIBLE_DEVICES-2,3來設置多個GPU,他們的device ID分別為0和1.

?

代碼下載:CodeSamples.zip

轉載于:https://www.cnblogs.com/1024incn/p/4539697.html

與50位技術專家面對面20年技術見證,附贈技術全景圖

總結

以上是生活随笔為你收集整理的CUDA ---- device管理的全部內容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。