OpenCL Programming: A Detailed Walkthrough with Examples

Published: 2023/11/28

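Comparing a C and an OpenCL Programming Example
Reference link:
https://www.zhihu.com/people/wujianming_110117/posts
We start with an image-rotation example to introduce the steps of OpenCL programming. We first describe how the rotation is computed, then give a C loop implementation and an OpenCL C kernel implementation.
Image Rotation Principle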
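Image rotation means rotating an image about some point by a given angle, clockwise or counter-clockwise; usually the image is rotated counter-clockwise about its center. Suppose the top-left corner of the image is (l, t) and the bottom-right corner is (r, b), so that the center is (xcenter, ycenter) = ((l + r)/2, (t + b)/2). After rotating any point (x, y) counter-clockwise by an angle θ about the center, its new position (x', y') is:
x' = (x - xcenter) cos θ - (y - ycenter) sin θ + xcenter,
y' = (x - xcenter) sin θ + (y - ycenter) cos θ + ycenter.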
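As a quick check of the formula, here is a minimal sketch in plain C. The function name `rotate_point` and its parameter list are assumptions made for illustration; like the article's code, it takes the sine and cosine of θ as inputs rather than the angle itself.

```c
#include <assert.h>
#include <stddef.h>

/* Rotate (x, y) about (xc, yc) using the formula from the text.
 * sinTheta and cosTheta are the sine and cosine of the rotation angle.
 * The rotated coordinates are written to *xr and *yr. */
void rotate_point(double x, double y, double xc, double yc,
                  double sinTheta, double cosTheta,
                  double *xr, double *yr)
{
    assert(xr != NULL && yr != NULL);
    *xr = (x - xc) * cosTheta - (y - yc) * sinTheta + xc;
    *yr = (x - xc) * sinTheta + (y - yc) * cosTheta + yc;
}
```

For example, with sinTheta = 1 and cosTheta = 0 (a 90° rotation), the point (3, 1) rotated about the center (1, 1) lands on (1, 3).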
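C code:
    void rotate(
        unsigned char *inbuf,    /* source image, w * h bytes, row-major */
        unsigned char *outbuf,   /* destination image, same size */
        int w, int h,
        float sinTheta,
        float cosTheta)
    {
        int i, j;
        int xc = w / 2;          /* rotation center */
        int yc = h / 2;
        for (i = 0; i < h; i++)
        {
            for (j = 0; j < w; j++)
            {
                /* new position of source pixel (j, i) */
                int xpos = (j - xc) * cosTheta - (i - yc) * sinTheta + xc;
                int ypos = (j - xc) * sinTheta + (i - yc) * cosTheta + yc;
                /* copy only if the new position lies inside the image */
                if (xpos >= 0 && ypos >= 0 && xpos < w && ypos < h)
                    outbuf[ypos * w + xpos] = inbuf[i * w + j];
            }
        }
    }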

OpenCL C kernel code:
    __kernel void image_rotate(
        __global uchar *src_data,
        __global uchar *dest_data,        // data in global memory
        int W, int H,                     // image dimensions
        float sinTheta, float cosTheta)   // rotation parameters
    {
        // each work-item handles one source pixel (ix, iy)
        const int ix = get_global_id(0);
        const int iy = get_global_id(1);
        int xc = W / 2;
        int yc = H / 2;
        int xpos = (ix - xc) * cosTheta - (iy - yc) * sinTheta + xc;
        int ypos = (ix - xc) * sinTheta + (iy - yc) * cosTheta + yc;
        if ((xpos >= 0) && (xpos < W) && (ypos >= 0) && (ypos < H))
            dest_data[ypos * W + xpos] = src_data[iy * W + ix];
    }

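As the code above shows, the C version needs a doubly nested loop to compute the new position for every row and column. In the rotation algorithm, however, each pixel can be computed independently of all the others, so the work parallelizes easily. The OpenCL C kernel exploits this: each work-item handles one pixel, identified by get_global_id(0) and get_global_id(1).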
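To make the correspondence between the loop and the kernel concrete, the sketch below emulates the two-dimensional index space on the host in plain C. The names `rotate_work_item` and `emulate_ndrange_rotate` are hypothetical helpers, not OpenCL API; the loop simply visits every (ix, iy) once, whereas a real device is free to run the work-items concurrently and in any order.

```c
#include <assert.h>
#include <stddef.h>

/* Body of the image_rotate kernel, written as a plain C function.
 * ix and iy play the role of get_global_id(0) and get_global_id(1). */
static void rotate_work_item(int ix, int iy,
                             const unsigned char *src, unsigned char *dst,
                             int W, int H, float sinTheta, float cosTheta)
{
    int xc = W / 2;
    int yc = H / 2;
    int xpos = (int)((ix - xc) * cosTheta - (iy - yc) * sinTheta + xc);
    int ypos = (int)((ix - xc) * sinTheta + (iy - yc) * cosTheta + yc);
    if (xpos >= 0 && xpos < W && ypos >= 0 && ypos < H)
        dst[ypos * W + xpos] = src[iy * W + ix];
}

/* Hypothetical host-side stand-in for launching a W x H index space:
 * every work-item reads one source pixel and writes at most one
 * destination pixel, and none depends on a value computed by another. */
void emulate_ndrange_rotate(const unsigned char *src, unsigned char *dst,
                            int W, int H, float sinTheta, float cosTheta)
{
    assert(src != NULL && dst != NULL && W > 0 && H > 0);
    for (int iy = 0; iy < H; ++iy)
        for (int ix = 0; ix < W; ++ix)
            rotate_work_item(ix, iy, src, dst, W, H, sinTheta, cosTheta);
}
```

For a 5×5 image with sinTheta = 0 and cosTheta = -1 (a 180° rotation), the pixel at (ix, iy) ends up at (4 - ix, 4 - iy), so the output buffer is the input buffer in reverse order.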
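The code above was tested on Intel's OpenCL platform with a dual-core processor and a 4288×3216 image. The loop version ran in a stable 0.256 s or so, while the parallel OpenCL C kernel ran in about 0.132 s. The GPU test, on an NVIDIA GeForce G105M graphics card, ran in about 0.0810 s. Relative to the loop, that is roughly a 1.9× speedup on the dual-core CPU and a 3.2× speedup on the GPU, which shows that OpenCL can substantially improve execution efficiency for this kind of workload.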
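OpenCL Programming in Detail
OpenCL is an open standard for heterogeneous parallel computing, designed from the start to hide hardware differences behind a single model. For software developers, what matters is the programming model. An OpenCL program broadly follows these steps: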
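- Platform
  - Query and select a platform
  - Create a context on the platform
  - Query and select one or more devices on the context
- Runtime
  - Load the OpenCL kernel program and create a program object
  - Compile the kernels in the program for the specified device
  - Create a kernel object with the specified name
  - Create memory objects for the kernel
  - Set the kernel arguments
  - Create a command queue on the specified device
  - Enqueue the kernel to be executed on the command queue
  - Read the results back to the host
- Resource cleanup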
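Module Analysis
Programming with the OpenCL API is no different from using any third-party library in C/C++. So the first thing to do is include the relevant header. Because the OpenCL header lives at a different path on Mac OS X 10.6 than on other systems, a #if defined check is usually used to distinguish them:

#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif
Now we get to the actual coding.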
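Platform
Query and select a platform
First, obtain all OpenCL platforms on the system. A platform is an OpenCL implementation provided by a hardware vendor; different CPU/GPU vendors (such as Intel, AMD, and NVIDIA) can each install their own OpenCL implementation on the same system. So we need to query which implementations, i.e. platforms, are available. Call the API function clGetPlatformIDs to get the number of available platforms: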
```
cl_int status = 0;
cl_uint numPlatforms;
cl_platform_id platform = NULL;
// Passing 0 and NULL asks only for the number of platforms
status = clGetPlatformIDs(0, NULL, &numPlatforms);
if (status != CL_SUCCESS) {
    printf("Error: Getting Platforms\n");
    return EXIT_FAILURE;
}
```
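Then allocate memory according to that number and retrieve all available platforms, again with clGetPlatformIDs. This calling pattern is common in OpenCL: call once to get the count so that enough memory can be allocated, then call a second time to get the actual data.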
```
if (numPlatforms > 0) {
    cl_platform_id *platforms =
        (cl_platform_id *)malloc(numPlatforms * sizeof(cl_platform_id));
    // Second call: fill the array with the platform IDs
    status = clGetPlatformIDs(numPlatforms, platforms, NULL);
    if (status != CL_SUCCESS) {
        printf("Error: Getting Platform Ids.(clGetPlatformIDs)\n");
        return -1;
    }
    // ... platform selection continues inside this block
```
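The two-call idiom described above (count first, data second) can be tried without an OpenCL runtime. The sketch below uses a made-up function, `mock_get_ids`, that only imitates the calling convention of clGetPlatformIDs; it is not part of OpenCL, and the IDs it returns are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an OpenCL-style query (NOT a real OpenCL call).
 * It mimics the clGetPlatformIDs convention:
 *   - ids == NULL: only report how many entries exist via *num_available;
 *   - ids != NULL: copy up to `capacity` entries into ids. */
int mock_get_ids(size_t capacity, int *ids, size_t *num_available)
{
    static const int table[] = {101, 102, 103};   /* made-up IDs */
    const size_t total = sizeof(table) / sizeof(table[0]);

    if (ids == NULL && num_available == NULL)
        return -1;                                /* nothing requested */
    if (num_available != NULL)
        *num_available = total;
    if (ids != NULL)
        for (size_t i = 0; i < capacity && i < total; ++i)
            ids[i] = table[i];
    return 0;
}

/* Two-call idiom: ask for the count, allocate, then fetch the entries.
 * Returns a malloc'ed array (the caller frees it) or NULL on failure. */
int *query_all_ids(size_t *count)
{
    assert(count != NULL);
    if (mock_get_ids(0, NULL, count) != 0 || *count == 0)
        return NULL;
    int *ids = (int *)malloc(*count * sizeof(int));
    if (ids == NULL)
        return NULL;
    if (mock_get_ids(*count, ids, NULL) != 0) {
        free(ids);
        return NULL;
    }
    return ids;
}
```

Splitting the query this way keeps the allocation in the caller's hands, which is presumably why so many OpenCL query functions share the shape.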

All platforms are now stored in the variable platforms; the next step is to pick the one we want. My PC has an Intel CPU and an AMD GPU, so there are two platforms here. To try out the GPU, we use the AMD platform. clGetPlatformInfo returns information about a platform, including its vendor string, which lets us pick the platform we need. The code is:

```
    for (unsigned int i = 0; i < numPlatforms; ++i) {
        char pbuff[100];
        // Read the vendor string of platform i
        status = clGetPlatformInfo(
                     platforms[i],
                     CL_PLATFORM_VENDOR,
                     sizeof(pbuff),
                     pbuff,
                     NULL);
        // If no vendor matches, the last platform listed is kept
        platform = platforms[i];
        if (!strcmp(pbuff, "Advanced Micro Devices, Inc.")) {
            break;
        }
    }
    free(platforms);   // allocated with malloc above
}   // end of if (numPlatforms > 0)
```

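See the OpenCL specification for the other information clGetPlatformInfo can return; here we simply filter for AMD.
Create a context on the platform
The first step is to build the context properties from the platform: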

```
// Use the platform if one was found; otherwise pass NULL
cl_context_properties cps[3] = {
    CL_CONTEXT_PLATFORM,
    (cl_context_properties)platform,
    0
};
cl_context_properties *cprops = (NULL == platform) ? NULL : cps;
```
The second step is to create the context with clCreateContextFromType.
```
// Create the context
cl_context context = clCreateContextFromType(
                         cprops,
                         CL_DEVICE_TYPE_GPU,
                         NULL,
                         NULL,
                         &status);
if (status != CL_SUCCESS) {
    printf("Error: Creating Context.(clCreateContextFromType)\n");
    return EXIT_FAILURE;
}
```
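The second argument of the function selects the type of device associated with the context. This example uses the GPU as the OpenCL compute device. The available types include: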
- CL_DEVICE_TYPE_CPU
- CL_DEVICE_TYPE_GPU
- CL_DEVICE_TYPE_ACCELERATOR
- CL_DEVICE_TYPE_DEFAULT
- CL_DEVICE_TYPE_ALL
Query the devices on the context
Once the context has been created, the next step is to query the available devices.
```
size_t deviceListSize;

// First call: get the size in bytes of the device list
status = clGetContextInfo(context,
                          CL_CONTEXT_DEVICES,
                          0,
                          NULL,
                          &deviceListSize);
if (status != CL_SUCCESS) {
    printf("Error: Getting Context Info (device list size, clGetContextInfo)\n");
    return EXIT_FAILURE;
}
cl_device_id *devices = (cl_device_id *)malloc(deviceListSize);
if (devices == NULL) {
    printf("Error: No devices found.\n");
    return EXIT_FAILURE;
}
// Second call: get the device list itself
status = clGetContextInfo(context,
                          CL_CONTEXT_DEVICES,
                          deviceListSize,
                          devices,
                          NULL);
if (status != CL_SUCCESS) {
    printf("Error: Getting Context Info (device list, clGetContextInfo)\n");
    return EXIT_FAILURE;
}
```
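As with platforms, clGetContextInfo is called twice to complete the query. The first call returns the size in bytes of the device list associated with the context (the number of devices is deviceListSize / sizeof(cl_device_id)), and memory is allocated accordingly; the second call retrieves the device IDs themselves. For details about each device, call clGetDeviceInfo, which returns information such as the device type, the vendor, and whether the device supports particular extensions. See the OpenCL specification for details.
The platform-related setup is now ready. Here is the complete code so far: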
```
/* OpenCL_01.cpp
 * by keyring keyrings@163.com
 * 2013.10.26
 */
#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char const *argv[])
{
    printf("hello OpenCL\n");
    cl_int status = 0;
    size_t deviceListSize;

    // Get and select an available platform
    cl_uint numPlatforms;
    cl_platform_id platform = NULL;
    status = clGetPlatformIDs(0, NULL, &numPlatforms);
    if (status != CL_SUCCESS) {
        printf("ERROR: Getting Platforms.(clGetPlatformIDs)\n");
        return EXIT_FAILURE;
    }
    if (numPlatforms > 0) {
        cl_platform_id *platforms =
            (cl_platform_id *)malloc(numPlatforms * sizeof(cl_platform_id));
        status = clGetPlatformIDs(numPlatforms, platforms, NULL);
        if (status != CL_SUCCESS) {
            printf("Error: Getting Platform Ids.(clGetPlatformIDs)\n");
            return -1;
        }
        // Walk through all platforms and pick the one we want
        for (unsigned int i = 0; i < numPlatforms; ++i) {
            char pbuff[100];
            status = clGetPlatformInfo(
                         platforms[i],
                         CL_PLATFORM_VENDOR,
                         sizeof(pbuff),
                         pbuff,
                         NULL);
            platform = platforms[i];
            if (!strcmp(pbuff, "Advanced Micro Devices, Inc.")) {
                break;
            }
        }
        free(platforms);   // allocated with malloc, so released with free
    }

    // Use the platform if one was found; otherwise pass NULL
    cl_context_properties cps[3] = {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platform,
        0
    };
    cl_context_properties *cprops = (NULL == platform) ? NULL : cps;

    // Create the context
    cl_context context = clCreateContextFromType(
                             cprops,
                             CL_DEVICE_TYPE_GPU,
                             NULL,
                             NULL,
                             &status);
    if (status != CL_SUCCESS) {
        printf("Error: Creating Context.(clCreateContextFromType)\n");
        return EXIT_FAILURE;
    }

    // Find the OpenCL devices
    // First get the size in bytes of the device list
    status = clGetContextInfo(context,
                              CL_CONTEXT_DEVICES,
                              0,
                              NULL,
                              &deviceListSize);
    if (status != CL_SUCCESS) {
        printf("Error: Getting Context Info (device list size, clGetContextInfo)\n");
        return EXIT_FAILURE;
    }
    cl_device_id *devices = (cl_device_id *)malloc(deviceListSize);
    if (devices == NULL) {
        printf("Error: No devices found.\n");
        return EXIT_FAILURE;
    }
    // Then get the device list
    status = clGetContextInfo(context,
                              CL_CONTEXT_DEVICES,
                              deviceListSize,
                              devices,
                              NULL);
    if (status != CL_SUCCESS) {
        printf("Error: Getting Context Info (device list, clGetContextInfo)\n");
        return EXIT_FAILURE;
    }

    // Platform setup ends here; release what was created so far
    free(devices);
    clReleaseContext(context);
    return 0;
}
```

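Runtime
For all the code so far, we have not really touched the program's actual logic; at most we have set up the OpenCL runtime environment. The real logic, i.e. the task the program performs, lives in the runtime part. The task in this example is to assign a value to every element of a 4×4 two-dimensional space according to a fixed rule. The code is: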

```
#define KERNEL(...) #__VA_ARGS__
const char *kernelSourceCode = KERNEL(
    __kernel void hellocl(__global uint *buffer)
    {
        size_t gidx = get_global_id(0);
        size_t gidy = get_global_id(1);
        size_t lidx = get_local_id(0);
        buffer[gidx + 4 * gidy] = (1 << gidx) | (0x10 << gidy);
    }
);
```

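This snippet is the real logic, i.e. what the program is meant to do. It is written in OpenCL C, a C-like language defined by OpenCL; we will not dwell on its syntax for now. Here the kernel is embedded directly in the .cpp file as a static string: the KERNEL macro uses the preprocessor's # operator to turn its arguments into a string literal. The kernel program can also be kept in a separate file.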
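The stringizing macro can be examined on its own in plain C. In the sketch below, the `fill` kernel and the helper `demo_kernel_source` are made up for illustration; only the KERNEL macro itself comes from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same macro as in the article: # turns the macro arguments into a string. */
#define KERNEL(...) #__VA_ARGS__

/* A made-up kernel, used only to show what the macro produces.
 * The host compiler never compiles this text; it only stores it. */
static const char *demoSource = KERNEL(
    __kernel void fill(__global uint *buffer)
    {
        buffer[get_global_id(0)] = 7;
    }
);

/* Returns the kernel source and stores its length in *length,
 * the two pieces a call such as clCreateProgramWithSource needs. */
const char *demo_kernel_source(size_t *length)
{
    assert(length != NULL);
    *length = strlen(demoSource);
    return demoSource;
}
```

Because the preprocessor collapses each run of whitespace, including line breaks, into a single space, the resulting string is one long line. That is harmless for compilation, but any line numbers in OpenCL compiler diagnostics will all refer to line 1. Preprocessor directives such as #pragma should also not be placed inside the macro arguments, since C leaves that case undefined.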
Load the OpenCL kernel program and create a program object
The next step is to read in the OpenCL kernel program and create a program object.

```
size_t sourceSize[] = {strlen(kernelSourceCode)};
cl_program program = clCreateProgramWithSource(context,
                         1,                   // one source string
                         &kernelSourceCode,
                         sourceSize,
                         &status);
if (status != CL_SUCCESS) {
    printf("Error: Loading source into cl_program (clCreateProgramWithSource)\n");
    return EXIT_FAILURE;
}
```
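In this example the kernel program is read in as a static string (a separate text file works the same way), so clCreateProgramWithSource is used. If you do not want others to see the kernel source, you can build a binary first and load it at run time with clCreateProgramWithBinary, which offers some degree of confidentiality. See the OpenCL specification for details.
Compile the kernels in the program for the specified device
With the kernel program loaded, the next step is naturally to compile it with clBuildProgram: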
```
// Build the program for the first device in the list
status = clBuildProgram(program, 1, devices, NULL, NULL, NULL);
if (status != CL_SUCCESS) {
    printf("Error: Building Program (clBuildProgram)\n");
    return EXIT_FAILURE;
}
```
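In the end, the kernel is compiled by the OpenCL compiler for the target device into executable machine code.
Create a kernel object with the specified name
After a successful build, create a kernel object with clCreateKernel.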
```
cl_kernel kernel = clCreateKernel(program, "hellocl", &status);
if (status != CL_SUCCESS) {
    printf("Error: Creating Kernel from program.(clCreateKernel)\n");
    return EXIT_FAILURE;
}
```
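The quoted hellocl is the name of the kernel function that the kernel object is bound to. Note that each kernel object must be bound to exactly one kernel function contained in the corresponding program object. You can write any number of kernel functions in the .cl source, but before executing one you must create a separate kernel object for it, i.e. call clCreateKernel once per kernel.
Create memory objects for the kernel
OpenCL memory objects are created on the host and used by kernel programs. They come in two kinds: buffers and images. A buffer is one-dimensional; an image can be a two- or three-dimensional texture, frame buffer, or image. This example uses only a buffer, created with clCreateBuffer.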
```
cl_mem outputBuffer = clCreateBuffer(
                          context,
                          CL_MEM_ALLOC_HOST_PTR,
                          4 * 4 * sizeof(cl_uint),   // 16 elements of 4 bytes each
                          NULL,
                          &status);
if (status != CL_SUCCESS) {
    printf("Error: Create Buffer, outputBuffer. (clCreateBuffer)\n");
    return EXIT_FAILURE;
}
```
Set the kernel arguments
Use clSetKernelArg to set kernel arguments. An argument can be a constant, a variable, or a memory object; this example passes a memory object.
```
// Argument 0 of hellocl is the output buffer
status = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&outputBuffer);
if (status != CL_SUCCESS) {
    printf("Error: Setting kernel argument. (clSetKernelArg)\n");
    return EXIT_FAILURE;
}
```
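The function sets only one argument per call, so it must be called once for each argument. Every argument of the kernel must be set, otherwise launching the kernel fails with an error. The type of each argument should match the type of the corresponding kernel parameter to avoid hard-to-diagnose errors. Once an argument has been set, every run of the kernel uses that value until you call this API again to change it.
Create a command queue on the specified device
A command queue manages the commands that are to be executed. It is created with clCreateCommandQueue (OpenCL 2.0 and later deprecate this function in favor of clCreateCommandQueueWithProperties). The device must be one associated with the context, and every command in the queue runs on that device.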
```
cl_command_queue commandQueue = clCreateCommandQueue(context,
                                    devices[0],
                                    0,          // default queue properties
                                    &status);
if (status != CL_SUCCESS) {
    printf("Error: Create Command Queue. (clCreateCommandQueue)\n");
    return EXIT_FAILURE;
}
```
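Enqueue the kernel to be executed on the command queue
With the command queue created, you can create commands and put them on the queue for execution. OpenCL offers three ways to create a kernel-execution command. The most common, used in this example, runs the kernel over a specified index space with clEnqueueNDRangeKernel.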
```
size_t globalThreads[] = {4, 4};   // 4 x 4 work-items in total
size_t localThreads[] = {2, 2};    // 2 x 2 work-items per work-group
status = clEnqueueNDRangeKernel(commandQueue, kernel,
                                2, NULL, globalThreads,
                                localThreads, 0,
                                NULL, NULL);
if (status != CL_SUCCESS) {
    printf("Error: Enqueueing kernel\n");
    return EXIT_FAILURE;
}
```
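Each call to clEnqueueNDRangeKernel puts one kernel object on the command queue. You can call it repeatedly to enqueue several kernel objects on the same queue, and the index spaces of different kernel objects in the queue are completely independent. The other two APIs, clEnqueueTask and clEnqueueNativeKernel, are not covered here; see the OpenCL specification for details.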
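How the global size {4, 4} and local size {2, 2} divide the index space can be worked out with ordinary integer arithmetic. The sketch below is a host-side illustration in plain C; `WorkItemPos`, `split_global_id` and `num_groups` are hypothetical names, and the arithmetic assumes a zero global offset, which matches the NULL offset passed in the article's call.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one work-item along one dimension. */
typedef struct {
    size_t group_id;   /* what get_group_id(d) would return */
    size_t local_id;   /* what get_local_id(d) would return */
} WorkItemPos;

/* Split a global ID into work-group ID and local ID along one dimension,
 * assuming a zero global offset. */
WorkItemPos split_global_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    WorkItemPos pos;
    pos.group_id = global_id / local_size;
    pos.local_id = global_id % local_size;
    return pos;
}

/* Number of work-groups along one dimension. OpenCL 1.x requires the
 * global size to be a multiple of an explicitly given local size. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

With the sizes used in this example there are 2 × 2 = 4 work-groups of 4 work-items each, and the work-item with global ID (3, 1) sits in work-group (1, 0) at local ID (1, 1).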
Finally, clFinish can be used to make sure that all commands in a command queue have finished. The function returns once every command in the queue has completed.
```
// Make sure all commands in the command queue have finished
status = clFinish(commandQueue);
if (status != CL_SUCCESS) {
    printf("Error: Finish command queue\n");
    return EXIT_FAILURE;
}
```
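Read the results back to the host
When the computation is done, read the results back to the host. clEnqueueReadBuffer copies the contents of an OpenCL buffer object into memory that the host can access.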
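```
// Read the result in the memory object back to the host.
// outbuffer is a host array of 16 unsigned ints; CL_TRUE makes the read blocking.
status = clEnqueueReadBuffer(commandQueue,
                             outputBuffer, CL_TRUE, 0,
                             4 * 4 * sizeof(cl_uint), outbuffer, 0, NULL, NULL);
if (status != CL_SUCCESS) {
    printf("Error: Read buffer queue\n");
    return EXIT_FAILURE;
}
```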
To see what the program did, print the results:
```
// Print the results on the host, four values per row
printf("out:\n");
for (int i = 0; i < 16; ++i) {
    printf("%x ", outbuffer[i]);
    if ((i + 1) % 4 == 0)
        printf("\n");
}
```
Resource cleanup
At the end of the program, release every object that was created, just as you would free memory in C/C++.
// Resource cleanup
status = clReleaseKernel(kernel);
status = clReleaseProgram(program);
status = clReleaseMemObject(outputBuffer);
status = clReleaseCommandQueue(commandQueue);
status = clReleaseContext(context);
free(devices);          // allocated with malloc
delete[] outbuffer;     // allocated with new[]
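Summary
This small example walked through the general steps of OpenCL programming in detail. These steps are mostly boilerplate. What really needs attention is writing the OpenCL kernel itself, although using the API sensibly and efficiently also takes skill.
Complete program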
```
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

#define KERNEL(...) #__VA_ARGS__
const char *kernelSourceCode = KERNEL(
    __kernel void hellocl(__global uint *buffer)
    {
        size_t gidx = get_global_id(0);
        size_t gidy = get_global_id(1);
        size_t lidx = get_local_id(0);
        buffer[gidx + 4 * gidy] = (1 << gidx) | (0x10 << gidy);
    }
);

int main(int argc, char const *argv[])
{
    printf("hello OpenCL\n");
    cl_int status = 0;
    size_t deviceListSize;

    // The server used here has only one GPU, an NVIDIA Tesla C2050,
    // so the first platform is taken directly
    cl_platform_id platform = NULL;
    status = clGetPlatformIDs(1, &platform, NULL);
    if (status != CL_SUCCESS) {
        printf("ERROR: Getting Platforms.(clGetPlatformIDs)\n");
        return EXIT_FAILURE;
    }

    // Use the platform if one was found; otherwise pass NULL
    cl_context_properties cps[3] = {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platform,
        0
    };
    cl_context_properties *cprops = (NULL == platform) ? NULL : cps;

    // Create the context
    cl_context context = clCreateContextFromType(
                             cprops,
                             CL_DEVICE_TYPE_GPU,
                             NULL,
                             NULL,
                             &status);
    if (status != CL_SUCCESS) {
        printf("Error: Creating Context.(clCreateContextFromType)\n");
        return EXIT_FAILURE;
    }

    // Find the OpenCL devices
    // First get the size in bytes of the device list
    status = clGetContextInfo(context,
                              CL_CONTEXT_DEVICES,
                              0,
                              NULL,
                              &deviceListSize);
    if (status != CL_SUCCESS) {
        printf("Error: Getting Context Info (device list size, clGetContextInfo)\n");
        return EXIT_FAILURE;
    }
    cl_device_id *devices = (cl_device_id *)malloc(deviceListSize);
    if (devices == NULL) {
        printf("Error: No devices found.\n");
        return EXIT_FAILURE;
    }
    // Now get the device list
    status = clGetContextInfo(context,
                              CL_CONTEXT_DEVICES,
                              deviceListSize,
                              devices,
                              NULL);
    if (status != CL_SUCCESS) {
        printf("Error: Getting Context Info (device list, clGetContextInfo)\n");
        return EXIT_FAILURE;
    }

    // Load the kernel source, build the CL program, create the kernel object
    size_t sourceSize[] = {strlen(kernelSourceCode)};
    cl_program program = clCreateProgramWithSource(context,
                             1,
                             &kernelSourceCode,
                             sourceSize,
                             &status);
    if (status != CL_SUCCESS) {
        printf("Error: Loading source into cl_program (clCreateProgramWithSource)\n");
        return EXIT_FAILURE;
    }

    // Build the CL program for the specified device
    status = clBuildProgram(program, 1, devices, NULL, NULL, NULL);
    if (status != CL_SUCCESS) {
        printf("Error: Building Program (clBuildProgram)\n");
        return EXIT_FAILURE;
    }

    // Get a handle to the kernel with the given name
    cl_kernel kernel = clCreateKernel(program, "hellocl", &status);
    if (status != CL_SUCCESS) {
        printf("Error: Creating Kernel from program.(clCreateKernel)\n");
        return EXIT_FAILURE;
    }

    // Create the host array and the OpenCL buffer object
    unsigned int *outbuffer = new unsigned int[4 * 4];
    memset(outbuffer, 0, 4 * 4 * sizeof(unsigned int));
    cl_mem outputBuffer = clCreateBuffer(
                              context,
                              CL_MEM_ALLOC_HOST_PTR,
                              4 * 4 * sizeof(cl_uint),
                              NULL,
                              &status);
    if (status != CL_SUCCESS) {
        printf("Error: Create Buffer, outputBuffer. (clCreateBuffer)\n");
        return EXIT_FAILURE;
    }

    // Set the kernel argument
    status = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&outputBuffer);
    if (status != CL_SUCCESS) {
        printf("Error: Setting kernel argument. (clSetKernelArg)\n");
        return EXIT_FAILURE;
    }

    // Create an OpenCL command queue
    cl_command_queue commandQueue = clCreateCommandQueue(context,
                                        devices[0],
                                        0,
                                        &status);
    if (status != CL_SUCCESS) {
        printf("Error: Create Command Queue. (clCreateCommandQueue)\n");
        return EXIT_FAILURE;
    }

    // Enqueue the kernel on the command queue
    size_t globalThreads[] = {4, 4};
    size_t localThreads[] = {2, 2};
    status = clEnqueueNDRangeKernel(commandQueue, kernel,
                                    2, NULL, globalThreads,
                                    localThreads, 0,
                                    NULL, NULL);
    if (status != CL_SUCCESS) {
        printf("Error: Enqueueing kernel\n");
        return EXIT_FAILURE;
    }

    // Make sure all commands in the command queue have finished
    status = clFinish(commandQueue);
    if (status != CL_SUCCESS) {
        printf("Error: Finish command queue\n");
        return EXIT_FAILURE;
    }

    // Read the result in the memory object back to the host
    status = clEnqueueReadBuffer(commandQueue,
                                 outputBuffer, CL_TRUE, 0,
                                 4 * 4 * sizeof(cl_uint), outbuffer, 0, NULL, NULL);
    if (status != CL_SUCCESS) {
        printf("Error: Read buffer queue\n");
        return EXIT_FAILURE;
    }

    // Print the results on the host
    printf("out:\n");
    for (int i = 0; i < 16; ++i) {
        printf("%x ", outbuffer[i]);
        if ((i + 1) % 4 == 0)
            printf("\n");
    }

    // Resource cleanup
    status = clReleaseKernel(kernel);
    status = clReleaseProgram(program);
    status = clReleaseMemObject(outputBuffer);
    status = clReleaseCommandQueue(commandQueue);
    status = clReleaseContext(context);
    free(devices);
    delete[] outbuffer;
    return 0;
}
```
Run result
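The values the program should print can be worked out without a GPU, since each buffer element depends only on its own (gidx, gidy). The sketch below emulates the hellocl kernel with two host loops in plain C; `hellocl_expected` and `print_grid` are hypothetical helper names, and the output is derived from the kernel's formula rather than captured from a device run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Host-side emulation of the hellocl kernel over a 4 x 4 index space:
 * gidx and gidy stand in for get_global_id(0) and get_global_id(1). */
void hellocl_expected(unsigned int out[16])
{
    assert(out != NULL);
    for (unsigned int gidy = 0; gidy < 4; ++gidy)
        for (unsigned int gidx = 0; gidx < 4; ++gidx)
            out[gidx + 4 * gidy] = (1u << gidx) | (0x10u << gidy);
}

/* Print the 16 values in the same format as the host program. */
void print_grid(const unsigned int out[16])
{
    assert(out != NULL);
    printf("out:\n");
    for (int i = 0; i < 16; ++i) {
        printf("%x ", out[i]);
        if ((i + 1) % 4 == 0)
            printf("\n");
    }
}
```

Calling hellocl_expected and then print_grid gives four rows: 11 12 14 18, then 21 22 24 28, then 41 42 44 48, then 81 82 84 88. The low hex digit encodes the column (bit gidx) and the high hex digit encodes the row (bit gidy).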

Reference links:
https://www.zhihu.com/people/wujianming_110117/posts
https://blog.csdn.net/fly_yr/article/details/51259692
https://www.cnblogs.com/wangshide/archive/2012/01/07/2315830.html
