在上一节《
root@NanoPC-T6:/opt/qt-project/FloatVideo-TouchScreen# export DISPLAY=:0.0;./FloatVideo-TouchScreen -size 0.8 root@NanoPC-T6:~# top 任务: 278 total, 2 running, 276 sleeping, 0 stopped, 0 zombie %Cpu(s): 36.0 us, 1.9 sy, 0.0 ni, 62.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 15953.1 total, 14749.5 free, 662.4 used, 541.2 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 14995.4 avail Mem 进程号 USER PR NI VIRT RES SHR %CPU %MEM TIME+ COMMAND 1513 root 20 0 2876120 127804 72488 S 270.9 0.8 0:27.46 FloatVideo-Touc 864 root 20 0 3345456 238548 186388 S 14.9 1.5 0:03.11 Xorg 1251 pi 20 0 1861028 76424 57116 S 2.6 0.5 0:00.96 xfwm4 ......
那么我们是不是可以通过
一、OpenCL 环境搭建
1.1 工作原理
platform layer API :在主机CPU 上运行,首先用于使程序能够发现系统中可用的并行处理器或计算设备。通过查询哪些计算设备可用,应用程序可以在不同的系统上便携地运行—适应加速器硬件的不同组合。一旦发现了计算设备,platform layer API 就允许应用程序选择并初始化它想要使用的设备;Runtime API :它使应用程序的内核程序能够为它们将要运行的计算设备编译,并行加载到这些处理器上并执行。一旦内核程序完成执行,将使用Runtime API 收集结果;
为了更好适用于不同的处理器,
- 平台模型:描述了
OpenCL 如何理解拓扑连接系统中的计算资源,对不同硬件及软件实现抽象,方便应用于不同设备; - 内存模型:对硬件的各种内存器进行了抽象;
- 执行模型:程序是如何在硬件上执行的;
- 编程模型:数据并行和任务并行;
1.2 平台模型
- 每个
Compute Device 包含多个Compute Unit ; - 每个
Compute Unit 又包含多个Processing Elements (处理单元)。
举例说明:计算设备可以是
1.3 内存模型
1.3.1 内存类型
Global memory : 全局内存对在上下文中执行的所有工作项可访问,主机可以使用__global 关键字读取、写入和映射命令访问全局内存,在单个工作组中,全局内存是一致的;Constant memory :常量内存是用于主机分配和初始化的对象的内存区域, 所有工作项都可以以只读方式访问常量内存;Local memory : 本地内存是特定于工作组的,工作组中的工作项可以访问本地内存;使用__local 关键字进行访问,对于工作组中的所有工作项来说,本地内存是一致的;Private memory :私有内存是特定于工作项的,其他工作项无法访问私有内存;
1.3.2 内存模型
1.4 执行模型
主机应用程序使用
通过将命令入队到命令队列(
1.4.1 主机应用程序
主机应用程序在应用处理器上运行。主机应用程序通过为以下命令设置命令队列来管理内核的执行:
- 内存命令;
- 内核执行命令;
- 同步操作;
1.4.2 上下文
主机应用程序为内核定义上下文。上下文包括:
-
计算设备(
Compute devices ); -
内核(
Kernels ):OpenCL 核心计算部分,类似C 语言的代码。在需要设备执行计算任务时,数据会被推送到Compute Device ,然后Compute Device 的计算单元会并发执行内核程序; -
程序对象(
Programs ):Kernels 的集合,OpenCL 中可以使用cl_program 表示; -
内存对象(
Memory Objects. );
1.4.3 OpenCL 内核的操作
Kernels 在主机应用程序中定义;- 主机应用程序将
kernel 提交给计算设备执行。计算设备可以是应用处理器、GPU 或其它类型的处理器; - 当主机应用程序发出提交
kernel 的命令时,OpenCL 创建工作项的NDRange ; - 对于
NDRange 中的每个元素,创建kernel 的一个实例。这使得每个元素可以独立并行地进行处理。
1.5 OpenCL 计算流程
对于
- 主机应用程序进行设备初始化(获取平台和设备
id ,创建上下文和命令队列); - 编写并编译
kernel (读取内核文件->创建program 对象->编译程序->创建内核) ; - 主机应用程序准备数据并传入设备(准备主机端数据,创建设备端内存对象并拷贝主机端数据);
- 主机应用程序将
kernel 提交给设备执行(传入kernel 函数参数, 启动kernel 函数); - 将结果拷贝回主机应用程序;
- 后续处理;
- 释放资源。
二、OpenCL 环境搭建
一个完整的
- 内核层
GPU 驱动; - 用户层动态库;
- 头文件;
- 应用程序;
2.1 内核层GPU 驱动
以
注意: 内核层
2.2 用户层动态库
用户层动态库有多种途径可以获得,比如以下两种:
-
寻找官方(
Mali ARM /Rockchip )提供的用户层动态库libmali.so ; -
下载
KhronosGroup OpenCL-SDK 源码,并编译,可以得到libOpenCL.so ;
下面我们分别介绍这两种方式, 对于这两种方式我们选择一种即可,对于我使用的
2.2.1 Mali ARM 官方下载安装libmali.so
通过浏览器进入
寻找官方提供的用户层动态库
不过不幸的是:
下载后,解压缩可以看到:
注意:上图中
将
root@NanoPC-T6:~# ln -s /usr/lib/libmali.so /usr/lib/libOpenCL.so
2.2.2 Rockchip 官方提供的libmali.so
我们使用的友善提供的
如何来查看是否已经安装了
root@NanoPC-T6:~# find /usr -name libmali.so /usr/lib/aarch64-linux-gnu/libmali.so root@NanoPC-T6:~# strings /usr/lib/aarch64-linux-gnu/libmali.so | grep Mali-G610 Mali-G610 root@NanoPC-T6:~# strings /usr/lib/aarch64-linux-gnu/libmali.so | grep cl ..... clReleaseCommandBufferKHR clReleaseCommandQueue clReleaseContext clReleaseDevice clReleaseEvent clReleaseKernel clReleaseMemObject ..... root@NanoPC-T6:~# ls -l /usr/lib/aarch64-linux-gnu/libmali.so lrwxrwxrwx 1 root root 12 7月 29 2020 /usr/lib/aarch64-linux-gnu/libmali.so -> libmali.so.1
其中
如果命令输出为空,则说明该库不是
此外在
root@NanoPC-T6:/opt# ls -l /usr/lib/aarch64-linux-gnu/libOpenCL* lrwxrwxrwx 1 root root 18 1月 12 2021 /usr/lib/aarch64-linux-gnu/libOpenCL.so.1 -> libOpenCL.so.1.0.0 -rw-r--r-- 1 root root 60856 1月 12 2021 /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 root@NanoPC-T6:/opt# strings /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 | grep cl fclose closedir dlclose clGetExtensionFunctionAddress clGetPlatformIDs clCreateContext clCreateContextFromType clGetGLContextInfoKHR ...... root@NanoPC-T6:/opt# ls -l /usr/lib/aarch64-linux-gnu/libEGL* lrwxrwxrwx 1 root root 20 3月 25 2021 /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0 -rw-r--r-- 1 root root 259072 3月 25 2021 /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0.0.0 lrwxrwxrwx 1 root root 11 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so -> libEGL.so.1 lrwxrwxrwx 1 root root 15 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so.1 -> libEGL.so.1.1.0 -rw-r--r-- 1 root root 84416 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so.1.1.0 ......
也可以通过如下
root@NanoPC-T6:~# aptitude install clinfo root@NanoPC-T6:~# clinfo arm_release_ver: g13p0-01eac0, rk_so_ver: 10 Number of platforms 1 Platform Name ARM Platform Platform Vendor ARM Platform Version OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4 Platform Profile FULL_PROFILE Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics ...... NULL platform behavior clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) ARM Platform clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [ARM] clCreateContext(NULL, ...) [default] Success [ARM] clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1) Platform Name ARM Platform Device Name Mali-G610 r0p0 # GPU型号 clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1) Platform Name ARM Platform Device Name Mali-G610 r0p0 clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1) Platform Name ARM Platform Device Name Mali-G610 r0p0 ICD loader properties ICD loader Name OpenCL ICD Loader ICD loader Vendor OCL Icd free software ICD loader Version 2.2.14 ICD loader Profile OpenCL 3.0
接着我们需要将建立软链接
root@NanoPC-T6:~# ln -s /usr/lib/aarch64-linux-gnu/libmali.so /usr/lib/aarch64-linux-gnu/libOpenCL.so root@NanoPC-T6:~# ls -l /usr/lib/aarch64-linux-gnu/libOpenCL.so lrwxrwxrwx 1 root root 37 1月 16 23:43 /usr/lib/aarch64-linux-gnu/libOpenCL.so -> /usr/lib/aarch64-linux-gnu/libmali.so
2.2.3 OpenCL SDK 编译安装
如果没有安装,请按照如下步骤安装:下载
下载源码:
root@NanoPC-T6:/opt# git clone --recursive https://github.com/KhronosGroup/OpenCL-SDK.git
运行以下命令来配置构建过程,并指定安装路径为
root@NanoPC-T6:/opt/OpenCL-SDK# cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/opt/OpenCL -- The C compiler identification is GNU 10.2.1 -- The CXX compiler identification is GNU 10.2.1 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/cc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- No build type selected, default to Release -- Looking for pthread.h -- Looking for pthread.h - found -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed -- Looking for pthread_create in pthreads -- Looking for pthread_create in pthreads - not found -- Looking for pthread_create in pthread -- Looking for pthread_create in pthread - found -- Found Threads: TRUE -- Looking for secure_getenv -- Looking for secure_getenv - found -- Looking for __secure_getenv -- Looking for __secure_getenv - not found -- Check if compiler accepts -pthread -- Check if compiler accepts -pthread - yes -- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) -- cargs (https://521github.com/likle/cargs) not found. To self-host, set cargs_INCLUDE_PATH and cargs_LIBRARY to point to the headers and library respectively adding '-D cargs_INCLUDE_PATH=/path/to/cargs/include/dir -D cargs_LIBRARY/path/to/cargs/libcargs' to the cmake command. (missing: cargs_INCLUDE_PATH cargs_LIBRARY) -- Fetching cargs. -- Adding cargs subproject: /opt/OpenCL-SDK/build/_deps/cargs-external-src -- TCLAP (http://tclap.sourceforge.net/) not found. To self-host, set TCLAP_INCLUDE_PATH to point to the headers adding '-DTCLAP_INCLUDE_PATH=/path/to/tclap' to the cmake command. (missing: TCLAP_INCLUDE_PATH) -- Fetching TCLAP. -- Found TCLAP: /opt/OpenCL-SDK/build/_deps/tclap-external-src/include -- Stb (https://521github.com/nothings/stb) not found. To self-host, set Stb_INCLUDE_PATH to point to the headers adding '-D Stb_INCLUDE_PATH=/path/to/stb' to the cmake command. (missing: Stb_INCLUDE_PATH) -- Fetching Stb. -- Found Stb: /opt/OpenCL-SDK/build/_deps/stb-external-src -- Found X11: /usr/include -- Looking for XOpenDisplay in /usr/lib/aarch64-linux-gnu/libX11.so;/usr/lib/aarch64-linux-gnu/libXext.so -- Looking for XOpenDisplay in /usr/lib/aarch64-linux-gnu/libX11.so;/usr/lib/aarch64-linux-gnu/libXext.so - found -- Looking for gethostbyname -- Looking for gethostbyname - found -- Looking for connect -- Looking for connect - found -- Looking for remove -- Looking for remove - found -- Looking for shmat -- Looking for shmat - found -- Looking for IceConnectionNumber in ICE -- Looking for IceConnectionNumber in ICE - found -- Could NOT find glm (missing: glm_DIR) -- Fetching glm. -- Adding glm subproject: /opt/OpenCL-SDK/build/_deps/glm-external-src CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindOpenGL.cmake:305 (message): Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when available. Run "cmake --help-policy CMP0072" for policy details. Use the cmake_policy command to set the policy and suppress this warning. FindOpenGL found both a legacy GL library: OPENGL_gl_LIBRARY: /usr/lib/aarch64-linux-gnu/libGL.so and GLVND libraries for OpenGL and GLX: OPENGL_opengl_LIBRARY: /usr/lib/aarch64-linux-gnu/libOpenGL.so OPENGL_glx_LIBRARY: /usr/lib/aarch64-linux-gnu/libGLX.so OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for compatibility with CMake 3.10 and below the legacy GL library will be used. Call Stack (most recent call first): cmake/Dependencies/OpenGL/OpenGL.cmake:1 (find_package) cmake/Dependencies.cmake:17 (include) CMakeLists.txt:50 (include) This warning is for project developers. Use -Wno-dev to suppress it. -- Found OpenGL: /usr/lib/aarch64-linux-gnu/libOpenGL.so -- Could NOT find GLEW (missing: GLEW_INCLUDE_DIRS GLEW_LIBRARIES) -- Fetching GLEW. -- Adding GLEW subproject: /opt/OpenCL-SDK/build/_deps/glew-external-src CMake Warning (dev) at build/_deps/glew-external-src/CMakeLists.txt:2 (project): Policy CMP0048 is not set: project() command manages VERSION variables. Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy command to set the policy and suppress this warning. The following variable(s) would be set to empty: PROJECT_VERSION PROJECT_VERSION_MAJOR PROJECT_VERSION_MINOR PROJECT_VERSION_PATCH This warning is for project developers. Use -Wno-dev to suppress it. -- Found Freetype: /usr/lib/aarch64-linux-gnu/libfreetype.so (found version "2.10.4") -- Fetching SFML. -- Adding SFML subproject: /opt/OpenCL-SDK/build/_deps/sfml-external-src -- libudev stable: 1 -- Found UDev: /usr/lib/aarch64-linux-gnu/libudev.so -- include: /usr/include -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success -- Performing Test COMPILER_HAS_DEPRECATED_ATTR -- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success -- Looking for sin in m -- Looking for sin in m - found -- Configuring done -- Generating done -- Build files have been written to: /opt/OpenCL-SDK/build root@NanoPC-T6:/opt/OpenCL-SDK#
其中:
-S . :指定源代码目录的路径;-B build :指定构建目录的路径;-DCMAKE_INSTALL_PREFIX=/opt/OpenCL :指定cmake 执行install 目标时,安装的路径前缀;
接着运行以下命令在
root@NanoPC-T6:/opt/OpenCL-SDK# cmake --build build --target install
编译完成之后,我们查看安装目录:
root@NanoPC-T6:/opt/OpenCL-SDK# ls /opt/OpenCL -l 总用量 16 drwxr-xr-x 2 root root 4096 1月 16 20:44 bin drwxr-xr-x 5 root root 4096 1月 16 20:44 include # 头文件 drwxr-xr-x 4 root root 4096 1月 16 20:44 lib # 库文件 drwxr-xr-x 5 root root 4096 1月 16 20:44 share root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/lib 总用量 4564 drwxr-xr-x 4 root root 4096 1月 16 20:44 cmake -rw-r--r-- 1 root root 4842 1月 16 20:41 libcargs.a -rw-r--r-- 1 root root 1215506 1月 16 20:42 libglew.a lrwxrwxrwx 1 root root 23 1月 16 20:44 libglew-shared.so -> libglew-shared.so.2.2.0 -rw-r--r-- 1 root root 961392 1月 16 20:42 libglew-shared.so.2.2.0 -rw-r--r-- 1 root root 90550 1月 16 20:43 libOpenCLExt.a -rw-r--r-- 1 root root 1269528 1月 16 20:43 libOpenCLSDKCpp.so -rw-r--r-- 1 root root 205392 1月 16 20:43 libOpenCLSDK.so lrwxrwxrwx 1 root root 14 1月 16 20:44 libOpenCL.so -> libOpenCL.so.1 lrwxrwxrwx 1 root root 16 1月 16 20:44 libOpenCL.so.1 -> libOpenCL.so.1.2 -rw-r--r-- 1 root root 74744 1月 16 20:41 libOpenCL.so.1.2 -rw-r--r-- 1 root root 61152 1月 16 20:42 libOpenCLUtilsCpp.so -rw-r--r-- 1 root root 27096 1月 16 20:42 libOpenCLUtils.so lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-graphics.so -> libsfml-graphics.so.2.5 lrwxrwxrwx 1 root root 25 1月 16 20:44 libsfml-graphics.so.2.5 -> libsfml-graphics.so.2.5.1 -rw-r--r-- 1 root root 456128 1月 16 20:42 libsfml-graphics.so.2.5.1 lrwxrwxrwx 1 root root 21 1月 16 20:44 libsfml-system.so -> libsfml-system.so.2.5 lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-system.so.2.5 -> libsfml-system.so.2.5.1 -rw-r--r-- 1 root root 71592 1月 16 20:42 libsfml-system.so.2.5.1 lrwxrwxrwx 1 root root 21 1月 16 20:44 libsfml-window.so -> libsfml-window.so.2.5 lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-window.so.2.5 -> libsfml-window.so.2.5.1 -rw-r--r-- 1 root root 202536 1月 16 20:42 libsfml-window.so.2.5.1 drwxr-xr-x 2 root root 4096 1月 16 20:44 pkgconfig root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/include/ 总用量 20 -rw-r--r-- 1 root root 4553 1月 16 20:40 cargs.h drwxr-xr-x 3 root root 4096 1月 16 20:44 CL drwxr-xr-x 2 root root 4096 1月 16 20:44 GL drwxr-xr-x 7 root root 4096 1月 16 20:44 SFML root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/include/CL/ 总用量 788 -rw-r--r-- 1 root root 786 1月 16 20:38 cl2.hpp -rw-r--r-- 1 root root 8057 1月 16 20:38 cl_d3d10.h -rw-r--r-- 1 root root 8095 1月 16 20:38 cl_d3d11.h -rw-r--r-- 1 root root 12246 1月 16 20:38 cl_dx9_media_sharing.h -rw-r--r-- 1 root root 959 1月 16 20:38 cl_dx9_media_sharing_intel.h -rw-r--r-- 1 root root 5672 1月 16 20:38 cl_egl.h -rw-r--r-- 1 root root 127490 1月 16 20:38 cl_ext.h -rw-r--r-- 1 root root 902 1月 16 20:38 cl_ext_intel.h -rw-r--r-- 1 root root 33387 1月 16 20:38 cl_function_types.h -rw-r--r-- 1 root root 905 1月 16 20:38 cl_gl_ext.h -rw-r--r-- 1 root root 12040 1月 16 20:38 cl_gl.h -rw-r--r-- 1 root root 81631 1月 16 20:38 cl.h -rw-r--r-- 1 root root 10430 1月 16 20:38 cl_half.h -rw-r--r-- 1 root root 11505 1月 16 20:38 cl_icd.h -rw-r--r-- 1 root root 3544 1月 16 20:38 cl_layer.h -rw-r--r-- 1 root root 43430 1月 16 20:38 cl_platform.h -rw-r--r-- 1 root root 7090 1月 16 20:38 cl_va_api_media_sharing_intel.h -rw-r--r-- 1 root root 3125 1月 16 20:38 cl_version.h -rw-r--r-- 1 root root 970 1月 16 20:38 opencl.h -rw-r--r-- 1 root root 396735 1月 16 20:38 opencl.hpp drwxr-xr-x 2 root root 4096 1月 16 20:44 Utils
接着我们将库文件和头文件放置到
sudo ln -s /opt/OpenCL/include/CL /usr/include sudo ln -s /opt/OpenCL/include/GL /usr/include sudo ln -s /opt/OpenCL/include/SFML /usr/include sudo ln -s /opt/OptnCL/lib/libOpenCL.so /usr/lib
2.3 安装头文件
从官网下载头文件
root@NanoPC-T6:/opt# git clone https://github.com/extdomains/github.com/KhronosGroup/OpenCL-Headers.git
运行以下命令来配置构建过程,并指定安装路径为
root@NanoPC-T6:/opt/OpenCL-Headers# cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/usr -- The C compiler identification is GNU 10.2.1 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/cc - skipped -- Detecting C compile features -- Detecting C compile features - done -- The CXX compiler identification is GNU 10.2.1 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Found Python3: /usr/bin/python3.9 (found version "3.9.2") found components: Interpreter -- Configuring done -- Generating done -- Build files have been written to: /opt/OpenCL-Headers/build
其中:
-S . :指定源代码目录的路径;-B build :指定构建目录的路径;-DCMAKE_INSTALL_PREFIX=/usr :指定cmake 执行install 目标时,安装的路径前缀;
如上命令会让
接着运行以下命令在
root@NanoPC-T6:/opt/OpenCL-Headers# cmake --build build --target install Scanning dependencies of target headers_c_200 [ 0%] Building C object tests/lang_c/CMakeFiles/headers_c_200.dir/__/test_headers.c.o [ 0%] Linking C executable headers_c_200 [ 0%] Built target headers_c_200 Scanning dependencies of target headers_c_120 [ 1%] Building C object tests/lang_c/CMakeFiles/headers_c_120.dir/__/test_headers.c.o [ 1%] Linking C executable headers_c_120 [ 1%] Built target headers_c_120 Scanning dependencies of target cl_version_h_c_300 [ 1%] Building C object tests/lang_c/CMakeFiles/cl_version_h_c_300.dir/__/test_cl_version.h.c.o [ 2%] Linking C executable cl_version_h_c_300 ....... [ 99%] Built target cl_egl_h_cpp_100 Scanning dependencies of target cl_gl_h_cpp_120 [100%] Building CXX object tests/lang_cpp/CMakeFiles/cl_gl_h_cpp_120.dir/test_cl_gl.h.cpp.o [100%] Linking CXX executable cl_gl_h_cpp_120 [100%] Built target cl_gl_h_cpp_120 Install the project... -- Install configuration: "" -- Installing: /usr/include/CL -- Installing: /usr/include/CL/opencl.h -- Installing: /usr/include/CL/cl_egl.h -- Installing: /usr/include/CL/cl_ext_intel.h -- Installing: /usr/include/CL/cl_layer.h -- Installing: /usr/include/CL/cl_platform.h -- Installing: /usr/include/CL/cl_d3d10.h -- Installing: /usr/include/CL/cl_va_api_media_sharing_intel.h -- Installing: /usr/include/CL/cl_icd.h -- Installing: /usr/include/CL/cl.h -- Installing: /usr/include/CL/cl_function_types.h -- Installing: /usr/include/CL/cl_dx9_media_sharing.h -- Installing: /usr/include/CL/cl_dx9_media_sharing_intel.h -- Installing: /usr/include/CL/cl_gl_ext.h -- Installing: /usr/include/CL/cl_d3d11.h -- Installing: /usr/include/CL/cl_version.h -- Installing: /usr/include/CL/cl_half.h -- Installing: /usr/include/CL/cl_ext.h -- Installing: /usr/include/CL/cl_gl.h -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersTargets.cmake -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersConfig.cmake -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersConfigVersion.cmake -- Installing: /usr/share/pkgconfig/OpenCL-Headers.pc
头文件已经安装到
root@NanoPC-T6:/opt/OpenCL-Headers# ls -l /usr/include/CL 总用量 392 -rw-r--r-- 1 root root 8057 1月 15 00:10 cl_d3d10.h -rw-r--r-- 1 root root 8095 1月 15 00:10 cl_d3d11.h -rw-r--r-- 1 root root 12246 1月 15 00:10 cl_dx9_media_sharing.h -rw-r--r-- 1 root root 959 1月 15 00:10 cl_dx9_media_sharing_intel.h -rw-r--r-- 1 root root 5672 1月 15 00:10 cl_egl.h -rw-r--r-- 1 root root 127490 1月 15 00:10 cl_ext.h -rw-r--r-- 1 root root 902 1月 15 00:10 cl_ext_intel.h -rw-r--r-- 1 root root 33387 1月 15 00:10 cl_function_types.h -rw-r--r-- 1 root root 905 1月 15 00:10 cl_gl_ext.h -rw-r--r-- 1 root root 12040 1月 15 00:10 cl_gl.h -rw-r--r-- 1 root root 81631 1月 15 00:10 cl.h -rw-r--r-- 1 root root 10430 1月 15 00:10 cl_half.h -rw-r--r-- 1 root root 11505 1月 15 00:10 cl_icd.h -rw-r--r-- 1 root root 3544 1月 15 00:10 cl_layer.h -rw-r--r-- 1 root root 43430 1月 15 00:10 cl_platform.h -rw-r--r-- 1 root root 7090 1月 15 00:10 cl_va_api_media_sharing_intel.h -rw-r--r-- 1 root root 3125 1月 15 00:10 cl_version.h -rw-r--r-- 1 root root 970 1月 15 00:10 opencl.h
三、OpenCL 测试
此时已经有动态库和头文件,可以进行测试了。在
root@NanoPC-T6:/opt# mkdir opencl-project
接着创建
root@NanoPC-T6:/opt# cd opencl-project/ root@NanoPC-T6:/opt/opencl-project# mkdir platform root@NanoPC-T6:/opt/opencl-project# cd platform
3.1 platform.cpp
在
#include <stdio.h> #include <stdlib.h> #include <CL/cl.h> #define MAX_PLATFORMS 10 #define MAX_DEVICES 10 int main() { cl_platform_id platforms[MAX_PLATFORMS]; cl_device_id devices[MAX_DEVICES]; cl_uint num_platforms, num_devices; cl_context context; cl_command_queue command_queue; cl_program program; cl_kernel kernel; cl_int ret; // 获取平台数量 ret = clGetPlatformIDs(MAX_PLATFORMS, platforms, &num_platforms); if (ret != CL_SUCCESS) { printf("Failed to get platform IDs "); return -1; } printf("Number of platforms: %u ", num_platforms); // 遍历打印平台信息 for (cl_uint i = 0; i < num_platforms; i++) { char platform_name[128]; char platform_vendor[128]; ret = clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(platform_name), platform_name, NULL); if (ret != CL_SUCCESS) { printf("Failed to get platform name for platform %u ", i); } ret = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(platform_vendor), platform_vendor, NULL); if (ret != CL_SUCCESS) { printf("Failed to get platform vendor for platform %u ", i); } printf("Platform %u: ", i); printf(" Name: %s ", platform_name); printf(" Vendor: %s ", platform_vendor); printf(" "); } // 获取设备数量 ret = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, MAX_DEVICES, devices, &num_devices); if (ret != CL_SUCCESS) { printf("Failed to get device IDs "); return -1; } // 创建OpenCL上下文 context = clCreateContext(NULL, num_devices, devices, NULL, NULL, &ret); if (ret != CL_SUCCESS) { printf("Failed to create context "); return -1; } // 创建命令队列 command_queue = clCreateCommandQueue(context, devices[0], 0, &ret); if (ret != CL_SUCCESS) { printf("Failed to create command queue "); return -1; } // 定义和构建OpenCL内核 const char *kernel_source = "__kernel void hello_world() { " " printf("Hello, World!\n"); " "} "; program = clCreateProgramWithSource(context, 1, &kernel_source, NULL, &ret); if (ret != CL_SUCCESS) { printf("Failed to create program "); return -1; } ret = clBuildProgram(program, num_devices, devices, NULL, NULL, NULL); if (ret != CL_SUCCESS) { printf("Failed to build program "); return -1; } // 创建OpenCL内核对象 kernel = clCreateKernel(program, "hello_world", &ret); if (ret != CL_SUCCESS) { printf("Failed to create kernel "); return -1; } // 执行内核函数 ret = clEnqueueTask(command_queue, kernel, 0, NULL, NULL); if (ret != CL_SUCCESS) { printf("Failed to enqueue task "); return -1; } // 等待执行完成 ret = clFinish(command_queue); if (ret != CL_SUCCESS) { printf("Failed to finish execution "); return -1; } printf("Kernel executed successfully "); // 清理资源 ret = clReleaseKernel(kernel); ret = clReleaseProgram(program); ret = clReleaseCommandQueue(command_queue); ret = clReleaseContext(context); return 0; }
3.2 编译
这里我们介绍两种源码编译的方式。
3.2.1 直接编译
我们可以直接执行如下编译命令:
root@NanoPC-T6:/opt/opencl-project/platform# gcc platform.cpp -o platform -lmali
那么编译器如何知道
- 首先搜索预定义的默认路径,如
/usr/lib 和/usr/local/lib 等; - 如果共享库没有在这些路径中找到,则会搜索在
/etc/ld.so.conf 和/etc/ld.so.conf.d 目录中指定的路径。这些路径可以包含自定义共享库路径,比如:
root@NanoPC-T6:/opt/opencl-project/platform# ls -l /etc/ld.so.conf.d/ 总用量 12 -rw-r--r-- 1 root root 32 7月 29 2020 00-aarch64-mali.conf -rw-r--r-- 1 root root 103 4月 20 2023 aarch64-linux-gnu.conf -rw-r--r-- 1 root root 44 9月 23 2022 libc.conf root@NanoPC-T6:/opt/opencl-project/platform# cat /etc/ld.so.conf.d/aarch64-linux-gnu.conf # Multiarch support /usr/local/lib/aarch64-linux-gnu /lib/aarch64-linux-gnu /usr/lib/aarch64-linux-gnu # 该路径下有libmali.so库文件
3.2.2 cmake 编译
当然也可以使用
(1) 在
cmake_minimum_required(VERSION 3.0) cmake_policy(VERSION 3.0...3.18.4) project(proj) add_executable(platform platform.cpp) #寻找OpenCL库 /usr/share/cmake-3.18/Modules/FindOpenCL.cmake find_package(OpenCL REQUIRED) #打印调试信息 MESSAGE(STATUS "Project: ${PROJECT_NAME}") MESSAGE(STATUS "OpenCL library status:") MESSAGE(STATUS " version: ${OpenCL_VERSION_STRING}") MESSAGE(STATUS " libraries: ${OpenCL_LIBRARY}") MESSAGE(STATUS " include path: ${OpenCL_INCLUDE_DIR}") target_link_libraries(platform PRIVATE OpenCL::OpenCL)
(2) 配置构建过程:
root@NanoPC-T6:/opt/opencl-project/platform# cmake -S . -B build -- The C compiler identification is GNU 10.2.1 -- The CXX compiler identification is GNU 10.2.1 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/cc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Looking for CL_VERSION_2_2 -- Looking for CL_VERSION_2_2 - found -- Found OpenCL: /usr/lib/aarch64-linux-gnu/libOpenCL.so (found version "2.2") -- Project: proj -- OpenCL library status: -- version: 2.2 -- libraries: /usr/lib/aarch64-linux-gnu/libOpenCL.so # 库文件路径 -- include path: /usr/include # 头文件路径 -- Configuring done -- Generating done -- Build files have been written to: /opt/opencl-project/platform/build
其中:
-S . :选项指定源代码目录的路径,CMake 将在该路径下查找CMakeLists.txt 文件;-B build :选项指定构建目录的路径;
实际上我们使用的版本是
可以通过修改
foreach(VERSION "3_0" "2_2" "2_1" "2_0" "1_2" "1_1" "1_0")
(3) 执行构建操作,生成可执行程序
root@NanoPC-T6:/opt/OpenCL-Headers/exmaples# cmake --build build Scanning dependencies of target platform [ 50%] Building CXX object CMakeFiles/platform.dir/platform.cpp.o In file included from /usr/include/CL/cl.h:20, from /usr/include/CL/opencl.h:24, from /opt/OpenCL-Headers/exmaples/platform.cpp:1: /usr/include/CL/cl_version.h:22:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)’ 22 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)") | ^ [100%] Linking CXX executable platform [100%] Built target platform
执行程序:
root@NanoPC-T6:/opt/opencl-project/platform# ls -l build/ 总用量 48 -rw-r--r-- 1 root root 14229 1月 16 23:45 CMakeCache.txt drwxr-xr-x 5 root root 4096 1月 16 23:46 CMakeFiles -rw-r--r-- 1 root root 1632 1月 16 23:45 cmake_install.cmake -rw-r--r-- 1 root root 5253 1月 16 23:45 Makefile -rwxr-xr-x 1 root root 14248 1月 16 23:46 platform root@NanoPC-T6:/opt/opencl-project/platform# ./build/platform arm_release_ver: g13p0-01eac0, rk_so_ver: 10 Number of platforms: 1 Platform 0: Name: ARM Platform Vendor: ARM Kernel executed successfully
四、OpenCV 测试用例
在
4.1 OCL 介绍
4.2 项目源码
4.2.1 main.c
点击查看代码
/* // The example of interoperability between OpenCL and OpenCV. // This will loop through frames of video either from input media file // or camera device and do processing of these data in OpenCL and then // in OpenCV. In OpenCL it does inversion of pixels in left half of frame and // in OpenCV it does bluring in the right half of frame. */ #include <cstdio> #include <cstdlib> #include <iostream> #include <fstream> #include <string> #include <sstream> #include <iomanip> #include <stdexcept> #define CL_USE_DEPRECATED_OPENCL_2_0_APIS // eliminate build warning #if __APPLE__ #include <OpenCL/cl.h> #else #include <CL/cl.h> #endif #include <opencv2/core/ocl.hpp> #include <opencv2/core/utility.hpp> #include <opencv2/video.hpp> #include <opencv2/highgui.hpp> #include <opencv2/imgproc.hpp> using namespace std; using namespace cv; namespace opencl { class PlatformInfo { public: PlatformInfo() {} ~PlatformInfo() {} cl_int QueryInfo(cl_platform_id id) { query_param(id, CL_PLATFORM_PROFILE, m_profile); query_param(id, CL_PLATFORM_VERSION, m_version); query_param(id, CL_PLATFORM_NAME, m_name); query_param(id, CL_PLATFORM_VENDOR, m_vendor); query_param(id, CL_PLATFORM_EXTENSIONS, m_extensions); return CL_SUCCESS; } std::string Profile() { return m_profile; } std::string Version() { return m_version; } std::string Name() { return m_name; } std::string Vendor() { return m_vendor; } std::string Extensions() { return m_extensions; } private: cl_int query_param(cl_platform_id id, cl_platform_info param, std::string& paramStr) { cl_int res; size_t psize; cv::AutoBuffer<char> buf; res = clGetPlatformInfo(id, param, 0, 0, &psize); if (CL_SUCCESS != res) throw std::runtime_error(std::string("clGetPlatformInfo failed")); buf.resize(psize); res = clGetPlatformInfo(id, param, psize, buf, 0); if (CL_SUCCESS != res) throw std::runtime_error(std::string("clGetPlatformInfo failed")); // just in case, ensure trailing zero for ASCIIZ string buf[psize] = 0; paramStr = buf; return CL_SUCCESS; } private: std::string m_profile; std::string m_version; std::string m_name; std::string m_vendor; std::string m_extensions; }; class DeviceInfo { public: DeviceInfo() {} ~DeviceInfo() {} cl_int QueryInfo(cl_device_id id) { query_param(id, CL_DEVICE_TYPE, m_type); query_param(id, CL_DEVICE_VENDOR_ID, m_vendor_id); query_param(id, CL_DEVICE_MAX_COMPUTE_UNITS, m_max_compute_units); query_param(id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, m_max_work_item_dimensions); query_param(id, CL_DEVICE_MAX_WORK_ITEM_SIZES, m_max_work_item_sizes); query_param(id, CL_DEVICE_MAX_WORK_GROUP_SIZE, m_max_work_group_size); query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, m_preferred_vector_width_char); query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, m_preferred_vector_width_short); query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, m_preferred_vector_width_int); query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, m_preferred_vector_width_long); query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, m_preferred_vector_width_float); query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, m_preferred_vector_width_double); #if defined(CL_VERSION_1_1) query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, m_preferred_vector_width_half); query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, m_native_vector_width_char); query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, m_native_vector_width_short); query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, m_native_vector_width_int); query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, m_native_vector_width_long); query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, m_native_vector_width_float); query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, m_native_vector_width_double); query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, m_native_vector_width_half); #endif query_param(id, CL_DEVICE_MAX_CLOCK_FREQUENCY, m_max_clock_frequency); query_param(id, CL_DEVICE_ADDRESS_BITS, m_address_bits); query_param(id, CL_DEVICE_MAX_MEM_ALLOC_SIZE, m_max_mem_alloc_size); query_param(id, CL_DEVICE_IMAGE_SUPPORT, m_image_support); query_param(id, CL_DEVICE_MAX_READ_IMAGE_ARGS, m_max_read_image_args); query_param(id, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, m_max_write_image_args); #if defined(CL_VERSION_2_0) query_param(id, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, m_max_read_write_image_args); #endif query_param(id, CL_DEVICE_IMAGE2D_MAX_WIDTH, m_image2d_max_width); query_param(id, CL_DEVICE_IMAGE2D_MAX_HEIGHT, m_image2d_max_height); query_param(id, CL_DEVICE_IMAGE3D_MAX_WIDTH, m_image3d_max_width); query_param(id, CL_DEVICE_IMAGE3D_MAX_HEIGHT, m_image3d_max_height); query_param(id, CL_DEVICE_IMAGE3D_MAX_DEPTH, m_image3d_max_depth); #if defined(CL_VERSION_1_2) query_param(id, CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, m_image_max_buffer_size); query_param(id, CL_DEVICE_IMAGE_MAX_ARRAY_SIZE, m_image_max_array_size); #endif query_param(id, CL_DEVICE_MAX_SAMPLERS, m_max_samplers); #if defined(CL_VERSION_1_2) query_param(id, CL_DEVICE_IMAGE_PITCH_ALIGNMENT, m_image_pitch_alignment); query_param(id, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, m_image_base_address_alignment); #endif #if defined(CL_VERSION_2_0) query_param(id, CL_DEVICE_MAX_PIPE_ARGS, m_max_pipe_args); query_param(id, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, m_pipe_max_active_reservations); query_param(id, CL_DEVICE_PIPE_MAX_PACKET_SIZE, m_pipe_max_packet_size); #endif query_param(id, CL_DEVICE_MAX_PARAMETER_SIZE, m_max_parameter_size); query_param(id, CL_DEVICE_MEM_BASE_ADDR_ALIGN, m_mem_base_addr_align); query_param(id, CL_DEVICE_SINGLE_FP_CONFIG, m_single_fp_config); #if defined(CL_VERSION_1_2) query_param(id, CL_DEVICE_DOUBLE_FP_CONFIG, m_double_fp_config); #endif query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, m_global_mem_cache_type); query_param(id, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, m_global_mem_cacheline_size); query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, m_global_mem_cache_size); query_param(id, CL_DEVICE_GLOBAL_MEM_SIZE, m_global_mem_size); query_param(id, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, m_max_constant_buffer_size); query_param(id, CL_DEVICE_MAX_CONSTANT_ARGS, m_max_constant_args); #if defined(CL_VERSION_2_0) query_param(id, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, m_max_global_variable_size); query_param(id, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, m_global_variable_preferred_total_size); #endif query_param(id, CL_DEVICE_LOCAL_MEM_TYPE, m_local_mem_type); query_param(id, CL_DEVICE_LOCAL_MEM_SIZE, m_local_mem_size); query_param(id, CL_DEVICE_ERROR_CORRECTION_SUPPORT, m_error_correction_support); #if defined(CL_VERSION_1_1) query_param(id, CL_DEVICE_HOST_UNIFIED_MEMORY, m_host_unified_memory); #endif query_param(id, CL_DEVICE_PROFILING_TIMER_RESOLUTION, m_profiling_timer_resolution); query_param(id, CL_DEVICE_ENDIAN_LITTLE, m_endian_little); query_param(id, CL_DEVICE_AVAILABLE, m_available); query_param(id, CL_DEVICE_COMPILER_AVAILABLE, m_compiler_available); #if defined(CL_VERSION_1_2) query_param(id, CL_DEVICE_LINKER_AVAILABLE, m_linker_available); #endif query_param(id, CL_DEVICE_EXECUTION_CAPABILITIES, m_execution_capabilities); query_param(id, CL_DEVICE_QUEUE_PROPERTIES, m_queue_properties); #if defined(CL_VERSION_2_0) query_param(id, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, m_queue_on_host_properties); query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, m_queue_on_device_properties); query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, m_queue_on_device_preferred_size); query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, m_queue_on_device_max_size); query_param(id, CL_DEVICE_MAX_ON_DEVICE_QUEUES, m_max_on_device_queues); query_param(id, CL_DEVICE_MAX_ON_DEVICE_EVENTS, m_max_on_device_events); #endif #if defined(CL_VERSION_1_2) query_param(id, CL_DEVICE_BUILT_IN_KERNELS, m_built_in_kernels); #endif query_param(id, CL_DEVICE_PLATFORM, m_platform); query_param(id, CL_DEVICE_NAME, m_name); query_param(id, CL_DEVICE_VENDOR, m_vendor); query_param(id, CL_DRIVER_VERSION, m_driver_version); query_param(id, CL_DEVICE_PROFILE, m_profile); query_param(id, CL_DEVICE_VERSION, m_version); #if defined(CL_VERSION_1_1) query_param(id, CL_DEVICE_OPENCL_C_VERSION, m_opencl_c_version); #endif query_param(id, CL_DEVICE_EXTENSIONS, m_extensions); #if defined(CL_VERSION_1_2) query_param(id, CL_DEVICE_PRINTF_BUFFER_SIZE, m_printf_buffer_size); query_param(id, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, m_preferred_interop_user_sync); query_param(id, CL_DEVICE_PARENT_DEVICE, m_parent_device); query_param(id, CL_DEVICE_PARTITION_MAX_SUB_DEVICES, m_partition_max_sub_devices); query_param(id, CL_DEVICE_PARTITION_PROPERTIES, m_partition_properties); query_param(id, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, m_partition_affinity_domain); query_param(id, CL_DEVICE_PARTITION_TYPE, m_partition_type); query_param(id, CL_DEVICE_REFERENCE_COUNT, m_reference_count); #endif return CL_SUCCESS; } std::string Name() { return m_name; } private: template<typename T> cl_int query_param(cl_device_id id, cl_device_info param, T& value) { cl_int res; size_t size = 0; res = clGetDeviceInfo(id, param, 0, 0, &size); if (CL_SUCCESS != res && size != 0) throw std::runtime_error(std::string("clGetDeviceInfo failed")); if (0 == size) return CL_SUCCESS; if (sizeof(T) != size) throw std::runtime_error(std::string("clGetDeviceInfo: param size mismatch")); res = clGetDeviceInfo(id, param, size, &value, 0); if (CL_SUCCESS != res) throw std::runtime_error(std::string("clGetDeviceInfo failed")); return CL_SUCCESS; } template<typename T> cl_int query_param(cl_device_id id, cl_device_info param, std::vector<T>& value) { cl_int res; size_t size; res = clGetDeviceInfo(id, param, 0, 0, &size); if (CL_SUCCESS != res) throw std::runtime_error(std::string("clGetDeviceInfo failed")); if (0 == size) return CL_SUCCESS; value.resize(size / sizeof(T)); res = clGetDeviceInfo(id, param, size, &value[0], 0); if (CL_SUCCESS != res) throw std::runtime_error(std::string("clGetDeviceInfo failed")); return CL_SUCCESS; } cl_int query_param(cl_device_id id, cl_device_info param, std::string& value) { cl_int res; size_t size; res = clGetDeviceInfo(id, param, 0, 0, &size); if (CL_SUCCESS != res) throw std::runtime_error(std::string("clGetDeviceInfo failed")); value.resize(size + 1); res = clGetDeviceInfo(id, param, size, &value[0], 0); if (CL_SUCCESS != res) throw std::runtime_error(std::string("clGetDeviceInfo failed")); // just in case, ensure trailing zero for ASCIIZ string value[size] = 0; return CL_SUCCESS; } private: cl_device_type m_type; cl_uint m_vendor_id; cl_uint m_max_compute_units; cl_uint m_max_work_item_dimensions; std::vector<size_t> m_max_work_item_sizes; size_t m_max_work_group_size; cl_uint m_preferred_vector_width_char; cl_uint m_preferred_vector_width_short; cl_uint m_preferred_vector_width_int; cl_uint m_preferred_vector_width_long; cl_uint m_preferred_vector_width_float; cl_uint m_preferred_vector_width_double; #if defined(CL_VERSION_1_1) cl_uint m_preferred_vector_width_half; cl_uint m_native_vector_width_char; cl_uint m_native_vector_width_short; cl_uint m_native_vector_width_int; cl_uint m_native_vector_width_long; cl_uint m_native_vector_width_float; cl_uint m_native_vector_width_double; cl_uint m_native_vector_width_half; #endif cl_uint m_max_clock_frequency; cl_uint m_address_bits; cl_ulong m_max_mem_alloc_size; cl_bool m_image_support; cl_uint m_max_read_image_args; cl_uint m_max_write_image_args; #if defined(CL_VERSION_2_0) cl_uint m_max_read_write_image_args; #endif size_t m_image2d_max_width; size_t m_image2d_max_height; size_t m_image3d_max_width; size_t m_image3d_max_height; size_t m_image3d_max_depth; #if defined(CL_VERSION_1_2) size_t m_image_max_buffer_size; size_t m_image_max_array_size; #endif cl_uint m_max_samplers; #if defined(CL_VERSION_1_2) cl_uint m_image_pitch_alignment; cl_uint m_image_base_address_alignment; #endif #if defined(CL_VERSION_2_0) cl_uint m_max_pipe_args; cl_uint m_pipe_max_active_reservations; cl_uint m_pipe_max_packet_size; #endif size_t m_max_parameter_size; cl_uint m_mem_base_addr_align; cl_device_fp_config m_single_fp_config; #if defined(CL_VERSION_1_2) cl_device_fp_config m_double_fp_config; #endif cl_device_mem_cache_type m_global_mem_cache_type; cl_uint m_global_mem_cacheline_size; cl_ulong m_global_mem_cache_size; cl_ulong m_global_mem_size; cl_ulong m_max_constant_buffer_size; cl_uint m_max_constant_args; #if defined(CL_VERSION_2_0) size_t m_max_global_variable_size; size_t m_global_variable_preferred_total_size; #endif cl_device_local_mem_type m_local_mem_type; cl_ulong m_local_mem_size; cl_bool m_error_correction_support; #if defined(CL_VERSION_1_1) cl_bool m_host_unified_memory; #endif size_t m_profiling_timer_resolution; cl_bool m_endian_little; cl_bool m_available; cl_bool m_compiler_available; #if defined(CL_VERSION_1_2) cl_bool m_linker_available; #endif cl_device_exec_capabilities m_execution_capabilities; cl_command_queue_properties m_queue_properties; #if defined(CL_VERSION_2_0) cl_command_queue_properties m_queue_on_host_properties; cl_command_queue_properties m_queue_on_device_properties; cl_uint m_queue_on_device_preferred_size; cl_uint m_queue_on_device_max_size; cl_uint m_max_on_device_queues; cl_uint m_max_on_device_events; #endif #if defined(CL_VERSION_1_2) std::string m_built_in_kernels; #endif cl_platform_id m_platform; std::string m_name; std::string m_vendor; std::string m_driver_version; std::string m_profile; std::string m_version; #if defined(CL_VERSION_1_1) std::string m_opencl_c_version; #endif std::string m_extensions; #if defined(CL_VERSION_1_2) size_t m_printf_buffer_size; cl_bool m_preferred_interop_user_sync; cl_device_id m_parent_device; cl_uint m_partition_max_sub_devices; std::vector<cl_device_partition_property> m_partition_properties; cl_device_affinity_domain m_partition_affinity_domain; std::vector<cl_device_partition_property> m_partition_type; cl_uint m_reference_count; #endif }; } // namespace opencl class App { public: App(CommandLineParser& cmd); ~App(); int initOpenCL(); int initVideoSource(); int process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* cl_buffer); int process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u); int process_cl_image_with_opencv(cl_mem image, cv::UMat& u); int run(); bool isRunning() { return m_running; } bool doProcess() { return m_process; } bool useBuffer() { return m_use_buffer; } void setRunning(bool running) { m_running = running; } void setDoProcess(bool process) { m_process = process; } void setUseBuffer(bool use_buffer) { m_use_buffer = use_buffer; } protected: bool nextFrame(cv::Mat& frame) { return m_cap.read(frame); } void handleKey(char key); void timerStart(); void timerEnd(); std::string timeStr() const; std::string message() const; private: bool m_running; bool m_process; bool m_use_buffer; int64 m_t0; int64 m_t1; float m_time; float m_frequency; string m_file_name; int m_camera_id; cv::VideoCapture m_cap; cv::Mat m_frame; cv::Mat m_frameGray; opencl::PlatformInfo m_platformInfo; opencl::DeviceInfo m_deviceInfo; std::vector<cl_platform_id> m_platform_ids; cl_context m_context; cl_device_id m_device_id; cl_command_queue m_queue; cl_program m_program; cl_kernel m_kernelBuf; cl_kernel m_kernelImg; cl_mem m_img_src; // used as src in case processing of cl image cl_mem m_mem_obj; cl_event m_event; }; App::App(CommandLineParser& cmd) { cout << " Press ESC to exit " << endl; cout << " 'p' to toggle ON/OFF processing " << endl; cout << " SPACE to switch between OpenCL buffer/image " << endl; m_camera_id = cmd.get<int>("camera"); m_file_name = cmd.get<string>("video"); m_running = false; m_process = false; m_use_buffer = false; m_t0 = 0; m_t1 = 0; m_time = 0.0; m_frequency = (float)cv::getTickFrequency(); m_context = 0; m_device_id = 0; m_queue = 0; m_program = 0; m_kernelBuf = 0; m_kernelImg = 0; m_img_src = 0; m_mem_obj = 0; m_event = 0; } // ctor App::~App() { if (m_queue) { clFinish(m_queue); clReleaseCommandQueue(m_queue); m_queue = 0; } if (m_program) { clReleaseProgram(m_program); m_program = 0; } if (m_img_src) { clReleaseMemObject(m_img_src); m_img_src = 0; } if (m_mem_obj) { clReleaseMemObject(m_mem_obj); m_mem_obj = 0; } if (m_event) { clReleaseEvent(m_event); } if (m_kernelBuf) { clReleaseKernel(m_kernelBuf); m_kernelBuf = 0; } if (m_kernelImg) { clReleaseKernel(m_kernelImg); m_kernelImg = 0; } if (m_device_id) { clReleaseDevice(m_device_id); m_device_id = 0; } if (m_context) { clReleaseContext(m_context); m_context = 0; } } // dtor int App::initOpenCL() { cl_int res = CL_SUCCESS; cl_uint num_entries = 0; res = clGetPlatformIDs(0, 0, &num_entries); if (CL_SUCCESS != res) return -1; m_platform_ids.resize(num_entries); res = clGetPlatformIDs(num_entries, &m_platform_ids[0], 0); if (CL_SUCCESS != res) return -1; unsigned int i; // create context from first platform with GPU device for (i = 0; i < m_platform_ids.size(); i++) { cl_context_properties props[] = { CL_CONTEXT_PLATFORM, (cl_context_properties)(m_platform_ids[i]), 0 }; m_context = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU, 0, 0, &res); if (0 == m_context || CL_SUCCESS != res) continue; res = clGetContextInfo(m_context, CL_CONTEXT_DEVICES, sizeof(cl_device_id), &m_device_id, 0); if (CL_SUCCESS != res) return -1; m_queue = clCreateCommandQueue(m_context, m_device_id, 0, &res); if (0 == m_queue || CL_SUCCESS != res) return -1; const char* kernelSrc = "__kernel " "void bitwise_inv_buf_8uC1(" " __global unsigned char* pSrcDst," " int srcDstStep," " int rows," " int cols)" "{" " int x = get_global_id(0);" " int y = get_global_id(1);" " int idx = mad24(y, srcDstStep, x);" " pSrcDst[idx] = ~pSrcDst[idx];" "}" "__kernel " "void bitwise_inv_img_8uC1(" " read_only image2d_t srcImg," " write_only image2d_t dstImg)" "{" " int x = get_global_id(0);" " int y = get_global_id(1);" " int2 coord = (int2)(x, y);" " uint4 val = read_imageui(srcImg, coord);" " val.x = (~val.x) & 0x000000FF;" " write_imageui(dstImg, coord, val);" "}"; size_t len = strlen(kernelSrc); m_program = clCreateProgramWithSource(m_context, 1, &kernelSrc, &len, &res); if (0 == m_program || CL_SUCCESS != res) return -1; res = clBuildProgram(m_program, 1, &m_device_id, 0, 0, 0); if (CL_SUCCESS != res) return -1; m_kernelBuf = clCreateKernel(m_program, "bitwise_inv_buf_8uC1", &res); if (0 == m_kernelBuf || CL_SUCCESS != res) return -1; m_kernelImg = clCreateKernel(m_program, "bitwise_inv_img_8uC1", &res); if (0 == m_kernelImg || CL_SUCCESS != res) return -1; m_platformInfo.QueryInfo(m_platform_ids[i]); m_deviceInfo.QueryInfo(m_device_id); // attach OpenCL context to OpenCV cv::ocl::attachContext(m_platformInfo.Name(), m_platform_ids[i], m_context, m_device_id); break; } return m_context != 0 ? CL_SUCCESS : -1; } // initOpenCL() int App::initVideoSource() { try { if (!m_file_name.empty() && m_camera_id == -1) { m_cap.open(m_file_name.c_str()); if (!m_cap.isOpened()) throw std::runtime_error(std::string("can't open video file: " + m_file_name)); } else if (m_camera_id != -1) { m_cap.open(m_camera_id); if (!m_cap.isOpened()) { std::stringstream msg; msg << "can't open camera: " << m_camera_id; throw std::runtime_error(msg.str()); } } else throw std::runtime_error(std::string("specify video source")); } catch (std::exception e) { cerr << "ERROR: " << e.what() << std::endl; return -1; } return 0; } // initVideoSource() // this function is an example of "typical" OpenCL processing pipeline // It creates OpenCL buffer or image, depending on use_buffer flag, // from input media frame and process these data // (inverts each pixel value in half of frame) with OpenCL kernel int App::process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* mem_obj) { cl_int res = CL_SUCCESS; CV_Assert(mem_obj); cl_kernel kernel = 0; cl_mem mem = mem_obj[0]; if (0 == mem || 0 == m_img_src) { // allocate/delete cl memory objects every frame for the simplicity. // in real applicaton more efficient pipeline can be built. if (use_buffer) { cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR; mem = clCreateBuffer(m_context, flags, frame.total(), frame.ptr(), &res); if (0 == mem || CL_SUCCESS != res) return -1; res = clSetKernelArg(m_kernelBuf, 0, sizeof(cl_mem), &mem); if (CL_SUCCESS != res) return -1; res = clSetKernelArg(m_kernelBuf, 1, sizeof(int), &frame.step[0]); if (CL_SUCCESS != res) return -1; res = clSetKernelArg(m_kernelBuf, 2, sizeof(int), &frame.rows); if (CL_SUCCESS != res) return -1; int cols2 = frame.cols / 2; res = clSetKernelArg(m_kernelBuf, 3, sizeof(int), &cols2); if (CL_SUCCESS != res) return -1; kernel = m_kernelBuf; } else { cl_mem_flags flags_src = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR; cl_image_format fmt; fmt.image_channel_order = CL_R; fmt.image_channel_data_type = CL_UNSIGNED_INT8; cl_image_desc desc_src; desc_src.image_type = CL_MEM_OBJECT_IMAGE2D; desc_src.image_width = frame.cols; desc_src.image_height = frame.rows; desc_src.image_depth = 0; desc_src.image_array_size = 0; desc_src.image_row_pitch = frame.step[0]; desc_src.image_slice_pitch = 0; desc_src.num_mip_levels = 0; desc_src.num_samples = 0; desc_src.buffer = 0; m_img_src = clCreateImage(m_context, flags_src, &fmt, &desc_src, frame.ptr(), &res); if (0 == m_img_src || CL_SUCCESS != res) return -1; cl_mem_flags flags_dst = CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR; cl_image_desc desc_dst; desc_dst.image_type = CL_MEM_OBJECT_IMAGE2D; desc_dst.image_width = frame.cols; desc_dst.image_height = frame.rows; desc_dst.image_depth = 0; desc_dst.image_array_size = 0; desc_dst.image_row_pitch = 0; desc_dst.image_slice_pitch = 0; desc_dst.num_mip_levels = 0; desc_dst.num_samples = 0; desc_dst.buffer = 0; mem = clCreateImage(m_context, flags_dst, &fmt, &desc_dst, 0, &res); if (0 == mem || CL_SUCCESS != res) return -1; size_t origin[] = { 0, 0, 0 }; size_t region[] = { (size_t)frame.cols, (size_t)frame.rows, 1 }; res = clEnqueueCopyImage(m_queue, m_img_src, mem, origin, origin, region, 0, 0, &m_event); if (CL_SUCCESS != res) return -1; res = clWaitForEvents(1, &m_event); if (CL_SUCCESS != res) return -1; res = clSetKernelArg(m_kernelImg, 0, sizeof(cl_mem), &m_img_src); if (CL_SUCCESS != res) return -1; res = clSetKernelArg(m_kernelImg, 1, sizeof(cl_mem), &mem); if (CL_SUCCESS != res) return -1; kernel = m_kernelImg; } } m_event = clCreateUserEvent(m_context, &res); if (0 == m_event || CL_SUCCESS != res) return -1; // process left half of frame in OpenCL size_t size[] = { (size_t)frame.cols / 2, (size_t)frame.rows }; res = clEnqueueNDRangeKernel(m_queue, kernel, 2, 0, size, 0, 0, 0, &m_event); if (CL_SUCCESS != res) return -1; res = clWaitForEvents(1, &m_event); if (CL_SUCCESS != res) return - 1; mem_obj[0] = mem; return 0; } // this function is an example of interoperability between OpenCL buffer // and OpenCV UMat objects. It converts (without copying data) OpenCL buffer // to OpenCV UMat and then do blur on these data int App::process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u) { cv::ocl::convertFromBuffer(buffer, step, rows, cols, type, u); // process right half of frame in OpenCV cv::Point pt(u.cols / 2, 0); cv::Size sz(u.cols / 2, u.rows); cv::Rect roi(pt, sz); cv::UMat uroi(u, roi); cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3)); if (buffer) clReleaseMemObject(buffer); m_mem_obj = 0; return 0; } // this function is an example of interoperability between OpenCL image // and OpenCV UMat objects. It converts OpenCL image // to OpenCV UMat and then do blur on these data int App::process_cl_image_with_opencv(cl_mem image, cv::UMat& u) { cv::ocl::convertFromImage(image, u); // process right half of frame in OpenCV cv::Point pt(u.cols / 2, 0); cv::Size sz(u.cols / 2, u.rows); cv::Rect roi(pt, sz); cv::UMat uroi(u, roi); cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3)); if (image) clReleaseMemObject(image); m_mem_obj = 0; if (m_img_src) clReleaseMemObject(m_img_src); m_img_src = 0; return 0; } int App::run() { if (0 != initOpenCL()) return -1; if (0 != initVideoSource()) return -1; Mat img_to_show; // set running state until ESC pressed setRunning(true); // set process flag to show some data processing // can be toggled on/off by 'p' button setDoProcess(true); // set use buffer flag, // when it is set to true, will demo interop opencl buffer and cv::Umat, // otherwise demo interop opencl image and cv::UMat // can be switched on/of by SPACE button setUseBuffer(true); // Iterate over all frames while (isRunning() && nextFrame(m_frame)) { cv::cvtColor(m_frame, m_frameGray, COLOR_BGR2GRAY); UMat uframe; // work timerStart(); if (doProcess()) { process_frame_with_open_cl(m_frameGray, useBuffer(), &m_mem_obj); if (useBuffer()) process_cl_buffer_with_opencv( m_mem_obj, m_frameGray.step[0], m_frameGray.rows, m_frameGray.cols, m_frameGray.type(), uframe); else process_cl_image_with_opencv(m_mem_obj, uframe); } else { m_frameGray.copyTo(uframe); } timerEnd(); uframe.copyTo(img_to_show); putText(img_to_show, "Version : " + m_platformInfo.Version(), Point(5, 30), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); putText(img_to_show, "Name : " + m_platformInfo.Name(), Point(5, 60), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); putText(img_to_show, "Device : " + m_deviceInfo.Name(), Point(5, 90), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); cv::String memtype = useBuffer() ? "buffer" : "image"; putText(img_to_show, "interop with OpenCL " + memtype, Point(5, 120), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); putText(img_to_show, "Time : " + timeStr() + " msec", Point(5, 150), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); imshow("opencl_interop", img_to_show); handleKey((char)waitKey(3)); } return 0; } void App::handleKey(char key) { switch (key) { case 27: setRunning(false); break; case ' ': setUseBuffer(!useBuffer()); break; case 'p': case 'P': setDoProcess( !doProcess() ); break; default: break; } } inline void App::timerStart() { m_t0 = getTickCount(); } inline void App::timerEnd() { m_t1 = getTickCount(); int64 delta = m_t1 - m_t0; m_time = (delta / m_frequency) * 1000; // units msec } inline string App::timeStr() const { stringstream ss; ss << std::fixed << std::setprecision(1) << m_time; return ss.str(); } int main(int argc, char** argv) { const char* keys = "{ help h ? | | print help message }" "{ camera c | -1 | use camera as input }" "{ video v | | use video as input }"; CommandLineParser cmd(argc, argv, keys); if (cmd.has("help")) { cmd.printMessage(); return EXIT_SUCCESS; } App app(cmd); try { app.run(); } catch (const cv::Exception& e) { cout << "error: " << e.what() << endl; return 1; } catch (const std::exception& e) { cout << "error: " << e.what() << endl; return 1; } catch (...) { cout << "unknown exception" << endl; return 1; } return EXIT_SUCCESS; } // main()
4.2.2 Makefile
TARGET = main CXX = g++ CFLAGS += -I/usr/include -I/usr/local/include/opencv -I/usr/local/include/opencv2 -L/usr/lib -L/usr/local/lib -L/lib -std=c++98 CFLAGS += -lopencv_core -lopencv_objdetect -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_imgproc -lOpenCL -lpthread -lrt all: @$(CXX) $(TARGET).cpp -o $(TARGET) $(CFLAGS) clean: rm -rf $(TARGET)
4.2.3 编译运行
root@NanoPC-T6:/opt/opencl-project/opencv-ocl# make root@NanoPC-T6:/opt/opencl-project/opencv-ocl# ./main -c
如下图所示:
参考文章
[1]
[2] 嵌入式
[3]
[4]
[5] 如何在
[6]
[7] 什么是
[8] 高性能计算
[9]
[10]
[11]
[12]
[13] 一、