|
| 1 | +.. _pocl-conformance: |
| 2 | + |
| 3 | +======================= |
| 4 | +OpenCL conformance |
| 5 | +======================= |
| 6 | + |
| 7 | +Conformance-related CMake options |
| 8 | +--------------------------------- |
| 9 | + |
| 10 | +- ``-DENABLE_CONFORMANCE=ON/OFF`` |
| 11 | + Defaults to OFF. This option by itself does not guarantee an OpenCL-conformant build; |
| 12 | + it merely ensures that the build fails if CMake options that are known to result |
| 13 | + in a non-conformant PoCL build are given. It only applies to the CPU drivers. |
| 14 | + |
| 15 | + When ENABLE_CONFORMANCE is ON, the CPU drivers are built |
| 16 | + with the following changes: |
| 17 | + |
| 18 | + * read-write images are disabled (some 1D/2D image array tests fail) |
| 19 | + * the list of supported image formats is much smaller |
| 20 | + * SLEEF is always enforced for the builtin library |
| 21 | + * cl_khr_fp16 is disabled |
| 22 | + * cl_khr_subgroup_{ballot,shuffle} are disabled |
| 23 | + * cl_intel_subgroups and cl_intel_required_subgroup_size are disabled |
| 24 | + |
| 25 | + If ENABLE_CONFORMANCE is OFF and ENABLE_HOST_CPU_DEVICES is ON, |
| 26 | + the conformance test suite is disabled in CMake, because |
| 27 | + some CTS tests will fail on such a build. |
| 28 | + |
| 29 | +Supported & Unsupported optional OpenCL 3.0 features |
| 30 | +------------------------------------------------------ |
| 31 | + |
| 32 | +This list applies only to CPU devices (the cpu & cpu-minimal drivers). |
| 33 | +Other drivers (CUDA, TCE, etc.) only support OpenCL 1.2. |
| 34 | +Note that OpenCL 3.0 support on CPU devices requires LLVM 14 or newer. |
| 35 | + |
| 36 | +Supported 3.0 features: |
| 37 | + |
| 38 | + * Shared Virtual Memory |
| 39 | + * C11 atomics |
| 40 | + * 3D Image Writes |
| 41 | + * SPIR-V |
| 42 | + * Program Scope Global Variables |
| 43 | + * Subgroups |
| 44 | + * Generic Address Space |
| 45 | + |
| 46 | +Unsupported 3.0 features: |
| 47 | + |
| 48 | + * Device-side enqueue |
| 49 | + * Pipes |
| 50 | + * Non-Uniform Work Groups |
| 51 | + * Read-Write Images |
| 52 | + * Creating 2D Images from Buffers |
| 53 | + * sRGB & Depth Images |
| 54 | + * Device and Host Timer Synchronization |
| 55 | + * Intermediate Language Programs |
| 56 | + * Program Initialization and Clean-Up Kernels |
| 57 | + * Work Group Collective Functions |
| 58 | + |
| 59 | +.. _running-cts: |
| 60 | + |
| 61 | +How to run the OpenCL 3.0 conformance test suite |
| 62 | +------------------------------------------------ |
| 63 | + |
| 64 | +You'll need to build PoCL with ICD support enabled, and the ICD loader must be one that supports |
| 65 | +OpenCL version 3.0 (for ocl-icd, this is available since version 2.3.0). |
| 66 | +This is because, while the CTS will run with 1.2 devices, it requires 3.0 headers |
| 67 | +and a 3.0 ICD to build. You'll also need to enable the suite in PoCL's set of external test suites. |
| 68 | +This is done by adding ``-DENABLE_TESTSUITES=conformance -DENABLE_CONFORMANCE=ON`` |
| 69 | +to the cmake command line. After this, ``make prepare_examples`` fetches and |
| 70 | +prepares the conformance suite for testing. After building PoCL with ``make``, |
| 71 | +the CTS can be run with ``ctest -L <LABEL>``, where ``<LABEL>`` is a CTest label. |
| 72 | + |
| 73 | +There are three CTest labels for running the CTS: one label covers the full |
| 74 | +set of tests in the CTS, and the other two contain smaller subsets. The fastest |
| 75 | +is the ``conformance_suite_micro_main`` label, which takes approximately 10-30 minutes on |
| 76 | +current (desktop) hardware. The medium-sized ``conformance_suite_mini_main`` |
| 77 | +can take 1-2 hours on current hardware. The full-sized CTS is available |
| 78 | +with the label ``conformance_suite_full_main``. This can take 10-30 hours on current |
| 79 | +hardware. |
| 80 | + |
| 81 | +If PoCL is compiled with SPIR-V support, three more labels are available, in which the |
| 82 | +``_main`` suffix is replaced by ``_spirv`` (e.g. ``conformance_suite_mini_spirv``). |
| 83 | +These labels run the same tests as the ``_main`` variants, but use offline |
| 84 | +compilation to produce SPIR-V and create programs from that, |
| 85 | +instead of creating them from OpenCL C source, which is the default. |
| 86 | + |
| 87 | +Note that running ``ctest -L conformance_suite_micro`` will run *both* variants |
| 88 | +(online and offline compilation), since the ``-L`` option takes a regular expression. |
| 89 | + |
| 90 | +Additionally, there is a new CTest label, ``conformance_30_only``, |
| 91 | +which runs only the tests that are relevant to OpenCL 3.0. |
| 92 | + |
| 93 | +CPU devices that report OpenCL version 1.2 should also work with CTS 3.0 (inapplicable tests will be skipped). |
| 94 | + |
| 95 | +.. _known-issues: |
| 96 | + |
| 97 | +Known issues related to CTS |
| 98 | +--------------------------- |
| 99 | + |
| 100 | +- a few tests from ``basic/test_basic`` may fail / segfault because they |
| 101 | + request a huge amount of memory for buffers. |
| 102 | + |
| 103 | +- some tests from ``relationals/test_relationals`` can fail with specific |
| 104 | + LLVM versions; this is an LLVM bug that was fixed in LLVM 13. |
| 105 | + |
| 106 | +- a few tests may run much faster if you limit the reported global memory size |
| 107 | + with the POCL_MEMORY_LIMIT environment variable, in particular the "kernel_image_methods" test |
| 108 | + with the "max_images" argument. |
| 109 | + |
| 110 | +- With LLVM 15 and 16, when running the CTS in the offline compilation mode |
| 111 | + (= via SPIR-V), Clang and the SPIR-V translator produce invalid |
| 112 | + SPIR-V for several tests. PoCL bug report: |
| 113 | + `<https://github.com/pocl/pocl/issues/1232>`_ |
| 114 | + Related Khronos issues: |
| 115 | + `<https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2008>`_ |
| 116 | + `<https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2024>`_ |
| 117 | + `<https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2025>`_ |
| 118 | + |
| 119 | +- Integer division by zero. The OpenCL 1.2 specification requires that division by |
| 120 | + zero on integers results in undefined values, instead of raising exceptions. |
| 121 | + This requires PoCL to install a SIGFPE handler. The handler is per-process, |
| 122 | + but it checks the thread ID, so that it only ignores the error for the CPU |
| 123 | + driver threads, not the user program's threads. This might not work on every |
| 124 | + system. The handler can be disabled completely by setting the environment variable |
| 125 | + POCL_SIGFPE_HANDLER to 0. |
| 126 | + Note that this is currently only relevant for x86(-64) + Linux; on all other |
| 127 | + systems this issue is not handled in any way (thus PoCL is likely |
| 128 | + non-conformant there). |
| 129 | + |
| 130 | +- Many of the ``native_`` and ``half_`` variants of kernel library functions are mapped |
| 131 | + to the "full" variants. |
| 132 | + |
| 133 | +- clSetUserEventStatus() called with a negative status. The specification leaves the behaviour |
| 134 | + in this case "implementation defined", and this part of PoCL is |
| 135 | + only very lightly tested by the conformance tests. clSetUserEventStatus() |
| 136 | + called with CL_COMPLETE works as expected, and is heavily used by |
| 137 | + the conversions conformance test. |
| 138 | + |
| 139 | +Conformance test results (precision of builtin math library functions) |
| 140 | +----------------------------------------------------------------------- |
| 141 | + |
| 142 | +Note that it is not feasible to test double precision over the entire input range, |
| 143 | +so the results may vary. |
| 144 | + |
| 145 | +x86-64 CPU with AVX2+FMA, LLVM 4.0, tested on Nov 1, 2017 |
| 146 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 147 | + |
| 148 | +==================== ========================= =========================================================== |
| 149 | + NAME Worst ULP WHERE |
| 150 | +==================== ========================= =========================================================== |
| 151 | + add 0.00 {0x0p+0, 0x0p+0} |
| 152 | + addD 0.00 {0x0p+0, 0x0p+0} |
| 153 | + assignment 0.00 0x0p+0 |
| 154 | + assignmentD 0.00 0x0p+0 |
| 155 | + cbrt 0.50 -0x1.5629d2p+116 |
| 156 | + cbrtD 0.59 0x1.0000000000136p+1022 |
| 157 | + ceil 0.00 0x0p+0 |
| 158 | + ceilD 0.00 0x0p+0 |
| 159 | + copysign 0.00 {0x0p+0, 0x0p+0} |
| 160 | + copysignD 0.00 {0x0p+0, 0x0p+0} |
| 161 | + cos 2.37 0x1.1338ccp+20 |
| 162 | + cosD 2.27 -0x1.d10000000074p+380 |
| 163 | + cosh 2.41 -0x1.602166p+2 |
| 164 | + coshD 1.43 -0x1.98000000003efp+5 |
| 165 | + cospi 1.94 0x1.d73b56p-2 |
| 166 | + cospiD 2.46 -0x1.adffffffffa91p-2 |
| 167 | + divide 0.00 {0x0p+0, 0x0p+0} |
| 168 | + divideD 0.00 {0x0p+0, 0x0p+0} |
| 169 | + exp 0.95 -0x1.762532p+2 |
| 170 | + expD 0.94 0x1.2f0000000023dp+7 |
| 171 | + exp10 0.79 -0x1.309022p+5 |
| 172 | + exp10D 0.64 -0x1.34ffffffffcc9p+8 |
| 173 | + exp2 0.79 -0x1.fa3d0ep+6 |
| 174 | + exp2D 0.75 -0x1.ff00000000417p+9 |
| 175 | + expm1 1.00 -0x1.7a0002p-25 |
| 176 | + expm1D 0.99 -0x1.26p+5 |
| 177 | + fabs 0.00 0x0p+0 |
| 178 | + fabsD 0.00 0x0p+0 |
| 179 | + fdim 0.00 {0x0p+0, 0x0p+0} |
| 180 | + fdimD 0.00 {0x0p+0, 0x0p+0} |
| 181 | + floor 0.00 0x0p+0 |
| 182 | + floorD 0.00 0x0p+0 |
| 183 | + fma 0.00 {0x0p+0, 0x0p+0, 0x0p+0} |
| 184 | + fmaD 0.00 {0x0p+0, 0x0p+0, 0x0p+0} |
| 185 | + fmax 0.00 {0x0p+0, 0x0p+0} |
| 186 | + fmaxD 0.00 {0x0p+0, 0x0p+0} |
| 187 | + fmin 0.00 {0x0p+0, 0x0p+0} |
| 188 | + fminD 0.00 {0x0p+0, 0x0p+0} |
| 189 | + fmod 0.00 {0x0p+0, 0x0p+0} |
| 190 | + fmodD 0.00 {0x0p+0, 0x0p+0} |
| 191 | + fract { 0.00, 0.00} {0x0p+0, 0x0p+0} |
| 192 | + fractD { 0.00, 0.00} {0x0p+0, 0x0p+0} |
| 193 | + frexp { 0.00, 0} 0x0p+0 |
| 194 | + frexpD { 0.00, 0} 0x0p+0 |
| 195 | + hypot 1.93 {0x1.17c998p-127, -0x1.5fedb8p-127} |
| 196 | + hypotD 1.73 {0x1.5d2ebeed7663cp-1022, 0x1.67457048a2318p-1022} |
| 197 | + ldexp 0.00 {0x0p+0, 0} |
| 198 | + ldexpD 0.00 {0x0p+0, 0} |
| 199 | + log10 0.50 0x1.7fee2ep-1 |
| 200 | + log10D 0.50 0x1.9100000000639p+1022 |
| 201 | + log 0.63 0x1.7fcb3ep-1 |
| 202 | + logD 0.75 0x1.7d00000000381p+0 |
| 203 | + log1p 1.00 -0x1.fa0002p-126 |
| 204 | + log1pD 1.00 -0x1.e000000000001p-1022 |
| 205 | + log2 0.59 0x1.1107a2p+0 |
| 206 | + log2D 0.72 0x1.120000000063dp+0 |
| 207 | + logb 0.00 0x0p+0 |
| 208 | + logbD 0.00 0x0p+0 |
| 209 | + mad 0.00 {0x0p+0, 0x0p+0, 0x0p+0} no ULP check |
| 210 | + madD 0.00 {0x0p+0, 0x0p+0, 0x0p+0} no ULP check |
| 211 | + maxmag 0.00 {0x0p+0, 0x0p+0} |
| 212 | + maxmagD 0.00 {0x0p+0, 0x0p+0} |
| 213 | + minmag 0.00 {0x0p+0, 0x0p+0} |
| 214 | + minmagD 0.00 {0x0p+0, 0x0p+0} |
| 215 | + modf { 0.00, 0.00} {0x0p+0, 0x0p+0} |
| 216 | + modfD { 0.00, 0.00} {0x0p+0, 0x0p+0} |
| 217 | + multiply 0.00 {0x0p+0, 0x0p+0} |
| 218 | + multiplyD 0.00 {0x0p+0, 0x0p+0} |
| 219 | + nan 0.00 0x0p+0 |
| 220 | + nanD 0.00 0x0p+0 |
| 221 | + nextafter 0.00 {0x0p+0, 0x0p+0} |
| 222 | + nextafterD 0.00 {0x0p+0, 0x0p+0} |
| 223 | + pow 0.82 {0x1.91237cp-1, 0x1.4da146p+8} |
| 224 | + powD 0.80 {0x1.2bfb4b18164c9p+65, -0x1.b78438ae9c3bdp-8} |
| 225 | + pown 0.65 {-0x1.9p+6, -2} |
| 226 | + pownD 0.62 {-0x1.7ffffffffffffp+1, 3} |
| 227 | + powr 0.82 {0x1.91237cp-1, 0x1.4da146p+8} |
| 228 | + powrD 0.80 {0x1.2bfb4b18164c9p+65, -0x1.b78438ae9c3bdp-8} |
| 229 | + remainder 0.00 {0x0p+0, 0x0p+0} |
| 230 | + remainderD 0.00 {0x0p+0, 0x0p+0} |
| 231 | + remquo { 0.00, 0} 0x0p+0 |
| 232 | + remquoD { 0.00, 0} 0x0p+0 |
| 233 | + rint 0.00 0x0p+0 |
| 234 | + rintD 0.00 0x0p+0 |
| 235 | + rootn 0.69 {-0x1.e2fe6ep-74, -141} |
| 236 | + rootnD 0.68 {-0x1.8000000000001p+1, 3} |
| 237 | + round 0.00 0x0p+0 |
| 238 | + roundD 0.00 0x0p+0 |
| 239 | + rsqrt 1.49 0x1.019566p+124 |
| 240 | + rsqrtD 1.49 0x1.01ffffffffa39p+1016 |
| 241 | + sin 2.48 -0x1.09f07ap+21 |
| 242 | + sinD 1.87 -0x1.f2fffffffffbap+32 |
| 243 | + sincos { 2.48, 2.37} {0x1.09f07ap+21, 0x1.1338ccp+20} |
| 244 | + sincosD { 1.87, 2.27} {0x1.f2fffffffffbap+32, 0x1.d10000000074p+380} |
| 245 | + sinh 2.32 0x1.e76078p+2 |
| 246 | + sinhD 1.53 -0x1.3100000000278p+4 |
| 247 | + sinpi 2.13 -0x1.45f3ep-9 |
| 248 | + sinpiD 2.50 -0x1.46000000000dap-7 |
| 249 | + sqrt 0.00 0x0p+0 |
| 250 | + sqrtD 0.00 0x0p+0 |
| 251 | + subtract 0.00 {0x0p+0, 0x0p+0} |
| 252 | + subtractD 0.00 {0x0p+0, 0x0p+0} |
| 253 | + tan 4.35 -0x1.b4eba2p+22 |
| 254 | + tanD 4.00 -0x1.2f000000003edp+333 |
| 255 | + tanh 1.18 -0x1.ca742ap-1 |
| 256 | + tanhD 1.19 0x1.f400000000395p-1 |
| 257 | + tanpi 4.21 -0x1.f99d16p-3 |
| 258 | + tanpiD 4.09 0x1.f6000000001d3p-3 |
| 259 | + trunc 0.00 0x0p+0 |
| 260 | + truncD 0.00 0x0p+0 |
| 261 | +==================== ========================= =========================================================== |
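
The WHERE column of the table above lists the inputs at which the worst error was observed, written in C99-style hexadecimal floating-point notation. As a reading aid, here is a minimal Python sketch that decodes two of those entries into decimal values. The helper names `decode_where` and `fits_in_float32` are hypothetical and are not part of PoCL or the CTS. The sketch also assumes that the rows with a `D` suffix are the double-precision variants, which the table itself does not state.

```python
import struct


def decode_where(hex_text):
    """Decode one hexadecimal floating-point literal from the WHERE column."""
    return float.fromhex(hex_text)


def fits_in_float32(value):
    """Return True if value survives a round trip through IEEE-754 binary32."""
    try:
        return struct.unpack("f", struct.pack("f", value))[0] == value
    except OverflowError:
        # The magnitude is too large for single precision.
        return False


# Entries copied from the "cos" and "cosD" rows of the table.
cos_worst = decode_where("0x1.1338ccp+20")
cosd_worst = decode_where("-0x1.d10000000074p+380")

print(cos_worst)                    # decimal value of the worst-case cos input
print(fits_in_float32(cos_worst))   # single-precision input: exact in binary32
print(fits_in_float32(cosd_worst))  # exponent +380 is out of range for binary32
```

Under these assumptions, the `cos` entry decodes to 1127308.75 and fits in single precision, while the `cosD` entry does not.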