Commit d5ddaba

committed
Adds docs under docs/html
1 parent 5e020ae commit d5ddaba

79 files changed

+29257
-0
lines changed

docs/html/.buildinfo

+4
@@ -0,0 +1,4 @@
1+
# Sphinx build info version 1
2+
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3+
config: 5fc10d30154c5455d6afe27ae0859f6a
4+
tags: 645f666f9bcd5a90fca523b33c5a78b7

docs/html/_sources/almaif.rst.txt

+376
Large diffs are not rendered by default.

docs/html/_sources/android.rst.txt

+82
@@ -0,0 +1,82 @@
1+
.. _android-label:
2+
3+
Using PoCL on Android
4+
=====================
5+
6+
It is possible to build and use PoCL on Android. However, the setup requires a number of options to be set.
7+
To see an example project, have a look at the `PoCL-R Reference Android Java Client <https://github.com/cpc/PoCL-R-Reference-Android-Java-Client>`_.
8+
This reference app uses both the :ref:`proxy<proxy-label>` and :ref:`remote<remote-label>` devices in its example apps. It also builds a custom version of `JOCL <http://jocl.org/>`_ so
9+
that PoCL can be used in Java instead of calling C code using the Java Native Interface (JNI). These guidelines assume
10+
that Android Studio is used as the IDE, but it should be possible to do something similar with a different IDE. It is also
11+
assumed that a recent enough version of the NDK and CMake (the one found in the SDK tools of Android Studio) have been
12+
installed via Android Studio. Versions that have been used before include: NDK 25.1.8937393 and 26.0.10792818 and CMake
13+
3.22.1.
14+
15+
CMake Arguments
16+
---------------
17+
18+
A number of features in PoCL, such as CPU devices and the ICD loader, are not available on Android. Below is a list of
19+
recommended CMake options::
20+
21+
-DENABLE_LLVM=0 -DHOST_DEVICE_BUILD_HASH=00000000 -DENABLE_ICD=0 -DENABLE_LOADABLE_DRIVERS=0 -DENABLE_HOST_CPU_DEVICES=0 -DENABLE_HWLOC=0 -DENABLE_POCLCC=0 -DENABLE_TESTS=0 -DENABLE_EXAMPLES=0 -DBUILD_SHARED_LIBS=0 -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake -DANDROID_NDK=${ANDROID_NDK} -DANDROID_PLATFORM=${ANDROID_PLATFORM_LEVEL} -DANDROID_ABI=${ANDROID_ABI} -DANDROID_NATIVE_API_LEVEL=${ANDROID_PLATFORM_LEVEL}
22+
23+
It is recommended to build PoCL as an external project in the CMakeLists.txt that belongs to the native code of the
24+
Android project. This will set ``ANDROID_NDK``, ``ANDROID_PLATFORM_LEVEL`` and ``ANDROID_ABI`` to what you are building the
25+
project for. By default, Android Studio will build native code for multiple architectures (ARM 32/64 and x86), so the
26+
``ANDROID_ABI`` will change for each architecture. Adding PoCL as a library dependency to your native code will ensure that
27+
it is packed into the APK. It is recommended to set ``-DBUILD_SHARED_LIBS=0`` so that PoCL gets built as a static library
28+
(libpocl.a) as this is easier to use.
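
When experimenting with a standalone configure step outside Android Studio, the options listed above can be assembled per ABI. The following is a minimal sketch, not part of PoCL: the helper name ``pocl_android_cmake_args`` and the NDK path, ABI and API level passed to it are assumptions for illustration.

```shell
# Hypothetical helper (not part of PoCL): print the recommended CMake options,
# one per line, for a single Android ABI.
#   $1 = NDK root directory, $2 = ABI (e.g. arm64-v8a), $3 = platform API level
pocl_android_cmake_args() {
    printf '%s\n' \
        -DENABLE_LLVM=0 \
        -DHOST_DEVICE_BUILD_HASH=00000000 \
        -DENABLE_ICD=0 \
        -DENABLE_LOADABLE_DRIVERS=0 \
        -DENABLE_HOST_CPU_DEVICES=0 \
        -DENABLE_HWLOC=0 \
        -DENABLE_POCLCC=0 \
        -DENABLE_TESTS=0 \
        -DENABLE_EXAMPLES=0 \
        -DBUILD_SHARED_LIBS=0 \
        "-DCMAKE_TOOLCHAIN_FILE=$1/build/cmake/android.toolchain.cmake" \
        "-DANDROID_NDK=$1" \
        "-DANDROID_PLATFORM=$3" \
        "-DANDROID_ABI=$2" \
        "-DANDROID_NATIVE_API_LEVEL=$3"
}
```

For example, ``cmake -S pocl -B build-arm64 $(pocl_android_cmake_args "$ANDROID_NDK" arm64-v8a 24)`` would configure one ABI, assuming the PoCL sources live in a directory named ``pocl``. The command substitution is left unquoted on purpose so that each option becomes its own argument; an NDK path containing spaces would need a different approach.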
29+
30+
Building Remote Client
31+
----------------------
32+
33+
If you want to make use of PoCL-R, you can add ``-DENABLE_REMOTE_CLIENT=YES`` to the CMake options
34+
and make sure that network access is allowed in the ``AndroidManifest.xml``.
35+
36+
37+
Building Proxy Device
38+
---------------------
39+
40+
The proxy device allows you to make use of any system-provided OpenCL implementation as well as any devices provided by PoCL
41+
at the same time. Combined with the remote device, this allows you, for example, to easily switch between executing kernels
42+
locally or remotely or create a pipeline where work is done on both devices at the same time. To make use of the Proxy
43+
device on Android, you first need to make sure that the phone comes with an OpenCL library and that it is whitelisted by
44+
the vendor. Starting with API level 24, vendors need to whitelist libraries that are allowed to be dlopened. To check that
45+
OpenCL is whitelisted, do the following:
46+
47+
1. open a shell on the phone with ``adb shell``
48+
2. run::
49+
50+
cat /vendor/etc/public.libraries.txt
51+
52+
3. check that ``libOpenCL.so`` is listed
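
The check in the steps above can also be scripted from the host. This is a sketch under assumptions: the function name ``lists_opencl`` is hypothetical, and the pattern assumes each library name starts a line and may be followed by whitespace-separated attributes.

```shell
# Hypothetical helper: succeed if the library list read from stdin
# contains an entry for libOpenCL.so.
lists_opencl() {
    grep -Eq '^[[:space:]]*libOpenCL\.so([[:space:]]|$)'
}

# Usage with a connected device (requires adb on PATH):
#   adb shell cat /vendor/etc/public.libraries.txt | lists_opencl \
#       && echo "libOpenCL.so is whitelisted"
```

Matching ``[[:space:]]`` after the name also tolerates carriage returns that some ``adb`` versions append to each line.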
53+
54+
For newer Android versions (Android 12 and up), you also need to add::
55+
56+
<uses-native-library
57+
android:name="libOpenCL.so"
58+
android:required="false" />
59+
60+
to the ``<application>`` element of the ``AndroidManifest.xml``.
61+
62+
Once you know that your phone comes with an OpenCL library, it's possible to use the proxy device. To build the proxy device add the
63+
following CMake options to the ones mentioned before: ``-DENABLE_PROXY_DEVICE=YES -DVISIBILITY_HIDDEN=NO``. This will build
64+
the proxy device and pocl as a static library. If you want to use JOCL, you need to also add ``-DPROXY_USE_LIBOPENCL_STUB=YES``
65+
and set ``-DBUILD_SHARED_LIBS=YES``. This will build a dynamic library of pocl.
66+
67+
*NOTE:* The proxy driver suffers from the same issues the remote driver has with :ref:`Mali GPUs<remote-issues-label>`.
68+
See that section for a workaround.
69+
70+
71+
Setting PoCL Environment Variables
72+
----------------------------------
73+
74+
The easiest way to set PoCL environment variables is to create a native function that calls the ``setenv`` function from ``stdlib.h``.
75+
76+
Using JOCL
77+
----------
78+
79+
It is possible to use JOCL on Android. However, by default JOCL does not get built for Android. It also doesn't look for libpocl.
80+
See the Android reference client README for how to build JOCL for Android and for a submodule of our JOCL repo that looks for
81+
``libpocl.so`` on Android.
82+
+261
@@ -0,0 +1,261 @@
1+
.. _pocl-conformance:
2+
3+
=======================
4+
OpenCL conformance
5+
=======================
6+
7+
Conformance related CMake options
8+
---------------------------------
9+
10+
- ``-DENABLE_CONFORMANCE=ON/OFF``
11+
Defaults to OFF. This option by itself does not guarantee an OpenCL-conformant build;
12+
it merely ensures that a build fails if some CMake options which are known to result
13+
in a non-conformant PoCL build are given. Only applies to the CPU driver.
14+
15+
When ENABLE_CONFORMANCE is ON, the CPU drivers are built
16+
with the following changes:
17+
18+
* read-write images are disabled (some 1D/2D image array tests fail)
19+
* the list of supported image formats is much smaller
20+
* SLEEF is always enforced for the builtin library
21+
* cl_khr_fp16 is disabled
22+
* cl_khr_subgroup_{ballot,shuffle} are disabled
23+
* cl_intel_subgroups and cl_intel_required_subgroup_size are disabled
24+
25+
If ENABLE_CONFORMANCE is OFF, and ENABLE_HOST_CPU_DEVICES is ON,
26+
the conformance testsuite is disabled in CMake. This is because
27+
some CTS tests will fail on such a build.
28+
29+
Supported & Unsupported optional OpenCL 3.0 features
30+
------------------------------------------------------
31+
32+
This list is only related to CPU devices (cpu & cpu-minimal drivers).
33+
Other drivers (CUDA, TCE, etc.) only support OpenCL 1.2.
34+
Note that 3.0 support on CPU devices requires LLVM 14 or newer.
35+
36+
Supported 3.0 features:
37+
38+
* Shared Virtual Memory
39+
* C11 atomics
40+
* 3D Image Writes
41+
* SPIR-V
42+
* Program Scope Global Variables
43+
* Subgroups
44+
* Generic Address Space
45+
46+
Unsupported 3.0 features:
47+
48+
* Device-side enqueue
49+
* Pipes
50+
* Non-Uniform Work Groups
51+
* Read-Write Images
52+
* Creating 2D Images from Buffers
53+
* sRGB & Depth Images
54+
* Device and Host Timer Synchronization
55+
* Intermediate Language Programs
56+
* Program Initialization and Clean-Up Kernels
57+
* Work Group Collective Functions
58+
59+
.. _running-cts:
60+
61+
How to run the OpenCL 3.0 conformance test suite
62+
------------------------------------------------
63+
64+
You'll need to build PoCL with ICD enabled, and the ICD must be one that supports
65+
OpenCL version 3.0 (for ocl-icd, this is available since version 2.3.0).
66+
This is because while the CTS will run with 1.2 devices, it requires 3.0 headers
67+
and a 3.0 ICD to build. You'll also need to enable the suite in PoCL's external test suite set.
68+
This is done by adding ``-DENABLE_TESTSUITES=conformance -DENABLE_CONFORMANCE=ON``
69+
to the CMake command line. After this, ``make prepare_examples`` fetches and
70+
prepares the conformance suite for testing. After building pocl with ``make``,
71+
the CTS can be run with ``ctest -L <LABEL>`` where ``<LABEL>`` is a CTest label.
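
The sequence described above can be summarized as a short script. This is a sketch, not an official recipe: the helper name ``cts_commands`` is hypothetical, ``-DENABLE_ICD=1`` is assumed to be the way ICD support is switched on, and the commands are assumed to run from an empty build directory next to the PoCL sources.

```shell
# Hypothetical helper: print the CTS command sequence described above.
# Run it from an empty build directory and pipe the output to `sh` to execute.
#   $1 = path to the PoCL source tree, $2 = CTest label
cts_commands() {
    cat <<EOF
cmake -DENABLE_ICD=1 -DENABLE_TESTSUITES=conformance -DENABLE_CONFORMANCE=ON "$1"
make prepare_examples
make
ctest -L $2
EOF
}
```

Printing the commands first makes it easy to review them; ``cts_commands ../pocl conformance_suite_micro_main | sh -e`` would then execute them and stop at the first failure.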
72+
73+
There are three different CTest labels for running the CTS. One label covers the full
74+
set of tests in the CTS; the other two contain smaller subsets of the CTS tests. The fastest
75+
is the ``conformance_suite_micro_main`` label, which takes approximately 10-30 minutes on
76+
current (desktop) hardware. The medium-sized ``conformance_suite_mini_main``
77+
can take 1-2 hours on current hardware. The full-sized CTS is available
78+
with the label ``conformance_suite_full_main``. This can take 10-30 hours on current
79+
hardware.
80+
81+
If PoCL is compiled with SPIR-V support, three more labels are available, where
82+
the ``_main`` suffix is replaced by ``_spirv`` (e.g. ``conformance_suite_mini_spirv``).
83+
These labels will run the same tests as the _main variant, but use offline
84+
compilation to produce SPIR-V and use that to create programs,
85+
instead of the default of creating them from OpenCL C source.
86+
87+
Note that running ``ctest -L conformance_suite_micro`` will run *both* variants
88+
(online and offline compilation), since the ``-L`` option takes a regular expression.
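
The effect of the regular-expression match can be illustrated without running CTest. In this sketch ``grep -E`` merely stands in for CTest's own matching, and the helper name ``select_labels`` is hypothetical; the six labels are the ones named in this section.

```shell
# The six CTS labels named in this section.
labels='conformance_suite_micro_main
conformance_suite_mini_main
conformance_suite_full_main
conformance_suite_micro_spirv
conformance_suite_mini_spirv
conformance_suite_full_spirv'

# Hypothetical helper: list every label that the given regex matches,
# similar to how `ctest -L <regex>` selects tests by label.
select_labels() {
    printf '%s\n' "$labels" | grep -E -- "$1"
}

# Matches both the _main and the _spirv variant:
#   select_labels conformance_suite_micro
# Anchoring the pattern selects only the online-compilation variant:
#   select_labels 'conformance_suite_micro_main$'
```

To run only one variant, pass the full label (optionally anchored with ``$``) to ``ctest -L``.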
89+
90+
Additionally, there is a new CTest label, ``conformance_30_only``,
91+
to run tests which are only relevant to OpenCL 3.0.
92+
93+
CPU device version 1.2 should also work with CTS 3.0 (tests will be skipped).
94+
95+
.. _known-issues:
96+
97+
Known issues related to CTS
98+
---------------------------
99+
100+
- a few tests from ``basic/test_basic`` may fail / segfault because they
101+
request a huge amount of memory for buffers.
102+
103+
- some tests from ``relationals/test_relationals`` can fail with specific
104+
LLVM versions; this is an LLVM bug that was fixed in LLVM 13.
105+
106+
- a few tests may run much faster if you limit the reported global memory size
107+
with the POCL_MEMORY_LIMIT env var. In particular, the "kernel_image_methods" test
108+
with "max_images" argument.
109+
110+
- With LLVM 15 and 16, when running CTS with the offline compilation mode
111+
(= via SPIR-V), Clang + SPIR-V translator produce invalid
112+
SPIR-V for several tests. PoCL bug report:
113+
`<https://github.com/pocl/pocl/issues/1232>`_
114+
Related Khronos issues:
115+
`<https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2008>`_
116+
`<https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2024>`_
117+
`<https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2025>`_
118+
119+
- Integer division by zero. OpenCL 1.2 specification requires that division by
120+
zero on integers results in undefined values, instead of raising exceptions.
121+
This requires pocl to install a handler of SIGFPE. The handler is per-process,
122+
but it checks the thread ID, so that it only ignores the error for the CPU
123+
driver threads, not the user program's threads. This might not work on every
124+
system. The handler can be disabled completely by setting the env variable
125+
POCL_SIGFPE_HANDLER to 0.
126+
Note that this is currently only relevant for x86(-64) + Linux, on all other
127+
systems this issue is not handled in any way (thus PoCL is likely
128+
non-conformant there).
129+
130+
- Many of the ``native_`` and ``half_`` variants of kernel library functions are mapped
131+
to the "full" variants.
132+
133+
- clSetUserEventStatus() called with negative status. The Spec leaves the behaviour
134+
in this case as "implementation defined", and this part of pocl is
135+
only very lightly tested by the conformance tests. clSetUserEventStatus()
136+
called with CL_COMPLETE works as expected, and is heavily used by
137+
the conversions conformance test.
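
The two environment variables mentioned in the list above can be applied to a single command rather than exported globally. This is a sketch only: the wrapper name ``with_pocl_cts_env`` is hypothetical, it applies both workarounds at once (which may not be what every run needs), and the unit of ``POCL_MEMORY_LIMIT`` is not stated in this section, so the value is left to the caller.

```shell
# Hypothetical wrapper: run a command with POCL_MEMORY_LIMIT set to $1 and the
# SIGFPE handler disabled (POCL_SIGFPE_HANDLER=0), without touching the
# environment of the calling shell.
#   $1 = value for POCL_MEMORY_LIMIT, remaining arguments = command to run
with_pocl_cts_env() {
    limit=$1
    shift
    POCL_MEMORY_LIMIT=$limit POCL_SIGFPE_HANDLER=0 "$@"
}

# Usage (label taken from the section on running the CTS):
#   with_pocl_cts_env <limit> ctest -L conformance_suite_micro_main
```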
138+
139+
Conformance tests results (precision of builtin math library functions)
140+
-----------------------------------------------------------------------
141+
142+
Note that it's impossible to test double precision on the entire range,
143+
therefore the results may vary.
144+
145+
x86-64 CPU with AVX2+FMA, LLVM 4.0, tested on Nov 1, 2017
146+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
147+
148+
==================== ========================= ===========================================================
149+
NAME Worst ULP WHERE
150+
==================== ========================= ===========================================================
151+
add 0.00 {0x0p+0, 0x0p+0}
152+
addD 0.00 {0x0p+0, 0x0p+0}
153+
assignment 0.00 0x0p+0
154+
assignmentD 0.00 0x0p+0
155+
cbrt 0.50 -0x1.5629d2p+116
156+
cbrtD 0.59 0x1.0000000000136p+1022
157+
ceil 0.00 0x0p+0
158+
ceilD 0.00 0x0p+0
159+
copysign 0.00 {0x0p+0, 0x0p+0}
160+
copysignD 0.00 {0x0p+0, 0x0p+0}
161+
cos 2.37 0x1.1338ccp+20
162+
cosD 2.27 -0x1.d10000000074p+380
163+
cosh 2.41 -0x1.602166p+2
164+
coshD 1.43 -0x1.98000000003efp+5
165+
cospi 1.94 0x1.d73b56p-2
166+
cospiD 2.46 -0x1.adffffffffa91p-2
167+
divide 0.00 {0x0p+0, 0x0p+0}
168+
divideD 0.00 {0x0p+0, 0x0p+0}
169+
exp 0.95 -0x1.762532p+2
170+
expD 0.94 0x1.2f0000000023dp+7
171+
exp10 0.79 -0x1.309022p+5
172+
exp10D 0.64 -0x1.34ffffffffcc9p+8
173+
exp2 0.79 -0x1.fa3d0ep+6
174+
exp2D 0.75 -0x1.ff00000000417p+9
175+
expm1 1.00 -0x1.7a0002p-25
176+
expm1D 0.99 -0x1.26p+5
177+
fabs 0.00 0x0p+0
178+
fabsD 0.00 0x0p+0
179+
fdim 0.00 {0x0p+0, 0x0p+0}
180+
fdimD 0.00 {0x0p+0, 0x0p+0}
181+
floor 0.00 0x0p+0
182+
floorD 0.00 0x0p+0
183+
fma 0.00 {0x0p+0, 0x0p+0, 0x0p+0}
184+
fmaD 0.00 {0x0p+0, 0x0p+0, 0x0p+0}
185+
fmax 0.00 {0x0p+0, 0x0p+0}
186+
fmaxD 0.00 {0x0p+0, 0x0p+0}
187+
fmin 0.00 {0x0p+0, 0x0p+0}
188+
fminD 0.00 {0x0p+0, 0x0p+0}
189+
fmod 0.00 {0x0p+0, 0x0p+0}
190+
fmodD 0.00 {0x0p+0, 0x0p+0}
191+
fract { 0.00, 0.00} {0x0p+0, 0x0p+0}
192+
fractD { 0.00, 0.00} {0x0p+0, 0x0p+0}
193+
frexp { 0.00, 0} 0x0p+0
194+
frexpD { 0.00, 0} 0x0p+0
195+
hypot 1.93 {0x1.17c998p-127, -0x1.5fedb8p-127}
196+
hypotD 1.73 {0x1.5d2ebeed7663cp-1022, 0x1.67457048a2318p-1022}
197+
ldexp 0.00 {0x0p+0, 0}
198+
ldexpD 0.00 {0x0p+0, 0}
199+
log10 0.50 0x1.7fee2ep-1
200+
log10D 0.50 0x1.9100000000639p+1022
201+
log 0.63 0x1.7fcb3ep-1
202+
logD 0.75 0x1.7d00000000381p+0
203+
log1p 1.00 -0x1.fa0002p-126
204+
log1pD 1.00 -0x1.e000000000001p-1022
205+
log2 0.59 0x1.1107a2p+0
206+
log2D 0.72 0x1.120000000063dp+0
207+
logb 0.00 0x0p+0
208+
logbD 0.00 0x0p+0
209+
mad 0.00 {0x0p+0, 0x0p+0, 0x0p+0} no ULP check
210+
madD 0.00 {0x0p+0, 0x0p+0, 0x0p+0} no ULP check
211+
maxmag 0.00 {0x0p+0, 0x0p+0}
212+
maxmagD 0.00 {0x0p+0, 0x0p+0}
213+
minmag 0.00 {0x0p+0, 0x0p+0}
214+
minmagD 0.00 {0x0p+0, 0x0p+0}
215+
modf { 0.00, 0.00} {0x0p+0, 0x0p+0}
216+
modfD { 0.00, 0.00} {0x0p+0, 0x0p+0}
217+
multiply 0.00 {0x0p+0, 0x0p+0}
218+
multiplyD 0.00 {0x0p+0, 0x0p+0}
219+
nan 0.00 0x0p+0
220+
nanD 0.00 0x0p+0
221+
nextafter 0.00 {0x0p+0, 0x0p+0}
222+
nextafterD 0.00 {0x0p+0, 0x0p+0}
223+
pow 0.82 {0x1.91237cp-1, 0x1.4da146p+8}
224+
powD 0.80 {0x1.2bfb4b18164c9p+65, -0x1.b78438ae9c3bdp-8}
225+
pown 0.65 {-0x1.9p+6, -2}
226+
pownD 0.62 {-0x1.7ffffffffffffp+1, 3}
227+
powr 0.82 {0x1.91237cp-1, 0x1.4da146p+8}
228+
powrD 0.80 {0x1.2bfb4b18164c9p+65, -0x1.b78438ae9c3bdp-8}
229+
remainder 0.00 {0x0p+0, 0x0p+0}
230+
remainderD 0.00 {0x0p+0, 0x0p+0}
231+
remquo { 0.00, 0} 0x0p+0
232+
remquoD { 0.00, 0} 0x0p+0
233+
rint 0.00 0x0p+0
234+
rintD 0.00 0x0p+0
235+
rootn 0.69 {-0x1.e2fe6ep-74, -141}
236+
rootnD 0.68 {-0x1.8000000000001p+1, 3}
237+
round 0.00 0x0p+0
238+
roundD 0.00 0x0p+0
239+
rsqrt 1.49 0x1.019566p+124
240+
rsqrtD 1.49 0x1.01ffffffffa39p+1016
241+
sin 2.48 -0x1.09f07ap+21
242+
sinD 1.87 -0x1.f2fffffffffbap+32
243+
sincos { 2.48, 2.37} {0x1.09f07ap+21, 0x1.1338ccp+20}
244+
sincosD { 1.87, 2.27} {0x1.f2fffffffffbap+32, 0x1.d10000000074p+380}
245+
sinh 2.32 0x1.e76078p+2
246+
sinhD 1.53 -0x1.3100000000278p+4
247+
sinpi 2.13 -0x1.45f3ep-9
248+
sinpiD 2.50 -0x1.46000000000dap-7
249+
sqrt 0.00 0x0p+0
250+
sqrtD 0.00 0x0p+0
251+
subtract 0.00 {0x0p+0, 0x0p+0}
252+
subtractD 0.00 {0x0p+0, 0x0p+0}
253+
tan 4.35 -0x1.b4eba2p+22
254+
tanD 4.00 -0x1.2f000000003edp+333
255+
tanh 1.18 -0x1.ca742ap-1
256+
tanhD 1.19 0x1.f400000000395p-1
257+
tanpi 4.21 -0x1.f99d16p-3
258+
tanpiD 4.09 0x1.f6000000001d3p-3
259+
trunc 0.00 0x0p+0
260+
truncD 0.00 0x0p+0
261+
==================== ========================= ===========================================================
