Commit c1be6bd: Docs updates (#60)

* Moving docs directory and created GitHub pages source

1 parent a9f6402, commit c1be6bd

File tree

211 files changed: +33910 / -78 lines


docs/.nojekyll

Whitespace-only changes.
File renamed without changes.

docs/_sources/creation.rst.txt

+227
Constructing Tensors
====================

Basic construction of tensors in MatX is intended to be very simple, with minimal parameters. This allows users of other languages
to pick up the syntax quickly without understanding the underlying architecture. While the simple API provides good performance,
it lacks flexibility and can prevent your code from running at the highest possible performance. This document walks through the
different ways to construct tensors, and when you should use certain methods over others.

A Quick Primer On MatX Types
----------------------------
The basic type of tensor used in most examples and tests is the ``tensor_t`` object. ``tensor_t`` is the highest-level tensor class, and
provides all of the abstractions for viewing and modifying data, holding storage, and any other metadata needed by a tensor. Because of
their relatively large size, ``tensor_t`` objects are not meant to be passed to GPU devices. In fact, doing so will lead to a compiler error,
since ``tensor_t`` uses types that are not available on the device at this time.

Within a ``tensor_t`` there is an abstract object called ``Storage`` (more on that later), and another inherited class called ``tensor_impl_t``.
``tensor_impl_t`` is a lightweight class containing only the minimum set of member variables needed to access the data from a GPU kernel. Currently the
member variables are a tensor descriptor and a data pointer. Tensor descriptors are covered later in this document.

``tensor_impl_t`` also includes member functions for accessing and modifying the tensor. Examples are the ``operator()`` functions
(both const and non-const), helper functions for the shape (``Size()`` and ``Stride()``), and utilities for printing on the host. ``tensor_impl_t``
is the type that is passed into GPU kernels, and it only contains types that are compatible with CUDA. Furthermore, the total size of a ``tensor_impl_t``
object is kept as small as possible, since these objects can be replicated many times within a single complex expression. Reducing the size of
``tensor_impl_t`` allows for faster memory accesses, smaller copies before a kernel launch, and makes extending the code easier.

To convert between a ``tensor_t`` and a ``tensor_impl_t``, a type trait called ``base_type`` is available and is used as follows:

.. code-block:: cpp

    typename base_type<I1>::type in1_ = in;

where ``in`` is the ``tensor_t`` object and ``in1_`` will be a ``tensor_impl_t``.

MatX Storage
------------
Within the ``tensor_t`` class is an abstract template parameter called ``Storage``. ``Storage`` objects are always created from a ``basic_storage``
class, which provides all accessor functions common to the underlying storage. ``basic_storage`` can wrap raw pointers using the ``raw_pointer_buffer``
class, smart pointers using the ``smart_pointer_buffer`` class, or any RAII object that provides the required interface. If no user-defined storage
is passed in, MatX will default to allocating a raw CUDA managed memory pointer and back it with a ``shared_ptr`` for garbage collection.

When not using implicitly-allocated memory, the user is free to define the storage container type, allocator, and ownership semantics. The container
type requires const and non-const iterators, an allocate function (when applicable), a ``data()`` function to get the raw pointer, and a way to get
the size. Currently both ``std::array`` and ``std::vector`` from the STL follow these semantics, as do both the raw and smart pointer MatX containers.

The allocator type is used when the user passes in a shape without a pointer to existing data. By default, the allocator will be ``matx_allocator``,
which is a PMR-compatible allocator with stream semantics. The allocator is used for both allocation and deallocation when no user-provided pointer
is passed in and ownership semantics are requested. If a pointer is provided, only the deallocator is used when ownership semantics have been requested.

In general, creating a tensor allows you to choose ownership semantics at creation. By using the ``owning`` type, MatX takes ownership of the pointer
and deallocates the memory when the last tensor using it goes out of scope. By using the ``non_owning`` type, MatX will use the pointer, but not
perform any reference counting or deallocation when it goes out of scope.
Tensor Descriptors
------------------
Tensor descriptors are a template type inside ``tensor_impl_t`` that provides information about the sizes and strides of the tensor. While descriptors
are a simple concept, the implementation can have a large impact on performance if not tuned properly. Both the sizes and strides of the tensor are
stored in a template class supporting iterators to access the metadata directly, plus utility functions for accessing and computing other values from the metadata.
Descriptors are commonly stored as ``std::array`` types for their compile-time features, but any class meeting the accessor requirements can be used.
Dynamic Descriptors
###################
Dynamic descriptors use storage in memory to describe the shapes and strides of a tensor. They can have lower performance than static descriptors,
since more memory accesses and offset calculations are needed when accessing tensors, but they offer higher flexibility when the data is only known at runtime.

Dynamic descriptors should be used when either the sizes are not known at compile time, or when interoperating with existing code. As mentioned in the
introduction, the descriptor size is very important for both kernel performance and launch time. For this reason, the data types used to store the
shape and strides can vary depending on the size of the tensor parameters. While shape and stride storage types must match in length, the underlying types
used to store them can be different. This is useful in scenarios where the shape can be expressed as a smaller type than the strides.
Static Descriptors
##################
If the shapes and strides are known at compile time, static descriptors should be used. Static descriptors compute and store the shape and strides in
``constexpr`` variables, and provide ``constexpr`` functions to access both values. When used in a GPU kernel, calling either ``Size()`` or ``Stride()`` emits
an immediate rvalue that the compiler can use for address calculations. This removes all loads and complex pointer arithmetic that could affect the
runtime of a kernel.

Creating Tensors
----------------
With the tensor terminology out of the way, it's time to discuss how to create tensors. If there's one thing to take from this article, it's that you
should use ``make_tensor`` or ``make_static_tensor`` wherever possible.

.. note::
   Prefer ``make_tensor`` or ``make_static_tensor`` over constructing tensors directly.

Using these helper functions has many benefits:

- They remove the need to specify the rank of a tensor in the template parameters
- They abstract away many of the complex template types involved in creating a tensor directly
- They hide potentially irrelevant types from the user

All ``make_``-style functions return a ``tensor_t`` object with the template parameters deduced or created as part of the input arguments. ``tensor_t``
has only two required template parameters (type and rank). For simple cases where only implicitly-allocated memory is needed, the default constructor
will suffice. Some situations prevent using the ``make_`` functions, such as when a tensor is a class member variable. In this case the type of
the member variable must be specified in the member list. In these scenarios it's expected that the user knows what they are doing and can handle
spelling out the types themselves. For examples of this, see the simple_pipeline files.

All ``make_`` functions take the data type as the first template parameter.

Make variants
#############
There are currently four different variants of the ``make_`` helper functions:

- ``make_`` for creating a tensor with a dynamic descriptor and returning by value
- ``make_static_`` for creating a tensor with a static descriptor and returning by value
- ``make_X_p`` for creating a tensor with a dynamic descriptor and returning a pointer
- ``make_static_X_p`` for creating a tensor with a static descriptor and returning a pointer

The ``_p`` variants return pointers allocated with ``new`` and are expected to be deleted by the caller when finished. Returning smart pointers would
have made this easier, but some users have their own smart pointer wrappers and wouldn't want to unpack the standard library versions.

Within each of these types, there are usually versions both with and without user-defined pointers. The pointer forms are used when an existing device pointer
is passed to MatX rather than having the allocation done when the tensor is created.

Each of these four variants can be used with all of the construction types, when applicable.

Creating From a C Array or a Brace-Enclosed List
################################################
Tensors can be created using a C-style shape array as an lvalue, or a brace-enclosed list as an rvalue. The following two snippets result in the same ``make_`` call:

.. code-block:: cpp

    int array[3] = {10, 20, 30};
    auto t = make_tensor<float>(array);

and

.. code-block:: cpp

    auto t = make_tensor<float>({10, 20, 30});

In the former case the array is an lvalue that can be modified in memory before the call, whereas the latter case uses rvalues. When the sizes are known
at compile time, the static version of ``make_`` should be used:

.. code-block:: cpp

    auto t = make_static_tensor<float, 10, 20, 30>();

Notice the sizes are now template parameters instead of function parameters. Both forms can be used interchangeably in MatX code, but the static version
can lead to higher performance.

Similarly, all variants can be called with a user-defined pointer:

.. code-block:: cpp

    auto t = make_tensor<float>(ptr, {10, 20, 30}); // ptr is a valid device pointer

All cases shown above use the default stride parameters. If the strides are not linear in memory, they can be passed in as well:

.. code-block:: cpp

    int shape[3] = {10, 20, 30};
    int strides[3] = {1200, 60, 2};
    auto t = make_tensor<float>(shape, strides);

Creating From A Conforming Shape
################################
As mentioned in the descriptor section, any type that conforms to the shape semantics can be used inside of a descriptor, and can also be passed into the
``make_`` functions:

.. code-block:: cpp

    std::array<int, 3> array = {10, 20, 30};
    auto t = make_tensor<float>(array);

Creating From A Descriptor
##########################
Descriptors (both shapes and strides) can be used to construct tensors. This is useful when taking an existing tensor's descriptor and creating a new tensor from it:

.. code-block:: cpp

    auto d = existingTensor.Descriptor();
    auto t = make_tensor<float>(d);

``t`` is now a tensor with the same shapes and strides as ``existingTensor``.

0-D Tensors
###########
0-D tensors are different from higher-rank tensors since they have no meaningful shape or strides, and therefore don't need those parameters. Empty versions of the
``make_`` helpers exist to create these:

.. code-block:: cpp

    auto t0 = make_tensor<float>();
    auto t01 = make_tensor<float>(ptr);

Custom Storage, Descriptors, and Allocators
###########################################
Within most of the ``make_`` functions, there are choices in the template parameters for custom storage, descriptor, and allocator types.

Storage
-------
Storage types can be created by wrapping a container object in the ``basic_storage`` class. MatX has container types built in for both raw pointers and smart
pointers, but this can be extended to any conforming container type. The ``basic_storage`` class does not know about any underlying data structures or ownership;
that is encapsulated inside the template type ``C``. For example, to create a custom storage object wrapping a raw pointer:

.. code-block:: cpp

    raw_pointer_buffer<T, owning, matx_allocator<T>> rp{ptr, static_cast<size_t>(desc.TotalSize()*sizeof(T))};
    basic_storage<decltype(rp)> s{std::move(rp)};

The code above creates a new ``raw_pointer_buffer`` object with ownership semantics and the ``matx_allocator`` allocator. The constructor taking a pointer and a
size will not allocate any new data, but will track the pointer internally using a smart pointer. If instead ``non_owning`` had been passed as a template parameter, the
pointer would not be tracked or freed. With the container created, the next line passes it into a ``basic_storage`` object for use inside ``tensor_t``.

Descriptors
-----------
A descriptor can be created from any conforming descriptor type (see the descriptor explanation above). Within MatX, ``std::array`` is used by default
when creating dynamic descriptors. Because the shape and stride types can vary in size, MatX provides helper types for creating descriptors of common types:

- ``tensor_desc_cr_disi_dist<RANK>`` is a dynamic descriptor with ``index_t`` strides and shapes. This is the default descriptor and can also be created using the type
  ``DefaultDescriptor``. ``index_t`` is defined at compile time and defaults to 64-bit
- ``tensor_desc_cr_ds_t<ShapeType, StrideType, RANK>`` is a ``std::array``-based descriptor with user-provided types
- ``tensor_desc_cr_ds_32_32_t<RANK>`` is a descriptor with 32-bit sizes and strides
- ``tensor_desc_cr_ds_64_64_t<RANK>`` is a descriptor with 64-bit sizes and strides
- ``tensor_desc_cr_ds_32_64_t<RANK>`` is a descriptor with 32-bit sizes and 64-bit strides
- ``static_tensor_desc_t<size_t I, size_t... Is>`` is a static-sized descriptor with the shape and strides created at compile time

To create a descriptor:

.. code-block:: cpp

    const index_t arr[3] = {10, 20, 30};
    DefaultDescriptor<3> desc{arr};

In this case we create a default descriptor (based on ``index_t`` sizes) using a C-style array.
