In this section, we introduce how to map new algorithms to our framework with L2 level APIs provided by ThunderGP. Based on the execution flow of ThunderGP, users need to customize two compute kernels: the scatter-gather and the apply. The L2 functions act as hooks, meaning that ThunderGP will call them for processing specific event in the data-flow. The below figure shows the mapping hooks with unique IDs.
In the next, we shall enumerate the coding of hooks with an Sparse Matrix-vector Multiplication (SpMV) example.
This figure illustrates a SpMV algorithm. The matrix A in CSR format can be mapped into the edges and the property of edges, and the vector x is the property of vertices. The main compuation of SpMV is shown below:
for (r = 0; r < A.rows; r++)
{
for (i = A.rowStart[r]; i < A.rowStart[r + 1]; i++)
{
y[r] += A.val[i] * x[A.col[i]];
}
}
In our implementation, the gather and scatter stages are combined together by the internal shuffle data path(the shuffle stage), and the implementation of this compute kernel are highly hardware-specific, which is not friendly with the algorithm developers, and it is hiden by following hooks:
Hooks | ID | Description |
---|---|---|
preprocessProperty | 1 | Per-process the source vertex property. |
scatterFunc | 2 | Calculate the update value by using the edge property and source vertex property. |
updateMergeInRAWSolver | 3 | Destination property update in RAW solver. |
gatherFunc | 4 | Destination property update. |
In SpMV, these hooks in Scatter-Gather can be easily instanced as follows:
/* directly pass the value, do not need in SpMV */
inline prop_t preprocessProperty(prop_t srcProp)
{
return (srcProp);
}
/* calculate the A.val[i] * x[A.col[i]]; */
inline prop_t scatterFunc(prop_t srcProp, prop_t edgeProp)
{
return ((srcProp) * (edgeProp));
}
/* accumulate y[r] in rawSolver */
inline prop_t updateMergeInRAWSolver(prop_t ori, prop_t update)
{
return ((ori) + (update));
}
/* accumulate y[r] in on-chip memory */
inline prop_t gatherFunc(prop_t ori, prop_t update)
{
return ((ori) + (update));
}
Notes:
- The operation in
gatherFunc
andupdateMergeInRAWSolver
can be very similar, the difference is that forgatherFunc
, the data in on-chip memory is not initialized, which means the accumulations are start from zero, but forupdateMergeInRAWSolver
, as the read-after-write hazard only happened in two very closed update tuples, we add a temporary buffer in RAWSolver to sift out the closed update and perform the update in the temporary buffer. Therefore if the update happened, the data in the temporary buffer is initialized. It should be considered when mapping an algorithm with bit-masked property(e.g. BFS). - If the accelerator configuration
HAVE_EDGE_PROP
is set to false the hookscatterFunc
will not be called because there is no property of edges.
The apply stage may need different types of data, and this variance makes the abstraction a little tough. ThunderGP provides a flexible mechanism for mapping algorithms to apply phase. Same as mapping scatter-gather kernel, users can use the L2 hooks. However, the number of date types is limited by existing data path. Currently, ThunderGP support automatically load four types of data for the application-specific calculation in apply stage:
- The property of vertices;
- The out-degree of vertices;
- The update value from scatter-gather stage;
- An additional big word;
(If you need more types of data in apply stage, please go to the bellowing Customizating Apply section.)
Following table shows the hook functions for apply stage. Note: applyMerge
is only used in multiple-SLRs.
Hooks | ID | Description |
---|---|---|
applyMerge | 5 | Destination property merge from all of the scatter-gather CUs. |
applyFunc | 6 | Calculate the new property of vertices. |
In SpMV, these hooks in Apply can be easily instanced as follows:
/* accumulate y[r] among CUs */
inline prop_t applyMerge(prop_t ori, prop_t update)
{
return ((ori) + (update));
}
/* no other calculation in apply phase */
inline prop_t applyFunc( prop_t tProp,
prop_t source,
prop_t outDeg,
unsigned int &extra,
unsigned int arg
)
{
return tProp;
}
Developers also need to modify the accelerator configurations to fit their algorithm. This configurations locate in the build.mk
in each application folder.
Configuration | Value | Description |
---|---|---|
HAVE_VERTEX_ACTIVE_BIT | true/false | This is a bit-masked mechanism which control the tuple is going to update the property of corresponding vertex or not, it is used in BFS-like algorithms (e.g BFS, SSSP) which need to mark the active vertices. |
HAVE_EDGE_PROP | true/false | For some algorithms (e.g. SpMV, SSSP) they need the property of edges to make the calculation with the property of vertices, by setting this parameter, ThunderGP will automatically add a data path for load and process the property of edges. |
HAVE_UNSIGNED_PROP | true/false | If you need a signed property, set it to false. |
HAVE_APPLY | true/false | This parameter decides whether the apply compute kernel is needed or not, because that some algorithms may not need apply stage (e.g. SpMV). |
CUSTOMIZE_APPLY | true/false | If existing abstraction on apply stage can not fit your requirements, you should set it to true and build your specific data-flow using L1 function. The details is available in the next section. |
HAVE_APPLY_OUTDEG | true/false | This parameter controls whether there is an additional data path for load out-degree. |
ThunderGP provides a mechanism to using the low level APIs (L1) to build customize data-flow. Our existing code in template can be a reference. The customize steps are shown below:
- Modify the apply_kernel.mk, change the variable
CUSTOMIZE_APPLY
to true - Write the accelerator code in vertex_apply.cpp, a HLS function
vertexApply
is needed for building the accelerator - Write the host code and verification code in host_vertex_apply.cpp host function
setApplyKernel
is needed for setting up the arguments of OpenCL kernel, and host functionpartitionApplyCModel
is needed for the verification of the results from FPGA accelerator.
We are planning to to provide a script that will automatic generate the interfaces code for customizing the apply stage in the future.