Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OMPD 5 and limited OMPD support for cuda devices #49

Open
wants to merge 69 commits into
base: ompd
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
69 commits
Select commit Hold shift + click to select a range
d8edcc7
Merge branch 'master' into ompd
Jun 21, 2018
c836b8d
[OMPD] Fixing runtime ompd type references
manorom Jun 22, 2018
73fe9de
[OMPD] add cmake modules necessary to build gdb-wrapper
manorom Jun 22, 2018
255356b
[OMPD] align libompd and odb w/ new OpenMP spec
manorom Jun 25, 2018
0c3427e
Add ompd_get_thread_handle for cuda.
manorom Jun 29, 2018
63309d5
[OMPD] Adds some support for omp states on cuda
manorom Jul 2, 2018
e1949d3
[OMPD] Further align code to spec
manorom Jul 2, 2018
f9cc9b4
[OMPD] Aligned callback names with spec
manorom Jul 3, 2018
cffa430
Fix formatting mistake
manorom Jul 3, 2018
e72f4a7
[OMPD] Add parallel and reduction state to npvtx
manorom Jul 3, 2018
27e8cc8
[OMPD] Add reduction state + save thread coords
manorom Jul 5, 2018
322f3f6
[OMPD] cleanup and + ompd parallel handle for cuda
manorom Jul 10, 2018
8a25e59
[OMPD] Remove unsupported api functions + fixes
manorom Jul 13, 2018
485b132
[OMPD] add support for some ICVs to libompd
manorom Jul 13, 2018
f37acc3
[OMPD] Add serial state in cuda device rtl
manorom Jul 13, 2018
4c40f17
[OMPD] Add some icvs + move values in ompd_types.h
manorom Jul 13, 2018
528b3ed
[OMPD] Align ompd.h formatting to spec
manorom Jul 16, 2018
c612600
[OMPD] changed some comments for current spec
manorom Jul 16, 2018
5dd42cc
[OMPD] Some cleanup in gdb-wrapper
manorom Jul 17, 2018
3473deb
[OMPD] fix omp version
manorom Jul 17, 2018
64b73b0
[OMPD] Add type compatibillity for new spec to odb
manorom Jul 17, 2018
43d1e58
[OMPD] Code clean-up in odb
manorom Jul 17, 2018
66775c1
[OMPD] Fix bug introd. by type changes f. new spec
manorom Jul 17, 2018
bf0ea8a
[OMPD] Add "tasks" command to odb
manorom Jul 18, 2018
ee03881
Merge remote-tracking branch 'origin/ompd' into ompd-devices
manorom Jul 18, 2018
504e7be
[OMPD] Add get_task_function and test code in odb
manorom Jul 19, 2018
d208dc1
[OMPD] Fix some formatting
manorom Jul 20, 2018
94c3081
[OMPD] Fix formatting mistakes
manorom Jul 20, 2018
267a230
[OMPD] Fix linking for gdb-wrapper
manorom Jul 23, 2018
14b0cea
[OMPD] Save cuda kernel info in ompd handles
manorom Jul 23, 2018
33ccbfa
Merge remote-tracking branch 'origin/master' into ompd-devices
manorom Jul 23, 2018
7a79281
[OMPD] Set correct omp states for all modes (cuda)
manorom Jul 30, 2018
5bec325
[OMPD] Fix thread handles in all exec modes (cuda)
manorom Jul 30, 2018
b3de3ae
[OMPD] Add ompd break points for thread begin/end
manorom Aug 1, 2018
b96b31c
[OMPD] Add some support: parallel handles (cuda)
manorom Aug 7, 2018
3609e86
[OMPD] remove usage of kernel id in odb
manorom Aug 8, 2018
a619134
[OMPD] Add ompd_get_thread_in_parallel for cuda
manorom Aug 8, 2018
011a376
[OMPD] Move memory segment values to ompd_types.h
manorom Aug 8, 2018
8d49887
[OMPD] Re-enable checking for ompd state tracking
manorom Aug 8, 2018
300c218
Remove accidentally committed .bak file
manorom Aug 9, 2018
e0538aa
Merge remote-tracking branch 'origin/ompd' into ompd-devices
manorom Aug 10, 2018
c806b3e
Merge remote-tracking branch 'origin/ompd' into ompd-devices
manorom Aug 10, 2018
783262c
[OMPD] Make odb initialization at first command
manorom Aug 13, 2018
17f53de
[OMPD] Add code to support some ICVs on cuda devs
manorom Aug 13, 2018
598ec68
[OMPD] removed ompdAllocatable class
manorom Aug 14, 2018
21d2ad7
[OMPD] Remove _ompd_device_handle_s struct
manorom Aug 14, 2018
6351c6f
[OMPD] Work around for cuda-gdb name mangling bug
manorom Aug 23, 2018
4b6f24b
[OMPD] Add support for cuda icvs to odb
manorom Aug 23, 2018
0322eab
[OMPD] Fix enclosing_parallel + enable target ICVs
manorom Aug 27, 2018
098dd55
[OMPD] Fix ompd_get_thread_in_parallel + add test
manorom Aug 30, 2018
66b0339
[OMPD] Update ompd_types.h + remove kernelId field
manorom Sep 4, 2018
09ed60b
[OMPD] Fix ompd_types.h
manorom Sep 5, 2018
b2be2d6
[OMPD] Add some support for tasks for cuda
manorom Sep 7, 2018
24e9595
[OMPD] Fix parallel level for explicit tasks
manorom Sep 17, 2018
d15db76
[OMPD] omp_device_t -> ompd_device_t
manorom Sep 18, 2018
58e5df5
[OMPD] Add examples to test ompd on cuda
manorom Sep 21, 2018
42982e4
[OMPD] Add ompd breakpoints
manorom Sep 21, 2018
23f5ea0
[OMPD] Fix task/parallel handle interaction
manorom Sep 21, 2018
8c4c7ae
[OMPD] Make OMPD configurable for deviceRTLs
manorom Sep 25, 2018
901ff1b
[OMPD] Clean up by removing unnecesarry header
manorom Sep 25, 2018
6a825eb
[OMPD} Remove deprecated ompd_process_handle_t
manorom Sep 25, 2018
bc9c1e3
[OMPD] Remove TODOs
manorom Sep 27, 2018
d861bcd
[OMPD] Add some kernel info + removed TODO notes
manorom Sep 28, 2018
c1b4c5a
Add information on scheduling parent for the master task in a paralle…
jprotze Oct 12, 2018
66b0f8d
Initialize all address variables
loeseke Oct 17, 2018
726b21b
Fix ompd_get_thread_id
jprotze Dec 14, 2018
36fcaf3
Initialize lwt field for ompd_get_task_in_parallel
loeseke Jan 16, 2019
e1c6cad
Fix ompd_get_task_function for implicit tasks
jprotze Jan 17, 2019
f9bb739
Use the right team to get the function pointer
loeseke Jan 17, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@ if (OPENMP_STANDALONE_BUILD OR "${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_S
"Enable -Werror flags to turn warnings into errors for supporting compilers.")
set(OPENMP_LIBDIR_SUFFIX "" CACHE STRING
"Suffix of lib installation directory, e.g. 64 => lib64")
# Do not use OPENMP_LIBDIR_SUFFIX directly, use OPENMP_INSTALL_LIBDIR.
set(OPENMP_INSTALL_LIBDIR "lib${OPENMP_LIBDIR_SUFFIX}")

# Group test settings.
set(OPENMP_TEST_C_COMPILER ${CMAKE_C_COMPILER} CACHE STRING
Expand All @@ -28,7 +30,7 @@ if (OPENMP_STANDALONE_BUILD OR "${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_S
else()
set(OPENMP_ENABLE_WERROR ${LLVM_ENABLE_WERROR})
# If building in tree, we honor the same install suffix LLVM uses.
set(OPENMP_LIBDIR_SUFFIX ${LLVM_LIBDIR_SUFFIX})
set(OPENMP_INSTALL_LIBDIR "lib${LLVM_LIBDIR_SUFFIX}")

if (NOT MSVC)
set(OPENMP_TEST_C_COMPILER ${LLVM_RUNTIME_OUTPUT_INTDIR}/clang)
Expand Down
6 changes: 4 additions & 2 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -257,9 +257,11 @@ Options for ``libomptarget``
Options for ``NVPTX device RTL``
--------------------------------

**LIBOMPTARGET_NVPTX_ENABLE_BCLIB** = ``OFF|ON``
**LIBOMPTARGET_NVPTX_ENABLE_BCLIB** = ``ON|OFF``
Enable CUDA LLVM bitcode offloading device RTL. This is used for link time
optimization of the OMP runtime and application code.
optimization of the OMP runtime and application code. This option is enabled
by default if the build system determines that `CMAKE_C_COMPILER` is able to
compile and link the library.

**LIBOMPTARGET_NVPTX_CUDA_COMPILER** = ``""``
Location of a CUDA compiler capable of emitting LLVM bitcode. Currently only
Expand Down
33 changes: 33 additions & 0 deletions libompd/cuda_examples/test_target_generic.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
// Testing generic mode of nvptx devRtl
#include <stdio.h>

#pragma omp declare target
void test_breakpoint() {
asm("");
}
#pragma omp end declare target

void vec_mult(int N)
{
int i;
float p[N], v1[N], v2[N];
//init(v1, v2, N);
#pragma omp target map(v1, v2, p)
{
test_breakpoint();
#pragma omp parallel for
for (i=0; i<N; i++)
{
test_breakpoint();
p[i] = v1[i] * v2[i];
}
test_breakpoint();
}
//output(p, N);
}
int main() {
printf("calling vec_mul...\n");
vec_mult(64);
printf("done\n");
return 0;
}
43 changes: 43 additions & 0 deletions libompd/cuda_examples/test_target_multilevel.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
#include <stdio.h>
#include <omp.h>

#pragma omp declare target
void test_breakpoint() {
asm("");
}
#pragma omp end declare target

void vec_mult(int N)
{
int i;
float p[N], v1[N], v2[N];
omp_set_nested(1);
#pragma omp target map(v1, v2, p)
{
omp_set_nested(1);
#pragma omp parallel shared(v1, v2, p, N) num_threads(4)
{
printf("Outer region - thread ID: %d\n", omp_get_thread_num());
#pragma omp for
for (int i = 0; i < N; ++i)
{
float acc = 0;
#pragma omp parallel shared(v1, v2, p, N) num_threads(4)
#pragma omp for
for(int j = 0; j < N; ++j)
{
test_breakpoint();
p[i] += v1[i] + v2[i];
}
}
}
printf("End of target region\n");
}
//output(p, N);
}
int main() {
printf("calling vec_mul...\n");
vec_mult(64);
printf("done\n");
return 0;
}
24 changes: 24 additions & 0 deletions libompd/cuda_examples/test_target_noparallel.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
#include <stdio.h>

#pragma omp declare target
void test_breakpoint() {
asm("");
}
#pragma omp end declare target

void vec_mult(int N)
{
int i;
float p[N], v1[N], v2[N];
#pragma omp target map(v1, v2, p)
{
test_breakpoint();
p[0] = v[0] * v[0];
}
}
int main() {
printf("calling vec_mul...\n");
vec_mult(64);
printf("done\n");
return 0;
}
29 changes: 29 additions & 0 deletions libompd/cuda_examples/test_target_single.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
#include <stdio.h>

#pragma omp declare target
float mult(float u, float v) {
return u * v;
}
#pragma omp end declare target

void vec_mult(int N)
{
int i;
float p[N], v1[N], v2[N];
//init(v1, v2, N);
#pragma omp target map(v1, v2, p)
{
#pragma omp parallel for
for (i=0; i<N; i++)
{
p[i] = mult(v1[i], v2[i]);
}
}
//output(p, N);
}
int main() {
printf("calling vec_mul...\n");
vec_mult(64);
printf("done\n");
return 0;
}
31 changes: 31 additions & 0 deletions libompd/cuda_examples/test_target_spmd.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
// Testing spmd mode
#include <stdio.h>

#pragma omp declare target
void test_breakpoint() {
asm("");
}
#pragma omp end declare target

void vec_mult(int N)
{
int i;
float p[N], v1[N], v2[N];
//init(v1, v2, N);
#pragma omp target map(v1, v2, p)
{
#pragma omp parallel for
for (i=0; i<N; i++)
{
test_breakpoint();
p[i] = v1[i] * v2[i];
}
}
//output(p, N);
}
int main() {
printf("calling vec_mul...\n");
vec_mult(2048);
printf("done\n");
return 0;
}
33 changes: 33 additions & 0 deletions libompd/cuda_examples/test_target_task.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#include <stdio.h>
#include <stdint.h>

#pragma omp declare target
void task1() {
printf("Hello from Task 1\n");
uint32_t enter_frame = 0;
for(;1;) {
}
}
void task2() {
printf("Hello from Task 2\n");
for(;1;) {
}
}
#pragma omp end declare target

int main() {
#pragma omp target
{
#pragma omp parallel num_threads(4)
{
#pragma omp single
{
#pragma omp task
task1();
#pragma omp task
task2();
}
}
}
return 0;
}
12 changes: 7 additions & 5 deletions libompd/gdb-wrapper/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
project (odb)

cmake_minimum_required(VERSION 2.8)

set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/Modules/")

set (cppfiles
InputOutputManager.cpp
ChildProcess.cpp
Expand Down Expand Up @@ -30,14 +34,12 @@ add_executable (odb-bin ${cppfiles} odb.cpp)
set_target_properties (odb-bin PROPERTIES OUTPUT_NAME odb)
add_library (odb ${cppfiles})

if (ODB_LINUX)
target_link_libraries (odb-bin dl)
target_link_libraries (odb dl)
endif (ODB_LINUX)
target_link_libraries (odb-bin dl)
target_link_libraries (odb dl)

include_directories (
${CMAKE_CURRENT_SOURCE_DIR}
# ${CMAKE_CURRENT_SOURCE_DIR}/../src/
${CMAKE_CURRENT_SOURCE_DIR}/../src/
${CMAKE_BINARY_DIR}/include
)

Expand Down
40 changes: 22 additions & 18 deletions libompd/gdb-wrapper/Callbacks.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -37,17 +37,16 @@ void initializeCallbacks(const GdbProcessPtr &proc)
gdb = proc;

// Initialize static table
cb.dmemory_alloc = CB_dmemory_alloc;
cb.dmemory_free = CB_dmemory_free;
cb.print_string = CB_print_string;
cb.get_thread_context_for_osthread = CB_thread_context;
cb.get_containing_process_context = CB_process_context;
cb.tsizeof_prim = CB_tsizeof_prim;
cb.tsymbol_addr = CB_tsymbol_addr;
cb.read_tmemory = CB_read_tmemory;
cb.write_tmemory = CB_write_tmemory;
cb.host_to_target = CB_host_to_target;
cb.target_to_host = CB_target_to_host;
cb.memory_alloc = CB_dmemory_alloc;
cb.memory_free = CB_dmemory_free;
cb.print_string = CB_print_string;
cb.get_thread_context_for_thread_id = CB_thread_context;
cb.sizeof_types = CB_tsizeof_prim;
cb.symbol_addr_lookup = CB_tsymbol_addr;
cb.read_memory = CB_read_tmemory;
cb.write_memory = CB_write_tmemory;
cb.host_to_device = CB_host_to_target;
cb.device_to_host = CB_target_to_host;
}

ompd_callbacks_t * getCallbacksTable()
Expand Down Expand Up @@ -78,14 +77,14 @@ ompd_rc_t CB_dmemory_free (

ompd_rc_t CB_thread_context (
ompd_address_space_context_t *context,
ompd_osthread_kind_t kind,
ompd_thread_id_t kind,
ompd_size_t sizeof_osthread,
const void* osthread,
ompd_thread_context_t **tcontext
)
{
ompd_rc_t ret = context ? ompd_rc_ok : ompd_rc_stale_handle;
if (kind == ompd_osthread_cudalogical) {
if (kind == OMPD_THREAD_ID_CUDALOGICAL) {
*tcontext = ((OMPDContext*)context)->getContextForThread((CudaThread*)osthread);
}
else {
Expand Down Expand Up @@ -126,7 +125,7 @@ void init_sizes(){

ompd_rc_t CB_tsizeof_prim(
ompd_address_space_context_t *context,
ompd_target_type_sizes_t *sizes)
ompd_device_type_sizes_t *sizes)
{
ompd_rc_t ret = context ? ompd_rc_ok : ompd_rc_stale_handle;
static int inited = 0;
Expand All @@ -135,7 +134,12 @@ ompd_rc_t CB_tsizeof_prim(
inited=1;
init_sizes();
}
memcpy(sizes, prim_sizes, sizeof(prim_sizes[0])*ompd_type_max);
sizes->sizeof_char = prim_sizes[ompd_type_char];
sizes->sizeof_short = prim_sizes[ompd_type_short];
sizes->sizeof_int = prim_sizes[ompd_type_int];
sizes->sizeof_long = prim_sizes[ompd_type_long];
sizes->sizeof_long_long = prim_sizes[ompd_type_long_long];
sizes->sizeof_pointer = prim_sizes[ompd_type_pointer];

return ret;
}
Expand Down Expand Up @@ -175,7 +179,7 @@ ompd_rc_t CB_tsymbol_addr(
parser.matchAddressValue(gdb->readOutput().c_str(), addr);

if (strlen(addr) > 0)
symbol_addr->address = (ompd_taddr_t) strtoull (addr, NULL, 0);
symbol_addr->address = (ompd_addr_t) strtoull (addr, NULL, 0);
else if (strlen(addr) == 0)
ret = ompd_rc_error;

Expand Down Expand Up @@ -267,7 +271,7 @@ ompd_rc_t CB_write_tmemory (
ompd_address_space_context_t *context,
ompd_thread_context_t *tcontext,
ompd_address_t addr,
ompd_tword_t nbytes,
ompd_word_t nbytes,
const void *buffer)
{
return ompd_rc_unsupported;
Expand All @@ -277,7 +281,7 @@ ompd_rc_t CB_read_tmemory (
ompd_address_space_context_t *context,
ompd_thread_context_t *tcontext,
ompd_address_t addr,
ompd_tword_t nbytes,
ompd_word_t nbytes,
void *buffer)
{
if (!context)
Expand Down
8 changes: 4 additions & 4 deletions libompd/gdb-wrapper/Callbacks.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ ompd_rc_t CB_dmemory_free (

ompd_rc_t CB_thread_context (
ompd_address_space_context_t *context,
ompd_osthread_kind_t kind,
ompd_thread_id_t kind,
ompd_size_t sizeof_osthread,
const void* osthread,
ompd_thread_context_t **tcontext);
Expand All @@ -59,7 +59,7 @@ ompd_rc_t CB_process_context (

ompd_rc_t CB_tsizeof_prim (
ompd_address_space_context_t *context,
ompd_target_type_sizes_t *sizes);
ompd_device_type_sizes_t *sizes);

ompd_rc_t CB_tsymbol_addr (
ompd_address_space_context_t *context,
Expand All @@ -71,15 +71,15 @@ ompd_rc_t CB_read_tmemory (
ompd_address_space_context_t *context,
ompd_thread_context_t *tcontext,
const ompd_address_t addr,
ompd_tword_t nbytes,
ompd_word_t nbytes,
void *buffer
);

ompd_rc_t CB_write_tmemory (
ompd_address_space_context_t *context,
ompd_thread_context_t *tcontext,
const ompd_address_t addr,
ompd_tword_t nbytes,
ompd_word_t nbytes,
const void *buffer
);

Expand Down
1 change: 1 addition & 0 deletions libompd/gdb-wrapper/CudaGdb.h
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include <vector>
#include <map>
#include "ompd.h"
#include "../src/ompd-private.h"

struct CudaThread {
ompd_cudathread_coord_t coord;
Expand Down
Loading