This repository was archived by the owner on May 23, 2024. It is now read-only.

parallel_reduce with native simd type => seg fault in OpenMP #34

Open
pkestene opened this issue Oct 27, 2021 · 0 comments

Hello,

I was trying to test a parallel_reduce (sum) using one of the native simd types and hit a seg fault that seems to be caused by incorrect memory alignment of the value returned by HostThreadTeamData::pool_reduce_local().

To illustrate this, I've updated avx.hpp to provide operator+= (used in the reducer's join operation) and used the custom reducer below.

// custom reducer for simd type (here avx)
template <class T, class Space>
struct SimdReducer {
 public:

  using simd_t = simd::simd<float,simd::simd_abi::native>;
  //using simd_t = simd::simd<T,simd::simd_abi::pack<4>>;

  using simd_storage_t = simd_t::simd_storage_t;

  // Required
  using reducer = SimdReducer<T, Space>;
  using value_type = simd_t;
  using value_type_storage = simd_storage_t;
  using result_view_type = Kokkos::View<value_type, Space, Kokkos::MemoryUnmanaged>;

 private:
  result_view_type value;

 public:
  KOKKOS_INLINE_FUNCTION
  SimdReducer(value_type& value_) : value(&value_) {}

  // Required
  KOKKOS_INLINE_FUNCTION
  void join(value_type& dest, const value_type& src) const {
    dest += src;
  }

  KOKKOS_INLINE_FUNCTION
  void join(volatile value_type& dest, const volatile value_type& src) const {
    dest += src;
  }

  KOKKOS_INLINE_FUNCTION
  void init(value_type& val) const {
    printf("before init %p\n", static_cast<void*>(&val));  // %p expects a void*
    val = simd_t(0.0); // seg fault here
    printf("after init\n");
  }

  KOKKOS_INLINE_FUNCTION
  value_type& reference() const { return *value.data(); }

  KOKKOS_INLINE_FUNCTION
  result_view_type view() const { return value; }

  KOKKOS_INLINE_FUNCTION
  bool references_scalar() const { return true; }
};
  • A parallel_reduce with this reducer works fine if the device is Serial, but gives me a segmentation fault when the device is OpenMP (whatever the number of threads).
  • If I change the simd type to e.g. simd_abi::pack<4>, the crash disappears and it works fine.
  • When compiling for avx, simd<float,simd::simd_abi::native> is 32 bytes. When I print, in the reducer's init, the address of the reference value coming from the call to pool_reduce_local() (in HostThreadTeamData), the address is 16-byte aligned, but I think it should be 32-byte aligned. I think this explains the seg fault.
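
The alignment observation in the last bullet can be checked with a small helper. This is a minimal sketch in plain C++ that does not use Kokkos; `Vec8f` and `is_aligned` are hypothetical names, and `Vec8f` is only an assumed stand-in for a 32-byte AVX float vector, not the actual simd type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an AVX float vector: 8 floats, 32-byte alignment.
struct alignas(32) Vec8f {
  float lane[8];
};

// Returns true if p is a multiple of `alignment`, which must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Convenience overload: checks a pointer against its own type's alignment.
template <class T>
inline bool is_aligned_for(const T* p) {
  return is_aligned(p, alignof(T));
}
```

Calling `is_aligned(&val, 32)` inside `init` would report whether the reference handed to the reducer satisfies the 32-byte requirement, instead of reading the alignment off the printed address by hand.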

I may be wrong, but I think it is necessary to control alignment inside HostThreadTeamData so that the returned pointer is properly aligned.
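
As an illustration of what "controlling alignment" could mean, here is a minimal sketch that rounds a scratch pointer up to the alignment of the value type using `std::align`. It is not the HostThreadTeamData implementation and makes no claim about how Kokkos lays out its scratch memory; `first_aligned_slot` and `Slot32` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical 32-byte, 32-byte-aligned value type used only for illustration.
struct alignas(32) Slot32 {
  unsigned char bytes[32];
};

// Hypothetical helper: returns the first address inside [scratch, scratch + bytes)
// that is suitably aligned for T and has room for one T, or nullptr if none fits.
template <class T>
T* first_aligned_slot(void* scratch, std::size_t bytes) {
  void* p = scratch;
  std::size_t space = bytes;
  // std::align bumps p forward to the next multiple of alignof(T) and
  // returns nullptr if sizeof(T) no longer fits in the remaining space.
  return static_cast<T*>(std::align(alignof(T), sizeof(T), p, space));
}
```

With a buffer that starts on a 16-byte boundary, the helper skips 16 bytes and returns a 32-byte-aligned address, so the buffer has to be over-allocated by up to `alignof(T) - 1` bytes for this to work.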
