How to configure hypre with sycl for Aurora? #1222

Open
jczhang07 opened this issue Jan 30, 2025 · 15 comments

Comments

@jczhang07

Hello,

From https://hypre.readthedocs.io/en/latest/ch-misc.html#gpu-build-options, it seems I don't need --with-gpu-arch=ARG anymore, and only need --with-sycl, --with-sycl-target=ARG, and --with-sycl-target-backend=ARG. But I don't know what ARGs I should use. Do you have examples for Aurora?

BTW, I tried --with-sycl-target=spir64_gen --with-sycl-target-backend=12.60.7, but got the following warnings and errors:

make[1]: Entering directory '/home/jczhang/petsc/arch-kokkos-dbg/externalpackages/git.hypre/src/parcsr_mv'
icpx  -fPIC -O3 -fsycl -fsycl-unnamed-lambda  -DHAVE_CONFIG_H -I.. -I. -I./.. -I./../blas -I./../lapack -I./../utilities -I./../seq_mv -I./../seq_block_mv    -qmkl -I/opt/aurora/24.180.3/oneapi/dpl/latest/include -DMKL_ILP64 -I/opt/aurora/24.180.3/updates/oneapi/mkl/develop_20240710/include       -c par_csr_filter_device.c -o par_csr_filter_device.obj
icpx: warning: treating 'c' input as 'c++' when -fsycl is used [-Wexpected-file-type]
In file included from par_csr_filter_device.c:9:
./../utilities/_hypre_utilities.hpp:1771:25: warning: 'barrier' is deprecated: Sub-group barrier with no arguments is deprecated.Use sycl::group_barrier with the sub-group as the argument instead. [-Wdeprecated-declarations]
 1771 |    item.get_sub_group().barrier();
      |                         ^
/opt/aurora/24.180.3/updates/oneapi/compiler/eng-20240629/bin/compiler/../../include/sycl/sub_group.hpp:623:3: note: 'barrier' has been explicitly marked deprecated here
  623 |   __SYCL_DEPRECATED(
      |   ^
/opt/aurora/24.180.3/updates/oneapi/compiler/eng-20240629/bin/compiler/../../include/sycl/detail/defines_elementary.hpp:44:38: note: expanded from macro '__SYCL_DEPRECATED'
   44 | #define __SYCL_DEPRECATED(message) [[deprecated(message)]]
      |                                      ^
In file included from par_csr_filter_device.c:9:
./../utilities/_hypre_utilities.hpp:1827:25: warning: 'barrier' is deprecated: Sub-group barrier with no arguments is deprecated.Use sycl::group_barrier with the sub-group as the argument instead. [-Wdeprecated-declarations]
 1827 |    item.get_sub_group().barrier();
      |                         ^
/opt/aurora/24.180.3/updates/oneapi/compiler/eng-20240629/bin/compiler/../../include/sycl/sub_group.hpp:623:3: note: 'barrier' has been explicitly marked deprecated here
  623 |   __SYCL_DEPRECATED(
      |   ^
/opt/aurora/24.180.3/updates/oneapi/compiler/eng-20240629/bin/compiler/../../include/sycl/detail/defines_elementary.hpp:44:38: note: expanded from macro '__SYCL_DEPRECATED'
   44 | #define __SYCL_DEPRECATED(message) [[deprecated(message)]]
      |                                      ^
par_csr_filter_device.c:141:33: error: use of undeclared identifier 'hypre_ballot_sync'
  141 |       hypre_mask      ballot  = hypre_ballot_sync(HYPRE_WARP_FULL_MASK, write);
      |                                 ^

Thanks!
--Junchao

@victorapm
Contributor

@waynemitchell is the expert on Aurora and can probably help here. In the meantime, take a look at our test script for Aurora: https://github.com/hypre-space/hypre/blob/master/AUTOTEST/machine-aurora.sh

oneAPI is changing frequently, and you might need to use a newer version: module load oneapi/eng-compiler/2024.07.30.002

@jczhang07
Author

Yes, I did have oneapi/eng-compiler/2024.07.30.002 loaded. Following the AUTOTEST script, I retried with only --with-sycl --enable-unified-memory, but the errors persisted.

@victorapm
Contributor

Are you using v2.32.0? https://gitlab.com/petsc/petsc/-/blame/main/config/BuildSystem/config/packages/hypre.py#L7

Wayne has implemented fixes for the SYCL build since 2.32.0, so I would recommend using hypre's master branch.

@jczhang07
Author

With hypre/master, that error went away. But there were new errors with respect to MPI symbols. I think I know why.

./configure --prefix=/home/jczhang/petsc/arch-kokkos-dbg MAKE=/opt/aurora/24.180.3/spack/unified/0.8.0/install/linux-sles15-x86_64/gcc-12.2.0/gmake-4.4.1-6g37exp/bin/gmake --libdir=/home/jczhang/petsc/arch-kokkos-dbg/lib CC="mpicc" CFLAGS="-fPIC -Wno-sign-conversion -Wno-float-conversion -Wno-implicit-float-conversion -Wno-cast-function-type-mismatch -Qunused-arguments -O2 -g" CXX="mpicxx" CXXFLAGS="-O2 -g -std=gnu++17 -fPIC" --disable-fortran --disable-fc --disable-f77 --disable-f90 --enable-shared --disable-fortran --with-MPI-lib-dirs="" --with-MPI-libs="" --with-blas-lib="-Wl,-rpath,/opt/aurora/24.180.3/updates/oneapi/mkl/develop_20240710/lib/intel64 -L/opt/aurora/24.180.3/updates/oneapi/mkl/develop_20240710/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -Wl,-rpath,/opt/aurora/24.180.3/updates/oneapi/compiler/eng-20240629/lib -L/opt/aurora/24.180.3/updates/oneapi/compiler/eng-20240629/lib -lsycl" --with-lapack-lib=" " --with-blas=no --with-lapack=no --with-sycl CUCC="" CUFLAGS="" --with-fmangle-blas=no-underscores --with-fmangle-lapack=no-underscores --without-mli --without-superlu AR="/usr/bin/ar cr" LDFLAGS=""

...
/usr/bin/ld: /tmp/icpx-baebde00d6/mpistubs-f2dc67.o: in function `hypre_MPI_Init':
/home/jczhang/petsc/arch-kokkos-dbg/externalpackages/git.hypre/src/utilities/mpistubs.c:925: undefined reference to `MPI_Init'

This happens because hypre links with icpx rather than with CXX, which is mpicxx. I think hypre should use CC or CXX instead.
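
One way to confirm which driver ends up in the build is to search the generated configuration for it and to ask the MPI wrapper what it wraps. The sketch below is only illustrative: `find_compiler_refs` is a hypothetical helper, the file path is the one mentioned in this thread, and the `-show` flag assumes an MPICH-style wrapper.

```shell
#!/bin/sh
# Hypothetical helper: list every line of a file that mentions a given
# compiler driver name (for example icpx or mpicxx), with line numbers.
find_compiler_refs() {
    file=$1
    name=$2
    grep -n -- "$name" "$file"
}

# Path as used in this thread; adjust it to your own build tree.
config=src/config/Makefile.config
if [ -f "$config" ]; then
    find_compiler_refs "$config" icpx || echo "icpx is not referenced in $config"
fi

# MPICH-style wrappers print the underlying command line with -show.
# Assumption: the MPI on your machine provides such a wrapper.
if command -v mpicxx >/dev/null 2>&1; then
    mpicxx -show || true
fi
```

If icpx shows up on the link-related lines while the wrapper output lists the MPI libraries, that would be consistent with the undefined `MPI_Init` reference above.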

@victorapm
Contributor

Could you share src/config/Makefile.config and src/config.log?

@jczhang07
Author

Attached.

config.log

Makefile.config.txt

@victorapm
Contributor

Could you try removing CUCC="" from your configure line?

@jczhang07
Author

I removed it, but it had no effect.

@victorapm
Contributor

Ok! Could you specify --with-MPI-include and --with-MPI-lib-dirs? If that does not work, you can also try CUCC=mpiicpx

@jczhang07
Author

Yes, I could add --with-MPI-include and --with-MPI-lib-dirs, but I don't think that is a good design for hypre: users have already set CC=mpicc and CXX=mpicxx, which are supposed to take care of the MPI libraries. @balay

@victorapm
Contributor

victorapm commented Jan 31, 2025

Thanks for testing this, Junchao. I see your point. For now, as a workaround, specifying --with-MPI-include and --with-MPI-lib-dirs should help. Another option is to explicitly set CUCC=mpiicpx to force the correct MPI compiler wrapper when building with SYCL.

I’ll bring this up with the team to see if we can improve this in future releases. Let me know if the workaround helps!

Note that using CMake makes things much easier, since it automatically detects and configures the correct MPI settings and avoids these manual adjustments. I encourage PETSc to consider building hypre via CMake.
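
The two workarounds above can be sketched roughly as follows. This is not an official recipe: `dirs_for_flag` is a hypothetical helper, the `-show` flag assumes an MPICH-style wrapper, and the exact value format that `--with-MPI-include` and `--with-MPI-lib-dirs` expect is an assumption. The script only prints candidate configure lines and does not run them.

```shell
#!/bin/sh
# Hypothetical helper: print the directories that follow a flag prefix
# (-I or -L) in a compiler command line, one directory per line.
dirs_for_flag() {
    prefix=$1
    shift
    # Unquoted $* is intentional: split the command line into words.
    for word in $*; do
        case $word in
            "$prefix"?*) printf '%s\n' "${word#$prefix}" ;;
        esac
    done
}

# Option 1: pass the MPI locations explicitly (value format is assumed).
if command -v mpicxx >/dev/null 2>&1; then
    show=$(mpicxx -show 2>/dev/null) || show=""
    inc=$(dirs_for_flag -I "$show" | head -n 1)
    libdirs=$(dirs_for_flag -L "$show" | tr '\n' ' ')
    echo "./configure --with-sycl --with-MPI-include=$inc --with-MPI-lib-dirs=\"$libdirs\""
fi

# Option 2: let an MPI wrapper act as the SYCL compiler and linker.
echo "./configure --with-sycl CC=mpicc CXX=mpicxx CUCC=mpicxx"
```

Option 2 uses CUCC=mpicxx, which is the variant reported to build in the next comment. CUCC=mpiicpx would be the equivalent where that wrapper is available.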

@jczhang07
Author

jczhang07 commented Jan 31, 2025

Hi, Victor,
Using CUCC=mpicc or CUCC=mpicxx fixed the problem, and I was able to build hypre. But when I ran a petsc/hypre test, I hit this runtime error:

terminate called after throwing an instance of 'sycl::_V1::runtime_error' what(): No kernel named _ZTSN4sycl3_V16detail18RoundedRangeKernelINS0_4itemILi1ELb1EEELi1EZZNK6oneapi3dpl20__par_backend_hetero24__parallel_for_submitterINS7_10__internal22__optional_kernel_nameIJEEEEclINS6_9execution5__dpl13device_policyINSF_17DefaultKernelNameEEENS6_13unseq_backend6walk_nISI_NS6_10__internal19__transform_functorIZ18hypreSycl_sequenceIPiiEvT_SP_T0_EUlSP_E_EEEElJNS6_8__ranges10guard_viewINS6_17counting_iteratorIlEEEENSV_ISO_EEEEEDaOSP_SQ_T1_DpOT2_ENKUlRNS0_7handlerEE_clES16_EUlS4_E_EE was found -46 (PI_ERROR_INVALID_KERNEL_NAME)

`
$ c++filt _ZTSZZNK6oneapi3dpl20__par_backend_hetero24__parallel_for_submitterINS1_10__internal22__optional_kernel_nameIJEEEEclINS0_9execution5__dpl13device_policyINS9_17DefaultKernelNameEEENS0_13unseq_backend6walk_nISC_NS0_10__internal19__transform_functorIZ18hypreSycl_sequenceIPiiEvT_SJ_T0_EUlSJ_E_EEEElJNS0_8__ranges10guard_viewINS0_17counting_iteratorIlEEEENSP_ISI_EEEEEDaOSJ_SK_T1_DpOT2_ENKUlRN4sycl3_V17handlerEE_clES12_EUlNS10_4itemILi1ELb1EEEE_

typeinfo name for oneapi::dpl::__par_backend_hetero::__parallel_for_submitter<oneapi::dpl::__par_backend_hetero::__internal::__optional_kernel_name<> >::operator()<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>, oneapi::dpl::unseq_backend::walk_n<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>, oneapi::dpl::__internal::__transform_functor<hypreSycl_sequence<int*, int>(int*, int*, int)::{lambda(auto:1)#1}> >, long, oneapi::dpl::__ranges::guard_view<oneapi::dpl::counting_iterator<long> >, oneapi::dpl::__ranges::guard_view<int*> >(oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>&&, oneapi::dpl::unseq_backend::walk_n<oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>, oneapi::dpl::__internal::__transform_functor<hypreSycl_sequence<int*, int>(int*, int*, int)::{lambda(auto:1)#1}> >, long, oneapi::dpl::__ranges::guard_view<oneapi::dpl::counting_iterator<long> >&&, oneapi::dpl::__ranges::guard_view<int*>&&) const::{lambda(sycl::_V1::handler&)#1}::operator()(sycl::_V1::handler&) const::{lambda(sycl::_V1::item<1, true>)#1}
`
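
For anyone repeating the demangling step: Itanium C++ ABI mangled names begin with `_Z`, so the leading underscore has to be kept when a symbol is copied out of an error message. A minimal sketch, where `demangle` is a hypothetical helper and the symbol is a short stand-in rather than the kernel name from the error:

```shell
#!/bin/sh
# Hypothetical helper: demangle one symbol, or echo it back unchanged
# when c++filt is not installed.
demangle() {
    if command -v c++filt >/dev/null 2>&1; then
        printf '%s\n' "$1" | c++filt
    else
        printf '%s\n' "$1"
    fi
}

# Short stand-in symbol with the _Z prefix intact.
demangle _ZTSN4sycl3_V17handlerE
# Same symbol with the leading underscore lost: not recognized as mangled.
demangle ZTSN4sycl3_V17handlerE
```

On a Linux system with GNU binutils, the first call should print `typeinfo name for sycl::_V1::handler`, while the second is passed through unchanged.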

@victorapm
Contributor

Thanks, Junchao. I don't have access to Aurora, so I can't help with this. Hopefully Wayne can chime in and help you with this issue.

@waynemitchell
Contributor

Hi @jczhang07. Sorry for my slow response on this. I was out sick for most of the past couple of weeks. What's the current status on your end? I don't know that I totally follow the entire discussion above... For me, simply doing ./configure --with-sycl && make is sufficient to configure and compile hypre on Aurora with the default environment on the machine.
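
Put together as a sequence, that baseline build might look like the sketch below. The module name is the one mentioned earlier in this thread and may not be needed with the default environment. Running from `src/` is an assumption based on the paths quoted above.

```shell
# Optional: compiler module named earlier in this thread. The default
# environment on Aurora may already provide a suitable oneAPI compiler.
module load oneapi/eng-compiler/2024.07.30.002

# Assumption: run from hypre's src/ directory, where config.log and
# config/Makefile.config are written.
cd src
./configure --with-sycl && make
```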

@jczhang07
Author

@waynemitchell I could configure and build hypre on Aurora (via petsc). But when I ran a petsc/hypre test, I hit the "No kernel named ..." runtime error. See the posts above.
