tpetra: Hang in getLocalDiagCopy post Kokkos-4.4 #13498

Open
vbrunini opened this issue Oct 2, 2024 · 3 comments · May be fixed by #13575
Labels
pkg: Tpetra, type: bug (the primary issue is a bug in Trilinos code or tests)

Comments

@vbrunini
Contributor

vbrunini commented Oct 2, 2024

Bug Report

@trilinos/tpetra

Description

Calling CrsMatrix::getLocalDiagCopy(Vector) on a matrix that is not fillComplete hangs in a Cuda build after the Kokkos 4.4 thread-safety changes. The stack trace is:

#0  0x000015551288e85d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000155512887ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x000000000f21e40b in void Kokkos::Tools::Experimental::Impl::profile_fence_event<Kokkos::Serial, Kokkos::Serial::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::Tools::Experimental::SpecialSynchronizationCases, Kokkos::Serial::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1} const&) ()
#3  0x000000000f21e536 in Kokkos::Impl::ExecSpaceDerived<Kokkos::Serial>::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#4  0x000000000f1f18c5 in Kokkos::Impl::ExecSpaceManager::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#5  0x000000000e14302c in void Kokkos::deep_copy<unsigned long*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u>, unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >(Kokkos::View<unsigned long*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> > const&, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> > const&, std::enable_if<(std::is_void<Kokkos::ViewTraits<unsigned long*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >::specialize>::value&&std::is_void<Kokkos::ViewTraits<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >::specialize>::value)&&((((unsigned int)Kokkos::ViewTraits<unsigned long*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >::rank)!=(0))||(((unsigned int)Kokkos::ViewTraits<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >::rank)!=(0))), void>::type*) [clone .isra.0] ()
#6  0x000000000e18c47e in Tpetra::CrsGraph<int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >::getRowPtrsUnpackedHost() const ()
#7  0x000000000e7abf98 in Tpetra::CrsGraph<int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >::getRowInfo(int) const ()
#8  0x000000000e2afb34 in Tpetra::CrsMatrix<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >::getLocalRowView(int, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >&, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >&) const ()
#9  0x000000000e5f66e0 in Kokkos::Impl::ParallelReduceAdaptor<Kokkos::RangePolicy<Kokkos::Serial, int>, Tpetra::Details::GetLocalDiagCopyWithoutOffsetsNotFillCompleteFunctor<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >, int>::execute_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Serial, int> const&, Tpetra::Details::GetLocalDiagCopyWithoutOffsetsNotFillCompleteFunctor<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> > const&, int&) ()
#10 0x000000000e5f8bf1 in Tpetra::Details::GetLocalDiagCopyWithoutOffsetsNotFillCompleteFunctor<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >::GetLocalDiagCopyWithoutOffsetsNotFillCompleteFunctor(int&, Tpetra::Vector<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >&, Tpetra::RowMatrix<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> > const&) ()
#11 0x000000000e5f99f0 in int Tpetra::Details::getLocalDiagCopyWithoutOffsetsNotFillComplete<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >(Tpetra::Vector<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >&, Tpetra::RowMatrix<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> > const&, bool) ()
#12 0x000000000e2f462d in Tpetra::CrsMatrix<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >::getLocalDiagCopy(Tpetra::Vector<double, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaSpace> >&) const ()

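It looks like GetLocalDiagCopyWithoutOffsetsNotFillCompleteFunctor ends up calling CrsGraph::getRowPtrsUnpackedHost (via CrsMatrix::getLocalRowView and CrsGraph::getRowInfo) inside a parallel_reduce. In some cases that function does a View allocation, which is not allowed in a parallel region. In the trace above, it is blocked in the fence issued by the Kokkos::deep_copy that follows.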
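A toy C++ model of the hazard described above may make the pattern easier to see. It is a sketch under stated assumptions, not Kokkos or Tpetra code: `ModelExecSpace`, `ModelGraph`, `rowPtrsHost`, `sumRowLengthsLazy`, and `sumRowLengthsHoisted` are hypothetical names. The model's `fence()` throws when called inside a kernel instead of blocking, so the misuse is reported rather than hanging. The "lazy" version triggers the host copy from inside the kernel (as getRowInfo does through getRowPtrsUnpackedHost in the trace). The "hoisted" version creates the host copy once, before the kernel is launched.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical model, NOT Kokkos: it only tracks whether a kernel is running.
// The real problem is a blocking wait; this model throws instead.
class ModelExecSpace {
 public:
  template <class Body>
  int parallel_reduce(std::size_t n, Body body) {
    KernelScope scope(inKernel_);  // flag is cleared even if body throws
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) body(i, sum);
    return sum;
  }
  void fence() const {
    if (inKernel_) throw std::logic_error("fence inside a parallel region");
  }

 private:
  struct KernelScope {
    explicit KernelScope(bool& flag) : flag_(flag) { flag_ = true; }
    ~KernelScope() { flag_ = false; }
    bool& flag_;
  };
  bool inKernel_ = false;
};

// Hypothetical graph whose host copy of the row offsets is created lazily.
// Creating it stands in for "allocate a View + deep_copy", which fences.
class ModelGraph {
 public:
  ModelGraph(ModelExecSpace& space, std::vector<int> deviceRowPtrs)
      : space_(space), device_(std::move(deviceRowPtrs)) {}

  const std::vector<int>& rowPtrsHost() const {
    if (!cached_) {
      space_.fence();   // models the fence inside the device-to-host copy
      host_ = device_;  // models the allocation + copy
      cached_ = true;
    }
    return host_;
  }

 private:
  ModelExecSpace& space_;
  std::vector<int> device_;
  mutable std::vector<int> host_;
  mutable bool cached_ = false;
};

// Buggy pattern: the first row visited builds the host copy from inside the
// kernel. Returns -1 if the model detects the nested fence.
int sumRowLengthsLazy(ModelExecSpace& space, const ModelGraph& g,
                      std::size_t numRows) {
  try {
    return space.parallel_reduce(numRows, [&](std::size_t i, int& sum) {
      const std::vector<int>& p = g.rowPtrsHost();
      sum += p[i + 1] - p[i];
    });
  } catch (const std::logic_error&) {
    return -1;
  }
}

// Fixed pattern: build the host copy once, outside the parallel region.
int sumRowLengthsHoisted(ModelExecSpace& space, const ModelGraph& g,
                         std::size_t numRows) {
  const std::vector<int>& p = g.rowPtrsHost();
  return space.parallel_reduce(numRows, [&](std::size_t i, int& sum) {
    sum += p[i + 1] - p[i];
  });
}
```

In this model, once the host copy exists, later accesses from inside a kernel no longer fence. That is one plausible reading of why the allocation only happens "in some cases".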

vbrunini added the type: bug label Oct 2, 2024
@lucbv
Contributor

lucbv commented Oct 31, 2024

@vbrunini is this happening in Sierra itself or in some unit test?
Looking at the Tpetra unit tests, I am not sure which one would call this on a non-fillComplete matrix.

@vbrunini
Contributor Author

This was in SPARC code, not a Trilinos unit test.

lucbv added a commit to lucbv/Trilinos that referenced this issue Nov 5, 2024
Adding a new unit test that covers the code paths for the fillComplete and
fillActive states. The code paths that handled these two cases are
now merged and follow the fillComplete case, as a staticCrsGraph is
assumed when calling getLocalDiagCopy.

Signed-off-by: Luc Berger-Vergiat <[email protected]>
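The commit message says the non-fillComplete path now follows the fillComplete one, which assumes a static graph structure. As a rough, hypothetical illustration of why that sidesteps the problem, here is a serial sketch that extracts a local diagonal from plain host CSR arrays, all of which exist before the loop starts, so the loop body never allocates or copies. `HostCsr` and `localDiagCopy` are made-up names, not Tpetra's API. Taking the first stored diagonal entry, and using 0 for rows without one, are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side CSR arrays; not Tpetra's data layout or API.
struct HostCsr {
  std::vector<std::size_t> rowPtrs;  // numRows + 1 offsets into colInds/values
  std::vector<int> colInds;          // local column index of each entry
  std::vector<double> values;        // value of each entry
};

// Returns the local diagonal of A. All structure is read from arrays that
// already exist, so the loop body performs no allocation or device-to-host
// copy. Rows with no stored diagonal entry get 0.0; if a row stores the
// diagonal more than once, only the first entry is used (sketch assumption).
std::vector<double> localDiagCopy(const HostCsr& A) {
  if (A.rowPtrs.empty()) return {};
  const std::size_t numRows = A.rowPtrs.size() - 1;
  std::vector<double> diag(numRows, 0.0);  // allocated once, before the loop
  for (std::size_t row = 0; row < numRows; ++row) {
    for (std::size_t k = A.rowPtrs[row]; k < A.rowPtrs[row + 1]; ++k) {
      if (A.colInds[k] == static_cast<int>(row)) {
        diag[row] = A.values[k];
        break;
      }
    }
  }
  return diag;
}
```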
@lucbv
Contributor

lucbv commented Nov 5, 2024

@vbrunini can you try the changes from PR #13575? I am not sure how SPARC calls this function, so we might need to modify the new unit test to cover the application use case better.
