Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDA] bindless_images tests failed after f9c8c01d38f8fb #16503

Open
jsji opened this issue Jan 2, 2025 · 5 comments
Open

[CUDA] bindless_images tests failed after f9c8c01d38f8fb #16503

jsji opened this issue Jan 2, 2025 · 5 comments
Assignees
Labels
bug Something isn't working cuda CUDA back-end pdtracker Pulldown tracker for issues/reverts that needs follow up sycl-bindless-images SYCL Bindless Images

Comments

@jsji
Copy link
Contributor

jsji commented Jan 2, 2025

Describe the bug

With the new optimization in llvm/llvm-project#119730

[NVPTX] Aggressively try to replace image handles with references

The following test failed for CUDA.

Failed Tests (7):
  SYCL :: bindless_images/examples/example_1_1D_read_write.cpp
  SYCL :: bindless_images/examples/example_2_2D_dynamic_read.cpp
  SYCL :: bindless_images/examples/example_4_1D_array_read_write.cpp
  SYCL :: bindless_images/read_2D_dynamic.cpp
  SYCL :: bindless_images/read_write_1D.cpp
  SYCL :: bindless_images/read_write_1D_subregion.cpp
  SYCL :: bindless_images/user_types/read_write_user_type.cpp

f9c8c01 is reverted temporaily to unblock the pulldown, please investigate/fix and reland.

See details in
https://github.com/intel/llvm/actions/runs/12563446905/job/35025604202

FAIL: SYCL :: bindless_images/examples/example_1_1D_read_write.cpp (2117 of 2239)
******************** TEST 'SYCL :: bindless_images/examples/example_1_1D_read_write.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 3
/__w/llvm/llvm/toolchain/bin//clang++  -Werror  -fsycl -fsycl-targets=nvptx64-nvidia-cuda  /__w/llvm/llvm/llvm/sycl/test-e2e/bindless_images/examples/example_1_1D_read_write.cpp -o /__w/llvm/llvm/build-e2e/bindless_images/examples/Output/example_1_1D_read_write.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/bindless_images/examples/example_1_1D_read_write.cpp -o /__w/llvm/llvm/build-e2e/bindless_images/examples/Output/example_1_1D_read_write.cpp.tmp.out
# .---command stderr------------
# | Unknown instruction operating on handle
# | UNREACHABLE executed at /__w/llvm/llvm/src/llvm/lib/Target/NVPTX/NVPTXReplaceImageHandles.cpp:1854!
# | PLEASE submit a bug report to https://github.com/intel/llvm/issues and include the crash backtrace, preprocessed source, and associated run script.
# | Stack dump:
# | 0.	Program arguments: /__w/llvm/llvm/toolchain/bin/clang-20 -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -fenable-sycl-dae -Wno-sycl-strict -O2 -fsycl-int-header=/tmp/lit-tmp-2a8clmef/example_1_1D_read_write-header-d831a9.h -fsycl-int-footer=/tmp/lit-tmp-2a8clmef/example_1_1D_read_write-footer-ef7000.h -D__SYCL_TARGET_NVIDIA_GPU_SM_50__ -sycl-std=2020 -fsycl-unique-prefix=uid8f5bf9866a496f15 -D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_intel_device_id__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_intel_device_info_uuid__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_intel_free_memory__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_intel_max_mem_bandwidth__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_intel_memory_bus_width__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_intel_memory_clock_rate__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_intel_pci_address__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_ballot_group__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_bindless_images__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_bindless_images_1d_usm__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_bindless_images_2d_usm__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_bindless_images_shared_usm__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_cubemap__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_cubemap_seamless_filtering__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_external_memory_import__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_external_semaphore_import__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_fixed_size_group__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_graph__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_image_array__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_limited_graph__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_mipmap__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_mipmap_anisotropy__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_mipmap_level_reference__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_native_assert__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_opportunistic_group__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_unique_addressing_per_dim__=1 -D__SYCL_ALL_DEVICES_HAVE_fp64__=1 -D__SYCL_ALL_DEVICES_HAVE_gpu__=1 -D__SYCL_ALL_DEVICES_HAVE_online_compiler__=1 -D__SYCL_ALL_DEVICES_HAVE_online_linker__=1 -D__SYCL_ALL_DEVICES_HAVE_queue_profiling__=1 -D__SYCL_ALL_DEVICES_HAVE_usm_atomic_host_allocations__=1 -D__SYCL_ALL_DEVICES_HAVE_usm_atomic_shared_allocations__=1 -D__SYCL_ALL_DEVICES_HAVE_usm_device_allocations__=1 -D__SYCL_ALL_DEVICES_HAVE_usm_host_allocations__=1 -D__SYCL_ALL_DEVICES_HAVE_usm_shared_allocations__=1 -D__SYCL_ALL_DEVICES_HAVE_usm_system_allocations__=1 -S -dumpdir /__w/llvm/llvm/build-e2e/bindless_images/examples/Output/example_1_1D_read_write.cpp.tmp.out- -disable-free -clear-ast-before-backend -main-file-name example_1_1D_read_write.cpp -mrelocation-model static -mframe-pointer=all -ffp-contract=on -fno-rounding-math -no-integrated-as -aux-target-cpu x86-64 -internal-isystem /__w/llvm/llvm/toolchain/bin/../include/sycl/stl_wrappers -internal-isystem /__w/llvm/llvm/toolchain/bin/../include -mlink-builtin-bitcode /__w/llvm/llvm/toolchain/lib/clang/20/../../clc/remangled-l64-signed_char.libspirv-nvptx64-nvidia-cuda.bc -mlink-builtin-bitcode /usr/local/cuda-12.6/nvvm/libdevice/libdevice.10.bc -target-sdk-version=12.6 -target-cpu sm_50 -target-feature +ptx85 -debugger-tuning=gdb -fno-dwarf-directory-asm -fdebug-compilation-dir=/__w/llvm/llvm/build-e2e/bindless_images/examples -resource-dir /__w/llvm/llvm/toolchain/lib/clang/20 -Werror -ferror-limit 19 -fgpu-rdc -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -o /tmp/lit-tmp-2a8clmef/example_1_1D_read_write-sm_50-e69480-a1d095.s -x ir /tmp/lit-tmp-2a8clmef/example_1_1D_read_write-sm_50-468b68_0.bc
# | 1.	Code generation
# | 2.	Running pass 'Function Pass Manager' on module '/tmp/lit-tmp-2a8clmef/example_1_1D_read_write-sm_50-468b68_0.bc'.
# | 3.	Running pass 'NVPTX Replace Image Handles' on function '@_ZTSN4sycl3_V16detail18RoundedRangeKernelINS0_4itemILi1ELb1EEELi1EZZ4mainENKUlRNS0_7handlerEE_clES6_EUlNS0_2idILi1EEEE_EE'
# | Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
# | 0  clang-20  0x000000000[26](https://github.com/intel/llvm/actions/runs/12563446905/job/35025604202#step:24:27)31d98 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 40
# | 1  clang-20  0x000000000262f84e llvm::sys::RunSignalHandlers() + 238
# | 2  clang-20  0x0000000002632428
# | 3  libc.so.6 0x00007fcf5ca6f320
# | 4  libc.so.6 0x00007fcf5cac8b1c pthread_kill + 284
# | 5  libc.so.6 0x00007fcf5ca6f26e gsignal + 30
# | 6  libc.so.6 0x00007fcf5ca528ff abort + 223
# | 7  clang-20  0x00000000025a939f
# | 8  clang-20  0x0000000000c96fb0
# | 9  clang-20  0x0000000000c94257
# | 10 clang-20  0x0000000001ad1197 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 695
# | 11 clang-20  0x00000000020[27](https://github.com/intel/llvm/actions/runs/12563446905/job/35025604202#step:24:28)aa5 llvm::FPPassManager::runOnFunction(llvm::Function&) + 629
# | 12 clang-20  0x000000000202fef2 llvm::FPPassManager::runOnModule(llvm::Module&) + 50
# | 13 clang-20  0x00000000020[28](https://github.com/intel/llvm/actions/runs/12563446905/job/35025604202#step:24:29)526 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 1910
# | 14 clang-20  0x000000000284b0fe clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) + 7694
# | 15 clang-20  0x0000000002e1f25a clang::CodeGenAction::ExecuteAction() + 2698
# | 16 clang-20  0x000000000[30](https://github.com/intel/llvm/actions/runs/12563446905/job/35025604202#step:24:31)c859f clang::FrontendAction::Execute() + 95
# | 17 clang-20  0x000000000303583d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1149
# | 18 clang-20  0x000000000[31](https://github.com/intel/llvm/actions/runs/12563446905/job/35025604202#step:24:32)a0089 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 473
# | 19 clang-20  0x0000000000c12b7c cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 6284
# | 20 clang-20  0x0000000000c0ecdf
# | 21 clang-20  0x0000000000c0ddfb clang_main(int, char**, llvm::ToolContext const&) + 5099
# | 22 clang-20  0x0000000000c1d087 main + 87
# | 23 libc.so.6 0x00007fcf5ca541ca
# | 24 libc.so.6 0x00007fcf5ca5428b __libc_start_main + 139
# | 25 clang-20  0x0000000000c0c705 _start + 37
# | llvm-foreach: Aborted (core dumped)
# | clang++: error: clang frontend command failed with exit code 254 (use -v to see invocation)
# | clang version 20.0.0git (https://github.com/intel/llvm 724f840609fce137a874dc6f1c846c3a3044b159)
# | Target: x86_64-unknown-linux-gnu
# | Thread model: posix
# | InstalledDir: /__w/llvm/llvm/toolchain/bin
# | Build config: +assertions
# | clang++: note: diagnostic msg: Error generating preprocessed source(s).
# `-----------------------------
# error: command failed with exit status: 1

--
@jsji jsji added bug Something isn't working cuda CUDA back-end pdtracker Pulldown tracker for issues/reverts that needs follow up labels Jan 2, 2025
@jsji
Copy link
Contributor Author

jsji commented Jan 2, 2025

FYI. @intel/bindless-images-reviewers

@ProGTX
Copy link
Contributor

ProGTX commented Jan 15, 2025

Thanks for reporting this, we're tracking this internally.

@JackAKirk
Copy link
Contributor

JackAKirk commented Feb 18, 2025

See here
#17045
for three failing minimal reproducers. In all cases failures occur when sycl::ext::oneapi::experimental::fetch_image is called under the following situations:

  • Parallel_for using a range instead of an nd_range (not recommended for GPUs) (tested with read_write_1D_range.cpp)
  • Any sycl::buffer usage breaks fetch_image (tested with read_write_1D_buffer.cpp )
  • In the case that a buffer with size greater than one is used, e.g. in https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/bindless_images/examples/example_2_2D_dynamic_read.cpp then simply removing the buffer does not fix the issue: if you replace the buffer with a C array the failure still occurs. I could not reproduce this with clang-cuda.

In all the above cases, the sycl compilation is failing to interact correctly with the new pass that has been switched on for all cuda targets: https://github.com/intel/llvm/blob/f9c8c01d38f8fbea81db99ab90b7d0f2bdcc8b4d/llvm/lib/Target/NVPTX/NVPTXReplaceImageHandles.cpp This is because the sycl compilation is preventing this switch statement from finding a valid instruction

switch (TexHandleDef.getOpcode()) {

I am not sure of the best way to resolve this, but it will probably involve considering the place of the NVPTXReplaceImageHandles.cpp pass in relation to other SYCL specific compilation processes, specifically dealing with the SYCL specific sycl::parallel_for/ sycl::range and sycl::buffer features.

dm-vodopyanov pushed a commit that referenced this issue Feb 20, 2025
…nctionality (#17045)

This separates out bindless-image tests into more unit-test like tests
for orthogonal functionality. This reduces the number of tests that fail
in #16503 when the upstream commit,
f9c8c01
, is pulled down, from 7 to three. This also demonstrates that
essentially all bindless-image functionality that maps directly to cuda
(those using nd range or USM) work correctly even with the upstream
pulldown, whereas some functionality that is SYCL specific (range
parallel_for or buffers) fail. Complete information is described in the
associated issue
#16503 (comment)

All the tests pass with the current DPC++ tip, but three of the tests
fail when the above mentioned upstream commit is pulled down.

- read_write_1D_buffer.cpp 
- read_write_1D_range.cpp
- examples/example_2_2D_dynamic_read.cpp

Additionally this PR removes the duplicate read_write_1D.cpp and
read_2D_dynamic.cpp tests which match almost identically with the
corresponding named tests in the examples folder. This is done to reduce
unnecessary maintenance overhead.

---------

Signed-off-by: JackAKirk <[email protected]>
kurapov-peter pushed a commit to kurapov-peter/llvm that referenced this issue Mar 5, 2025
…nctionality (intel#17045)

This separates out bindless-image tests into more unit-test like tests
for orthogonal functionality. This reduces the number of tests that fail
in intel#16503 when the upstream commit,
intel@f9c8c01
, is pulled down, from 7 to three. This also demonstrates that
essentially all bindless-image functionality that maps directly to cuda
(those using nd range or USM) work correctly even with the upstream
pulldown, whereas some functionality that is SYCL specific (range
parallel_for or buffers) fail. Complete information is described in the
associated issue
intel#16503 (comment)

All the tests pass with the current DPC++ tip, but three of the tests
fail when the above mentioned upstream commit is pulled down.

- read_write_1D_buffer.cpp 
- read_write_1D_range.cpp
- examples/example_2_2D_dynamic_read.cpp

Additionally this PR removes the duplicate read_write_1D.cpp and
read_2D_dynamic.cpp tests which match almost identically with the
corresponding named tests in the examples folder. This is done to reduce
unnecessary maintenance overhead.

---------

Signed-off-by: JackAKirk <[email protected]>
@JackAKirk
Copy link
Contributor

JackAKirk commented Mar 20, 2025

The root cause of this was that the pass that was switched on regardless of compute capability in the offending upstream commit did not support all valid NVPTX 64 bit address instructions that could be used to store an image handle.
This affected the sycl programming model via

  • LD_i64_asi that is used for the sycl::range parallel_for test failure
  • LD_i64_areg_64 for the sycl::buffer test failure

These failures have already been fixed via the following upstream patches (which have not yet been pulled down into intel/llvm):

@jsji in order to fix this in intel/llvm I guess that f9c8c01 needs to be cherry-picked before the next pulldown. I guess that you can either do this separately within a separate PR, or within the PR for the next pulldown (but prior to the later commits).

If you wish to cherry-pick the offending commit separately, then you can use the following temporary fix that will allow all bindless images tests to pass:

@@ -1809,13 +1810,16 @@ findIndexForHandle(MachineOperand &Op, MachineFunction &MF, unsigned &Idx) {
   MachineInstr &TexHandleDef = *MRI.getVRegDef(Op.getReg());
 
   switch (TexHandleDef.getOpcode()) {
+  case NVPTX::LD_i64_asi:
+  case NVPTX::LD_i64_areg_64:
   case NVPTX::LD_i64_avar: {
    // The handle is a parameter value being loaded, replace with the
    // parameter symbol
    const NVPTXTargetMachine &TM =
        static_cast<const NVPTXTargetMachine &>(MF.getTarget());
    if (TM.getDrvInterface() == NVPTX::CUDA) {
      // For CUDA, we preserve the param loads coming from function arguments
      return false;
    }

That you can revert that once the pulldown is made.

Does that make sense?

@jsji
Copy link
Contributor Author

jsji commented Mar 22, 2025

#17592

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuda CUDA back-end pdtracker Pulldown tracker for issues/reverts that needs follow up sycl-bindless-images SYCL Bindless Images
Projects
None yet
Development

No branches or pull requests

3 participants