Skip to content

[SYCL] Stop emission of modf intrinsic for NVPTX/AMDGCN #17958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: sycl
Choose a base branch
from

Conversation

frasercrmck
Copy link
Contributor

@frasercrmck frasercrmck commented Apr 10, 2025

A recent change in #126750 changed the lowering of the modf builtin(s) to lower to the llvm.modf intrinsic. Certain SYCL targets such as NVPTX and AMDGCN cannot currently handle this intrinsic in the backend because they don't have the necessary target library info. Even if they did, the SYCL device libraries are linked in earlier at the IR level so we'd be left with an unresolved symbol.

We have been working around this at clang codegen time by avoiding the emission of intrinsics altogether by (ab)using a guard on math intrinsics.

The problem with us hooking into GenerateIntrinsics like this is that SYCL is wanting to use it as a guarantee that (a certain subset of) intrinsics won't be emitted if GenerateIntrinsics is false, whereas upstream does not make this guarantee: it's an opt-in toggle for a more optimal lowering strategy. Thus we're always going to be liable to this sort of upstream change.

This doesn't feel like the right mechanism for SYCL to handle these builtins (or intrinsics) long term, but this fix restores the previous behaviour without making things much worse.

Fixes the AMDGCN/NVPTX aspects of #17813

A recent change in #126750 changed the lowering of the modf builtin(s)
to lower to the llvm.modf intrinsic. Certain SYCL targets such as NVPTX
and AMDGCN cannot currently handle this intrinsic in the backend because
they don't have the necessary target library info. Even if they did, the
SYCL device libraries are linked in earlier at the IR level so we'd be
left with an unresolved symbol.

We have been working around this at clang codegen time by avoiding the
emission of intrinsics altogether by (ab)using a guard on math
intrinsics.

The problem with us hooking into GenerateIntrinsics like this is that
SYCL is wanting to use it as a guarantee that (a certain subset of)
intrinsics won't be emitted if GenerateIntrinsics is false, whereas
upstream does not make this guarantee: it's an opt-in toggle for a more
optimal lowering strategy. Thus we're always going to be liable to this
sort of upstream change.

This doesn't feel like the right mechanism for SYCL to handle these
builtins (or intrinsics) long term, but this fix restores the previous
behaviour without making things much worse.

Fixes intel#17813
@bader
Copy link
Contributor

bader commented Apr 10, 2025

AMDGCN cannot currently handle this intrinsic

And still Matt Arsenault has approved this change. I suppose that change is safe for the clang (i.e. OpenCL/HIP/CUDA targeting AMDGCN).

If I get it right, this the trend in community to lower all math functions builtins into LLVM instrinsics. Long term we should align DPC++ with the community approach.

@frasercrmck
Copy link
Contributor Author

AMDGCN cannot currently handle this intrinsic

And still Matt Arsenault has approved this change. I suppose that change is safe for the clang (i.e. OpenCL/HIP/CUDA targeting AMDGCN).

If I get it right, this the trend in community to lower all math functions builtins into LLVM instrinsics. Long term we should align DPC++ with the community approach.

Yep I'll admit I don't fully know what's going on and what the intention and trend is. But we should try and get a handle on where things are heading and, yes, align with that approach.

I'd like to remove this workaround so we can at least align AMDGCN/NVPTX paths with the SPIR-V and other SYCL targets. If we are left with codegen intrinsics we could perhaps have an IR-level pass which expands intrinsics (with/without specific flags or ULP requirements) to libdevice/libclc function calls. I'm assume we are ruling out binary-level linking.

@frasercrmck
Copy link
Contributor Author

Tracker #17813 is quite coarse. It says CUDA in the title but also says the spirv backend is failing. It doesn't mention HIP.

Indeed with this patch, all the same tests are failing with the same problem, but only for the SPIR-V backend.

I was hoping to keep the tracker in place and retitle/reword it to reflect that fact that it's still failing for the SPIR-V backend, rather than closing it and opening another almost identical one. Is that okay?

@Fznamznon
Copy link
Contributor

I was hoping to keep the tracker in place and retitle/reword it to reflect that fact that it's still failing for the SPIR-V backend, rather than closing it and opening another almost identical one. Is that okay?

I'm not following . The original change broke NVPTX and AMDGCN, why SPIR-V backend is failing?

@frasercrmck
Copy link
Contributor Author

I was hoping to keep the tracker in place and retitle/reword it to reflect that fact that it's still failing for the SPIR-V backend, rather than closing it and opening another almost identical one. Is that okay?

I'm not following . The original change broke NVPTX and AMDGCN, why SPIR-V backend is failing?

The SPIR-V backend can't handle the llvm.modf intrinsic, so has been failing after that original change. My best guess is the NVPTX/AMDGCN failures were obscuring the SPIR-V backend failures, and the ticket isn't complete in its assessment. Note the original fix was to add UNSUPPORTED: true to the tests so they haven't been running for any target, and have been obscuring the SPIR-V backend issue. It's fundamentally the same issue though.

I've been made vaguely aware of changes in the closed source repo that align this GenerateIntrinsics code for the SPIR-V backend too. So I would imagine eventually the three targets will be aligned in some fashion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants