RFC for "Passed Directly" Customization Point #1999

Open: wants to merge 17 commits into `main`
`rfcs/proposed/passed_directly_API/README.md`: 144 additions, 0 deletions
# Passed Directly Customization Point for User-Defined Types

## Introduction

oneDPL automatically handles some types of input data passed to its dpcpp (SYCL-based) backend, as described
[here](https://uxlfoundation.github.io/oneDPL/parallel_api/pass_data_algorithms.html). Unified Shared Memory (USM)
pointers refer to data that is inherently device accessible, so no processing is required to pass this type of input
data to SYCL kernels; we refer to this trait as "passed directly". oneDPL also defines rules under which some of its
provided [iterator types](https://uxlfoundation.github.io/oneDPL/parallel_api/iterators.html) are passed directly to
SYCL kernels (usually based on their base types).

Internally, these rules are defined by a trait, `oneapi::dpl::__ranges::is_passed_directly<T>`, which evaluates to
`std::true_type` or `std::false_type` to indicate whether the type `T` should be passed directly to SYCL kernels.
oneDPL also supports an unofficial legacy `is_passed_directly` trait, which types can define like this:
`using is_passed_directly = std::true_type;`. This method is currently used for a number of helper types within the
SYCLomatic compatibility headers (`device_pointer`, `device_iterator`, `tagged_pointer`, `constant_iterator`,
`iterator_adaptor`). There is no official public API for users who want to create their own types that can be passed
directly to SYCL kernels; this is a gap we should fill with an official public API.
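
As an illustration of how a legacy member typedef like this can be detected, here is a minimal sketch using the standard `std::void_t` detection idiom. It is self-contained and hypothetical: the `demo` namespace and the names `legacy_is_passed_directly`, `usm_like_iterator` and `host_only_iterator` are illustrative only, and oneDPL's real internal trait additionally carries specializations for oneDPL's own iterator types.

```cpp
#include <type_traits>

namespace demo
{
// Primary template: types without a nested is_passed_directly member are not passed directly.
template <typename T, typename = void>
struct legacy_is_passed_directly : std::false_type
{
};

// Chosen when T declares `using is_passed_directly = ...;` and inherits that member's value.
template <typename T>
struct legacy_is_passed_directly<T, std::void_t<typename T::is_passed_directly>> : T::is_passed_directly
{
};

template <typename T>
inline constexpr bool legacy_is_passed_directly_v = legacy_is_passed_directly<T>::value;

// Hypothetical type opting in through the legacy member typedef.
struct usm_like_iterator
{
    using is_passed_directly = std::true_type;
};

// Hypothetical type with no member: falls back to std::false_type.
struct host_only_iterator
{
};
} // namespace demo

static_assert(demo::legacy_is_passed_directly_v<demo::usm_like_iterator>);
static_assert(!demo::legacy_is_passed_directly_v<demo::host_only_iterator>);
```

Non-class types such as `int*` also fall back to the primary template, because `typename T::is_passed_directly` is a substitution failure for them.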

Without such an API, users are forced either to rely only on the types oneDPL provides or to reach into implementation
details that are not part of oneDPL's specified interface.

## Proposal

Create a customization point, the free function `oneapi::dpl::is_passed_directly_to_sycl_kernels`, which allows users
to mark their types as passed directly:

```
template <typename T>
constexpr bool is_passed_directly_to_sycl_kernels(const T&);
```

oneDPL will provide a default implementation which will defer to the existing trait:

```
template <typename T>
constexpr
bool
is_passed_directly_to_sycl_kernels(const T&)
{
    return oneapi::dpl::__ranges::is_passed_directly_v<T>;
}
```

Below is a simple example of a type that is always passed directly, together with its customization point definition.

```
namespace user
{

struct my_passed_directly_type
{
    /* unspecified user definition */
};

// Non-template overload, found by ADL, that marks every my_passed_directly_type as passed directly.
constexpr
bool
is_passed_directly_to_sycl_kernels(const my_passed_directly_type&)
{
    return true;
}
} //namespace user
```

Users can apply any constexpr logic based on their type to determine whether it can be passed directly into a SYCL
kernel without any processing. Below is an example of a type that contains a pair of iterators and should be treated as
passed directly if and only if both of its base iterators are also passed directly. oneDPL will use this customization
point internally when determining how to handle incoming data, picking up any user customizations in the process.

```
namespace user
{
template <typename It1, typename It2>
struct iterator_pair
{
    It1 first;
    It2 second;
};

template <typename It1, typename It2>
constexpr
bool
is_passed_directly_to_sycl_kernels(const iterator_pair<It1, It2>& pair)
{
    return oneapi::dpl::is_passed_directly_to_sycl_kernels(pair.first) &&
           oneapi::dpl::is_passed_directly_to_sycl_kernels(pair.second);
}
} //namespace user
```
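
The following self-contained sketch shows how a dispatching customization point could pick up both user overloads above. It is a stand-in under stated assumptions, not oneDPL's implementation: the namespace `demo_dpl`, the helper `detail::is_passed_directly_fn`, the type `host_iterator`, and the default rule that treats only raw pointers (standing in for USM pointers) as passed directly are all hypothetical. The structure follows the function-object approach described under Implementation details.

```cpp
#include <type_traits>

// Hypothetical stand-in for oneapi::dpl; names and the default rule are illustrative only.
namespace demo_dpl
{
namespace detail
{
// Default rule (assumption): only raw pointers, standing in for USM pointers, are passed directly.
template <typename T>
constexpr bool
is_passed_directly_to_sycl_kernels(const T&)
{
    return std::is_pointer_v<T>;
}

struct is_passed_directly_fn
{
    template <typename T>
    constexpr bool
    operator()(const T& t) const
    {
        // Unqualified call: ADL finds overloads in T's namespace; otherwise the default above wins.
        return is_passed_directly_to_sycl_kernels(t);
    }
};
} // namespace detail

inline constexpr detail::is_passed_directly_fn is_passed_directly_to_sycl_kernels{};
} // namespace demo_dpl

namespace user
{
struct my_passed_directly_type
{
};

constexpr bool
is_passed_directly_to_sycl_kernels(const my_passed_directly_type&)
{
    return true;
}

template <typename It1, typename It2>
struct iterator_pair
{
    It1 first;
    It2 second;
};

template <typename It1, typename It2>
constexpr bool
is_passed_directly_to_sycl_kernels(const iterator_pair<It1, It2>& pair)
{
    // Going through the function object lets each member use its own rule.
    return demo_dpl::is_passed_directly_to_sycl_kernels(pair.first) &&
           demo_dpl::is_passed_directly_to_sycl_kernels(pair.second);
}
} // namespace user

// Hypothetical host-side iterator: no customization and not a pointer.
struct host_iterator
{
};

static_assert(demo_dpl::is_passed_directly_to_sycl_kernels(user::my_passed_directly_type{}));
static_assert(!demo_dpl::is_passed_directly_to_sycl_kernels(user::iterator_pair<int*, host_iterator>{}));
```

For `iterator_pair`, overload resolution prefers the user's partial-ordering-more-specialized template over the generic default, so the composed rule is used automatically.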

This allows the user to provide rules for their types next to their implementation, without cluttering the
implementation of the type itself with extra typedefs, etc.

This option can coexist with the existing methods: the legacy `is_passed_directly` typedef within types and the
explicit specializations of the internal `oneapi::dpl::__ranges::is_passed_directly` trait. It would be possible to
simplify the internal implementation by replacing explicit specializations of the trait with the customization point,
but that is not required for the first implementation.

### Implementation details
To make this robust, we will follow a C++17-updated version of the approach described in
[Eric Niebler's post](https://ericniebler.com/2014/10/21/customization-point-design-in-c11-and-beyond/), implementing the
customization point as a callable (a function object) declared `inline constexpr`. This avoids ODR issues, as well as
problems resolving customization points when the call is not first split into two steps with a `using` declaration.
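
To illustrate the lookup problem this design addresses, the hypothetical sketch below compares a plain function-template default (called either with qualification or through the classic two-step `using` pattern) against an `inline constexpr` function object that makes the unqualified call on the caller's behalf. All namespaces (`plain_dpl`, `cpo_dpl`, `user`), the type `widget` and the helper `two_step` are illustrative names, not oneDPL code.

```cpp
namespace user
{
struct widget
{
};

// User customization, discoverable only through argument-dependent lookup (ADL).
constexpr bool
is_passed_directly_to_sycl_kernels(const widget&)
{
    return true;
}
} // namespace user

// Variant A (hypothetical): the default is a plain function template in the library namespace.
namespace plain_dpl
{
template <typename T>
constexpr bool
is_passed_directly_to_sycl_kernels(const T&)
{
    return false;
}
} // namespace plain_dpl

// The classic two-step call: a using-declaration for the default, then an unqualified call.
template <typename T>
constexpr bool
two_step(const T& t)
{
    using plain_dpl::is_passed_directly_to_sycl_kernels;
    return is_passed_directly_to_sycl_kernels(t);
}

// Variant B (hypothetical): the default is hidden in detail and reached through a function object.
namespace cpo_dpl
{
namespace detail
{
template <typename T>
constexpr bool
is_passed_directly_to_sycl_kernels(const T&)
{
    return false;
}

struct is_passed_directly_fn
{
    template <typename T>
    constexpr bool
    operator()(const T& t) const
    {
        return is_passed_directly_to_sycl_kernels(t); // ordinary lookup plus ADL
    }
};
} // namespace detail

// C++17 inline variable: a single object shared by every translation unit.
inline constexpr detail::is_passed_directly_fn is_passed_directly_to_sycl_kernels{};
} // namespace cpo_dpl

// Qualified call to the plain template: ADL is suppressed, so the user overload is missed.
static_assert(!plain_dpl::is_passed_directly_to_sycl_kernels(user::widget{}));
// The two-step call and the qualified call through the function object both find it.
static_assert(two_step(user::widget{}));
static_assert(cpo_dpl::is_passed_directly_to_sycl_kernels(user::widget{}));
```

Only the qualified call to the plain function template misses the user overload, because qualified name lookup never performs ADL; the function object restores it without requiring every caller to write a `using` declaration.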

Contributor

I do not know why you refer to Niebler's post. He describes there, more or less, the C++20 customization point objects, but that's not what you propose to do. As far as I understand, the proposed user-defined customizations will be ADL-discoverable (if not, then I do not understand how to use them), and then you need the using statement to get the default implementation in oneapi::dpl, which is exactly the two-step customization, isn't it?

Contributor Author

Perhaps I am mistaken, but my understanding is that Niebler is describing a way using function pointers to allow qualified calls to also pick up the default implementation rather than just unqualified calls with the using statement.
We don't need his more complex strategy for ODR because of changes in C++17, but I think the function pointer strategy is still a benefit.

Contributor

> Perhaps I am mistaken, but my understanding is that Niebler is describing a way using function pointers to allow qualified calls to also pick up the default implementation rather than just unqualified calls with the using statement.

See https://godbolt.org/z/TM914E6fv for an example.

Contributor Author

I cleaned up and added my proof of concept to the description as well for another example.

Contributor (@akukanov, Jan 14, 2025)

> my understanding is that Niebler is describing a way using function pointers to allow qualified calls to also pick up the default implementation rather than just unqualified calls with the using statement.

I do not see where he uses any function pointers. His `std::begin` is a reference to an instance of `struct std::__detail::__begin_fn`, which has a function call operator, so `std::begin(X)` is valid code. This operator internally makes an unqualified call to `begin`, whose "default" implementation is in `std::__detail`, while specializations are found by ADL. As I said, it more or less matches the CPO design in C++20.

And it's not what you have proposed so far, as far as I can see. In this proposal, the default implementation is a free function in the `oneapi::dpl` namespace, and the customization is a free function in the user's type namespace (it is not said explicitly that the namespace should be the same, but de facto it should, for ADL to work). So I do not see how a qualified call will pick up customizations, nor how an unqualified call will pick up the default implementation without a using declaration.

Contributor Author

I think I need to improve the language to be more specific and accurate about the proposed implementation details, but I think what you describe in your first paragraph is correct and accurate to my intentions.

You are correct that an unqualified call does require a using declaration, but a qualified call should pick up the user customizations because a qualified call will use the function object. The function object internally makes an unqualified call from the namespace of the default implementation on behalf of the user, allowing it either to find the more specific user customization, if one exists, or to fall back to the default otherwise. You should be able to see this tested in the proof of concept here on line 121.

Hopefully I'm not missing something, but I may be.

Contributor Author

(I was wrong to mention function pointers, I meant function objects)


## Alternatives considered
### Public trait struct explicit specialization
We could simply make our internal structure `oneapi::dpl::__ranges::is_passed_directly` public as
`oneapi::dpl::is_passed_directly` for users to specialize to define rules for their types. This would be a similar
mechanism to `sycl::is_device_copyable`. This option would avoid some of the complexity required to properly implement
the customization point.

However, as we have learned from experience within oneDPL, explicit specialization of a structure in another library's
namespace creates maintenance problems. It requires either closing the user's (possibly nested) namespaces and opening
the external library's namespace for each specialization, or moving these specializations to a separate location, away
from the types they specialize. oneDPL has chosen the latter, as can be seen in
`include/oneapi/dpl/pstl/hetero/dpcpp/sycl_traits.h`. This has led to several errors where changes to structures should
have been accompanied by changes to `sycl_traits.h` but were not, and had to be fixed later.
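
For comparison, a minimal sketch of what the explicit-specialization alternative asks of users. The trait `lib::is_passed_directly` and the type `company::project::detail::device_view` are hypothetical names; the point is only that the specialization must be written inside the library's namespace, so the user's own namespaces have to be closed first (or the specialization moved to a separate file).

```cpp
#include <type_traits>

// Hypothetical library exposing a trait for users to specialize.
namespace lib
{
template <typename T>
struct is_passed_directly : std::false_type
{
};

template <typename T>
inline constexpr bool is_passed_directly_v = is_passed_directly<T>::value;
} // namespace lib

// The user's type lives in its own nested namespace...
namespace company::project::detail
{
struct device_view
{
};
} // namespace company::project::detail

// ...but the specialization must be written inside lib, after closing every user namespace.
namespace lib
{
template <>
struct is_passed_directly<company::project::detail::device_view> : std::true_type
{
};
} // namespace lib

static_assert(lib::is_passed_directly_v<company::project::detail::device_view>);
```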

To spare our users this same issue, we propose a similar mechanism based instead on a constexpr customization point,
which users override by defining a free function in their own namespace.

### Require specifically named typedef / using in user's type
We could simply make official the requirement that user types include a typedef or alias declaration indicating whether
the type is passed directly, like `using is_passed_directly = std::true_type;`, where the absence of this member would be
equivalent to `std::false_type`.

However, this clutters user type definitions with oneDPL specifics. Without comments describing the relevant details of
oneDPL and SYCL, it may also be unclear to maintainers of the user code what this member signifies. Users have
expressed that this is undesirable.

## Testing
We will need a detailed test checking that both positive and negative results of `is_passed_directly_to_sycl_kernels`
are as expected for custom types and for combinations of iterators, USM pointers, etc.
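
A sketch of the kind of compile-time checks such a test could contain, covering positive and negative cases, nested combinations, and a type using the legacy typedef alongside the new customization point. Everything here is a hypothetical stand-in so the checks compile without oneDPL: `demo_dpl` replaces `oneapi::dpl`, raw pointers stand in for USM pointers, and `check` is an illustrative helper that assumes each tested type is value-initializable.

```cpp
#include <type_traits>

// Hypothetical stand-in for oneapi::dpl so the checks compile without oneDPL.
namespace demo_dpl
{
namespace detail
{
// Detects the legacy `using is_passed_directly = std::true_type;` member.
template <typename T, typename = void>
struct has_legacy_member : std::false_type {};

template <typename T>
struct has_legacy_member<T, std::void_t<typename T::is_passed_directly>> : T::is_passed_directly {};

// Default rule (assumption): raw pointers stand in for USM pointers; legacy members are honored.
template <typename T>
constexpr bool
is_passed_directly_to_sycl_kernels(const T&)
{
    return std::is_pointer_v<T> || has_legacy_member<T>::value;
}

struct is_passed_directly_fn
{
    template <typename T>
    constexpr bool
    operator()(const T& t) const
    {
        return is_passed_directly_to_sycl_kernels(t); // ADL picks up user overloads
    }
};
} // namespace detail

inline constexpr detail::is_passed_directly_fn is_passed_directly_to_sycl_kernels{};
} // namespace demo_dpl

namespace user
{
struct legacy_type { using is_passed_directly = std::true_type; };
struct custom_type {};
struct host_type {};

constexpr bool
is_passed_directly_to_sycl_kernels(const custom_type&)
{
    return true;
}

template <typename It1, typename It2>
struct iterator_pair { It1 first; It2 second; };

template <typename It1, typename It2>
constexpr bool
is_passed_directly_to_sycl_kernels(const iterator_pair<It1, It2>& p)
{
    return demo_dpl::is_passed_directly_to_sycl_kernels(p.first) &&
           demo_dpl::is_passed_directly_to_sycl_kernels(p.second);
}
} // namespace user

// Compares the result for a value-initialized T against the expected answer.
template <typename T>
constexpr bool
check(bool expected)
{
    return demo_dpl::is_passed_directly_to_sycl_kernels(T{}) == expected;
}

// Positive cases.
static_assert(check<int*>(true));
static_assert(check<user::legacy_type>(true));
static_assert(check<user::custom_type>(true));
static_assert(check<user::iterator_pair<user::iterator_pair<int*, user::custom_type>, user::legacy_type>>(true));
// Negative cases.
static_assert(check<user::host_type>(false));
static_assert(check<user::iterator_pair<int*, user::host_type>>(false));
```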

## Open Questions

Is there a better, more concise name than `is_passed_directly_to_sycl_kernels` that properly conveys the meaning to
users?

Contributor

I would prefer the name to include onedpl somewhere. The function is not generally relevant for SYCL. Maybe:

- `is_passed_directly_to_onedpl`, or
- `is_passed_directly_to_onedpl_kernels`

Contributor Author

Yes, it's a fair point that since the customization point in the user's code won't be in the oneapi::dpl namespace, there is little connecting it to oneDPL unless it is in the name. I was trying to be more descriptive about the semantic meaning, as this is only relevant for the SYCL-based dpcpp backend; however, that is probably less important than including oneDPL.

Thanks, I think the first of your suggestions is probably the best option so far.

Contributor

As this property only really makes sense for the SYCL backend of oneDPL, there is a sense that is_passed_directly_to_onedpl doesn't quite tell the full story of what it is for. It won't be used in the TBB or OpenMP backends of oneDPL, which you wouldn't understand from just the name alone.

On the other hand, something like is_passed_directly_to_sycl is too broad and could be confusing because the SYCL implementation won't use the function directly. Perhaps the second suggestion of is_passed_directly_to_onedpl_kernels is closer, but one could take issue with its verbosity.

Contributor Author

Perhaps is_passed_directly_to_onedpl_dpcpp is the right choice (I hate how wordy it is but I'm not sure we have a choice).
A normal user of oneDPL may better recognize "dpcpp" than "kernels", saves a couple characters too.

Contributor (@akukanov, Jan 14, 2025)

Actually I would prefer not to introduce more names with dpcpp, to avoid the impression that our implementation is based on DPC++ (and not on the SYCL specification).

Contributor

I would prefer moving away from the oneDPL implementation detail (of "passing directly") towards the semantical meaning of the trait, something like "this iterator supports implicit data transfer" or (inverting the value) "requires explicit data transfer" or maybe "is ready for use with/suitable for oneDPL device policies".

Contributor Author

I like requires_explicit_data_transfer_onedpl_device_policies with inverted values better than talking about implicit data transfer, because passed directly is more about not needing transfer in the first place. Implicit data transfer is appropriate for buffer accessors or shared USM, but not all "passed directly" types. USM device pointers or counting_iterators are passed directly, but don't have any data transfer, implicit or explicit.

Another option: is_dereferencable_in_onedpl_device_policies?

Contributor (@rarutyun, Jan 16, 2025)

I (honestly) would say `is_passed_directly_to_sycl_backend`. We use the term "backend" in our documentation, so it should be clear enough. Please don't repeat `_onedpl_` in the name, because it's already in the oneapi::dpl namespace. Duplication doesn't bring clarity. I 100% agree with the intent not to use dpcpp in the name. The less we use it, the better. But this is a comment about the name; I also have a number of questions about the approach itself.

Contributor Author

After some development from the original idea, the only public API is the trait which will include the oneapi::dpl namespace. However, users will still be overriding the customization point by defining a function in their own namespace, and I believe that name should be associated with the trait (its name + _v). Without repeating onedpl in the name of the customization point, there will be nothing tying it to oneDPL in the users code. This is the motivation for including _onedpl_.

We could consider different names for the customization point and the trait, but that may also be confusing.

Contributor

We do not use the word "backend" in the public API (actually, not at all in the specification), and I think we should not.

There are two elements of the API to name - the trait value (also a class?) and the user-defined customization function. Their names should be related, but I am not sure if almost exact match is needed. I agree that the name of the function should refer to oneDPL; that can be achieved by adding a prefix to the function name.

Also, as far as I understand, the trait is not so much (if at all) about the iterator itself "passed" to a device, as it is about the data "underneath" the iterator being accessible from that device, so that no data copying/no intermediate buffer is required.

Can it be something like onedpl_is_iterator_device_ready() for the function and oneapi::dpl::is_iterator_device_ready[_v] for the trait?


Should this proposal target experimental or fully supported status?
(Do we think user feedback is required to solidify the interface and user experience?)

Contributor

I guess the general design of passed directly has been tested internally pretty well at least.

Contributor Author

Agreed. From my perspective the only reason to keep it in experimental to start with is if we are uncertain of the exact API specifics or to find any unexpected gotchas with the approach of using a customization point generally as opposed to some other option.

Let's see what others have to say, but I'm leaning toward targeting supported status and just adding it to the specification directly.

Contributor

I agree that there is not much need for an experimental phase. But I think a POC with practical usage outside of oneDPL is necessary.
