Future ABI compatibility #34
Comments
I feel that we could write additional information in manager_ctx. For example, when borrowing numpy ndarrays, we could save a pointer to a Python object inside the manager_ctx. As long as the two frameworks agree on a protocol for interpreting the manager_ctx, it will be fine. |
Has this been discussed in more detail elsewhere? The lack of a version attribute inside one of the structs is indeed odd, which came up in the conversation around NumPy adopting DLPack. There's |
I agree. Given that we have been ABI compatible so far, this isn't an issue yet. How about we allow attaching version information to the capsule for now? That way we do not need to update the struct, while still allowing libraries to check it. |
That seems good, because breaking the ABI for it would be painful. Do you mean in the
# return version as second argument, `(major, minor, micro)`
def __dlpack__(self, stream=None) -> Tuple[PyCapsule, Tuple[int, int, int]]: |
we can directly attach it as a version attribute of the PyCapsule |
It seems
>>> handle.version = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PyCapsule' object has no attribute 'version'
This can happen as |
I see. For backward compatibility with the previously defined protocol, perhaps dlpack_version is a better way to handle this. |
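The `AttributeError` reported above can be reproduced without any array library. The following is only a minimal sketch, assuming CPython: it builds a throwaway capsule through `ctypes.pythonapi`, and the dummy payload and the `try_set_version` helper are illustrative names, not part of DLPack.

```python
import ctypes

# Bind CPython's PyCapsule_New(pointer, name, destructor).
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

# The capsule stores the name pointer, so the bytes object must outlive it.
_NAME = b"dltensor"
# Dummy non-NULL payload; a real producer would pass a DLManagedTensor pointer.
_payload = ctypes.c_int(0)

capsule = PyCapsule_New(ctypes.addressof(_payload), _NAME, None)


def try_set_version(obj):
    """Return True if an arbitrary attribute can be attached to obj."""
    try:
        obj.version = 5
    except AttributeError:
        return False
    return True


can_attach = try_set_version(capsule)
```

Because the capsule type has no instance dictionary, any version information has to travel some other way (a separate attribute on the exporting object, a tuple return value, or the capsule contents themselves).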
I'd prefer to avoid a third attribute at the Python level if possible. The two options I suggested still seem feasible:
The first one may be easier to implement (parsing version strings in C is a pain, it typically breaks for a naive implementation once you hit |
I'd prefer this one if at all possible. Also, we should have an ABI version along with a library version. |
That's actually the only thing we really care about, right? |
I'd think so, but currently the version is |
One advantage of doing this though is that it's more flexible - it can be checked before actually calling DLPack. Especially when streams are involved, it's possible that a consumer calls Another small issue is that TVM (and perhaps some other library?) already implemented support. So I'm coming around to maybe using Would be nice to get some more opinions. @leofang, @emcastillo, @kkraus14? |
Why are we planning on handling this at the Python level? DLPack is a C header and is usable from C libraries as well. If we solve the problem for C libraries it kinda naturally solves the problem for Python as well. Can we please avoid Python only solutions? |
The concern is breaking the ABI, you can't just add a field. Can you think of a way to do it in C in a reasonable manner @kkraus14? |
One thing I do not quite like about But, I admit it should be constant for a single tensor/array (although not unimaginable that it is not). Putting it in the DLPackManagedTensor seems nicest... But yeah, the struct can only be extended if we are OK to crash when the consumer needs it, but the producer is too old. On the up-side, it might be years until most consumers even have a reason to read it, and within the Python ecosystem we could probably standardize quickly on a recent enough DLPack?) Otherwise, maybe a new In a sense that would add |
It's a 0.x project so we could just break the ABI 😄. Otherwise we'd need to do something similar to what we did with the stream on the Python side and pass it out of band. |
@rgommers This is not OK. I think we failed to specify in the Array API standard that a live capsule must have the name "dltensor", which has in fact been part of the DLPack protocol since day one. (We did say a dead one is named "used_dltensor".) Capsules must be looked up by name, so we cannot have the name vary across libraries. I can push a PR to array-api for clarification.
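To make the naming convention concrete, here is a rough sketch of a name-checked lookup, assuming CPython and `ctypes.pythonapi`. The `consume` helper and the dummy payload are hypothetical; only the two capsule names come from the protocol as described above.

```python
import ctypes

api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_IsValid.restype = ctypes.c_int
api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyCapsule_SetName.restype = ctypes.c_int
api.PyCapsule_SetName.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Names must stay alive as long as any capsule refers to them.
LIVE = b"dltensor"
USED = b"used_dltensor"
_payload = ctypes.c_int(0)  # stand-in for a DLManagedTensor


def consume(capsule):
    """Accept a capsule only if it is live, then mark it as consumed."""
    if not api.PyCapsule_IsValid(capsule, LIVE):
        raise ValueError("expected a live capsule named 'dltensor'")
    # A real consumer would fetch the pointer here and take ownership.
    api.PyCapsule_SetName(capsule, USED)


capsule = api.PyCapsule_New(ctypes.addressof(_payload), LIVE, None)
consume(capsule)

# A second attempt fails because the name no longer matches.
try:
    consume(capsule)
    second_ok = True
except ValueError:
    second_ok = False
```

Since the lookup is by exact name, a producer that picked a different name for a live capsule would be rejected by every consumer written this way.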
This makes sense to me, although let's not forget the current exchange protocol is fully complete only in Python (via I feel it's about time for us to revisit @kkraus14's request #65 --- maybe it is sufficient to "backport" all the Python components to C? Then, we can create a new C function, which could be, say, A separate question is if returning |
I believe we can just put ABI breaks in a major version if needed, but IMHO it's best to have a separate version for feature support and ABI. |
I think both matter. For example, when DLPack was upgraded in PyTorch from 0.2 to 0.4, the question was if that was ABI-compatible. Without a separate ABI version, the only way to know would be to look into the DLPack commit history, or ask someone. It gets worse when you receive a version from another library that is higher than what you implement in your own library - should you raise an exception or not? There's no way to know if that newer version is ABI-compatible or not. Unless you require that any ABI break increases major version indeed. But there's other reasons to increase major version number, so a separate ABI version seems nicer. |
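A small sketch of the decision a consumer could make if both numbers were available. The `Version` type, the `can_consume` helper, and the concrete ABI numbers are assumptions for illustration; DLPack does not define them in this form.

```python
from typing import NamedTuple, Tuple


class Version(NamedTuple):
    dlpack: Tuple[int, int]  # library/feature version, e.g. (0, 6)
    abi: int                 # bumped only when the struct layout changes


def can_consume(producer: Version, consumer: Version) -> bool:
    """Accept any producer whose ABI matches, even if its library version is newer."""
    return producer.abi == consumer.abi


mine = Version(dlpack=(0, 4), abi=1)
newer_same_abi = Version(dlpack=(0, 6), abi=1)   # hypothetical: newer, same layout
newer_other_abi = Version(dlpack=(0, 7), abi=2)  # hypothetical: layout changed
```

With only a single library version, the consumer holding `mine` could not tell the last two cases apart without consulting the commit history.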
This makes sense I think - although coupling version info and stream support isn't necessary, those can be done independently. Right now we don't have a pressing need for other new fields, adding version info is "future proofing". Making breaking changes just to future-proof things seems a bit gratuitous. So a new function sounds good. It did bring up another question for me. Are there examples of libraries adding |
I think it would be great to add version info to DLPack. There have been some requests for ABI breaking changes (data-apis/array-api#191, #97). Unless versioning is done, each ABI breaking change will require everyone to update.
Wouldn't this cause symbol conflicts (since |
I will bring up an alternative once more: Add "request flags" API. That is, you add a new kwarg to
As soon as you really want the exporter to pass out additional information, you would add that first "required" flag. We would assume that 32 flags is plenty for the foreseeable future; most likely there will be very few flags, but nobody will have to worry about how to add something. The transition will require a The above is of course only for the Python API. I think it makes sense for a C-API/ABI, but I don't think the C part has a defined exchange protocol anyway. |
I think there are three options that should future-proof DLPack against ABI breaks/changes at the Python level:
Option 1
Return a 3-tuple from
Option 2
Introduce a One problem with this: In the future, when we break the ABI, let's say by adding the readonly flag, previous versions of the Python libraries that supported One solution is to add the ABI version in the capsule name e.g.
Option 3
Implement the requests API as @seberg suggested. I think this is more work than the above two options.
I vote for the first option since it is the easiest to do. For the C API, adding a function that returns an API/ABI version and specifying it in the C spec seems like the best option, since DLPack previously had no way to guard C applications against ABI breaks. Maybe we can proceed with #72 to address this problem at the C level. @tqchen @seberg @leofang @rgommers @mattip Please let me know which option sounds good to you. We can update the spec with whichever option has consensus. |
I would really like to nudge towards doing the annoying thing now. I.e. introduce a new C struct (with version and flag-space at the beginning as the initial poster also suggested). And then rename the capsule for safety. Why? Many small reasons:
Now, I could see this not replacing the current |
@rgommers But adding it to the end of DLTensor will change the size of the first field of
@seberg I agree with the first suggestion about introducing a new C struct with an API and ABI version field and adding it to the The This can help transition Python libs smoothly from the old ABI to the new one (the libs that want to support both can do so by checking if
If this sounds good to you @seberg @rgommers @mattip @tqchen, I will propose a PR for the first three and then the last one if required. [1] These checks could be a problem if we want to remove |
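The layout concern can be checked directly. The sketch below mirrors the structs with `ctypes`; it assumes the field list of the DLPack header at the time of this discussion, uses `c_int32` for the device type enum and `c_void_p` for the deleter as layout stand-ins, and the `readonly` field and `managed` helper are illustrative only.

```python
import ctypes


class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int32), ("device_id", ctypes.c_int32)]


class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8), ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]


_BASE_FIELDS = [
    ("data", ctypes.c_void_p),
    ("device", DLDevice),
    ("ndim", ctypes.c_int32),
    ("dtype", DLDataType),
    ("shape", ctypes.POINTER(ctypes.c_int64)),
    ("strides", ctypes.POINTER(ctypes.c_int64)),
    ("byte_offset", ctypes.c_uint64),
]


class DLTensor(ctypes.Structure):
    _fields_ = _BASE_FIELDS


class DLTensorExtended(ctypes.Structure):
    # Hypothetical extension: one extra field appended at the end.
    _fields_ = _BASE_FIELDS + [("readonly", ctypes.c_uint8)]


def managed(tensor_type):
    """Build a DLManagedTensor-like struct whose first field is tensor_type."""
    class Managed(ctypes.Structure):
        _fields_ = [("dl_tensor", tensor_type),
                    ("manager_ctx", ctypes.c_void_p),
                    ("deleter", ctypes.c_void_p)]
    return Managed


old_offset = managed(DLTensor).manager_ctx.offset
new_offset = managed(DLTensorExtended).manager_ctx.offset
```

Because `dl_tensor` is embedded by value as the first member, growing `DLTensor` moves `manager_ctx` and `deleter`, so a consumer compiled against the old layout would read the wrong memory.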
This remains to be seen. If the contents of the dlpack struct are not compatible with what is expected with the version in NumPy 1.22 (and comparable versions of other libraries), it must error out when loading this incompatible struct. |
Thanks for all the discussions, I agree that Let us deliberate on where and when to add version info to DLTensor, and on the path for upgrading. The nature of a standard does mean that we favor stability and compatibility in the frameworks that currently support it. |
Great discussion. I like |
This seems the important part to me. Once it is settled how the C ABI/API will evolve, the Python side As Matti also alluded to: |
Where can we add version info?
The version info can be added at the beginning of the
typedef struct {
  /*!
   * \brief The DLPack and ABI versions of the tensor. The exporting library
   *  should set the DLPack version to DLPACK_VERSION and ABI version to
   *  DLPACK_ABI_VERSION.
   */
  struct {
    uint8_t dlpack;
    uint8_t abi;
  } version;
  ...
  // The new fields go here (at the end of the struct)
  /*! \brief Mark the data readonly. */
  uint8_t readonly;
  /*!
   * \brief Endianness of the data. 1 for non-native endianness and
   *  0 for native endianness.
   */
  uint8_t endianness;
  /*! \brief Alignment of the data. */
  uint32_t alignment;
} DLTensor;
typedef struct DLManagedTensor {
  DLTensor dl_tensor;
  void * manager_ctx;
  void (*deleter)(struct DLManagedTensor * self);
} DLManagedTensor;
If we want to add more fields in the future, we can do so by appending them at the end of the struct (this doesn't affect the offset of the
Upgrade Policy
Note that
After adding version info to the
(The version keyword mechanism is required to preserve backward compatibility with Python libraries exporting capsules that use the old ABI.) I think this concretely defines a roadmap to upgrade. If this sounds good, we can start with merging #101 and then upgrading the Python implementations according to the new spec. |
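One way a consumer might use such a keyword while still accepting old producers is sketched below. The exact signature is not settled in this thread, so the `version` parameter, the `(dlpack, abi)` tuple, and both producer classes are assumptions; strings stand in for capsules.

```python
class OldProducer:
    """Exports with the original signature; knows nothing about versions."""

    def __dlpack__(self, stream=None):
        return "capsule(old ABI)"


class NewProducer:
    """Hypothetical producer that accepts a requested version."""

    def __dlpack__(self, stream=None, version=None):
        if version is None:
            return "capsule(old ABI)"  # keep serving legacy callers
        return f"capsule(ABI {version[1]})"


def request_capsule(producer, version):
    """Ask for a versioned capsule, falling back to the legacy call."""
    try:
        return producer.__dlpack__(version=version)
    except TypeError:
        # Old producers reject the unknown keyword. Note that this also
        # swallows TypeErrors raised for other reasons inside the producer.
        return producer.__dlpack__()


from_old = request_capsule(OldProducer(), (1, 2))
from_new = request_capsule(NewProducer(), (1, 2))
```

The fallback path always yields an old-ABI capsule, which is why a consumer that only calls `__dlpack__()` keeps working unchanged.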
Copying my #101 (comment) back here because I don't understand why all of a sudden we are now talking about removing
|
Expanding on this: In this case the producer most likely would raise |
The consumer can theoretically ask for non-existent version combinations but I didn't think someone would do that since we don't let the user call
I think this point is valid. IIUC, the idea is that the consumer can ask for a version lower than the max-supported one returned by
IIUC it's equivalent to put the burden on either side. Sorry but I didn't understand what "since they [producers] know the best" exactly means. What do the producers know better than consumers? |
Personally, I think that should happen, because it seems to me we should expect backwards incompatible breaks eventually if not now (Both gh-97, and gh-41 at least play with the idea of incompatible changes?). "Breaking" things now gives a clear path for doing future incompatible changes. I do not quite see that From a Python API point of view, it seems to me that adding However, I have no idea what the plan for ABI breakage is in C. My suggestion right now would be to create a new
I don't really think it matters. If the consumer asks for clearly invalid things an error must be raised, but I am not even sure a The |
Sorry I still am not convinced. First to correct, by "user" here I meant consumer, which should be obvious from context. I was saying even for consumers it's not ok for them to guess it with a N^2 for loop (for two entries): Which cap version tuple should they start trying downward? If the producer only supports a version tuple above the consumer's top try, the export would fail (but it could be that the particular version tuple is supported by the consumer as well, just we're not forgiving enough for the consumer to know; UPDATE or, it should really just fail but why are we paying the loop cost to fail slow?).
This is an odd question... As a producer I know everything about what I'm exporting and can export... Why would the consumer know better than the producer? With the producer providing the info it's one shot: either the ABI version on both sides matches, or it doesn't. Am I missing anything? (We need to check the ABI version first in order to determine the struct layout for subsequent interpretation.)
Splitting up into two calls here is similar to why we had Please note: I am not objecting to adding the version info as new C structs (or struct members, I don't really care). In fact that's good when we move on to standardize C APIs for DLPack in addition to Python APIs (see the meta issue #74). Let's focus on the Python handshake convention, OK?
Adding |
I am not sure I follow, so let us just take a step back. Can we agree that there are exactly two relevant versions:
and that a successful exchange must use Using
I admit, I missed the second as a serious alternative. I still prefer the first a bit, because it allows the consumer to confirm that the right version was passed back without any need to trust the producer or intermediate Python code. I do not think it matters that The one advantage of In the end, maybe what matters is again how you want to move the C-API forward. Assuming the ABI is broken in C: would you want to add the version into the new |
Could you point to where that was settled? I encouraged Tirth to change the capsule name when adding fields to the struct, as it seems the only way to continue to use |
I admit I am confused where we want to go with the interaction between consumers and producers:
Perhaps we should try to arrange a meeting with all interested parties, say next week, to try to resolve the open questions. |
Ugh, I missed this - and in my mind
No, no one has suggested anything like this I think. Exposing the version to end users doesn't make sense. This should be private info that's passed between libraries.
This is a critical point, and should be worked out in more detail. It's something like:
I think it's worth emphasizing again that DLPack is used fairly heavily in production code, so old-style code shouldn't break.
No, but there are many DLPack versions with the same ABI version. Right now, for example, DLPack v0.2 to v0.6 are all ABI-compatible and should work together just fine if libraries are on different versions. How would you know without checking both?
Some synthesis work and writing that down seems needed first, I don't think a call is going to resolve anything without that. |
I agree that a synthesis would be great. Maybe a meeting will even be unnecessary. Or maybe plan a meeting and someone prepares a rough summary as a basis? (EDIT: ideally summary + laying out alternatives discussed)
Ideally, the breaking change should be designed in a way that it is OK if a library fails to update (either producer or consumer). Of course it is OK for such a library to just cause an error eventually or even very soon. (But it is not nice if a library that fails to update can run into ABI issues.)
Hmmm, I agree we do not need both a major and minor version. But, it may make "more common" changes like adding additional information at the end or adding new datatypes/devices easier. |
This is how I understand it:
Is that right? What I missed before is that |
Yeah, I think that is one good option that ensures full backcompat. And it has the nice part that The alternative I see right now is the following:
I like the fact that this is not "out of band" (as Keith said early on). The version is included in the capsule so it can be verified in C and is exchanged in a single Python "call" (fetching However, I do feel it seems much more natural, if we create a new struct in C, which includes the version info as a first field. (But I still somewhat expect that will happen?) |
@seberg We don't need to change the capsule name. The except block in your example will always return a capsule with the current ABI. A name change might provide more protection against crashes but isn't necessary. I have (roughly) implemented something for numpy here for reference.
Yes, I too think adding version info in the new struct will be better (libraries that want to support multiple ABI versions will need to do that anyway). But once we have a new struct with version info, we can keep appending new fields to it in case of an ABI break in the future. (perhaps, we can also remove the old version after a few years) |
Right, I like the extra layer of protection against a wrong capsule being passed out accidentally (the capsule has full version information). And I feel there is not much of a downside, if that name reflects a struct name change/addition in C. |
Makes sense. I think @leofang was against a name change (if everyone follows the spec, it ideally shouldn't be required). @leofang Did you have anything in mind that would break if we change the capsule name (conditionally i.e. we still export a capsule named |
Sorry for getting back late guys. Still trying to digest... 😅
Again, that's not true. The usage of the names If we are talking about immediate breaks to move forward faster, then yes we can do it intentionally to signal that we're breaking things (for the better). Then, I don't think it's a bad idea to name a capsule like
I honestly haven't thought about it seriously enough, and I now see that it'd be hard to support multiple ABI versions in practice, at least unlikely with NumPy/CuPy/PyTorch? (@seberg @kmaehashi @rgommers to chime in.) It means we need to build multiple Python modules, and in each module's source code we include a different DLPack header with different ABI. I feel if we're breaking it we should consider doing it collectively, forget about supporting multiple ABIs, and don't look back. |
Nobody suggests to break the contract. Renaming the capsule rather amends the contract. Any current user will still get the old name and can still use the old name (for the time being). You just won't be able to write:
But that is not intended use of
I don't think it is particularly hard. Yes, it will require additional code paths in all libraries that try to be friendly to users. But, that is unavoidable if we want to support multiple ABI versions. And supporting multiple ABI versions seems unavoidable if we want users to not notice transitions when ABI (or incompatible API) changes happen. In practice, code duplication can hopefully be mostly avoided using templating or a helper. But even if not, the good thing is that we can deprecate old versions after 1 year (or even just remove support). So the duplicated code has a clear "expiry date" and that makes the maintenance burden less bad.
For NumPy, I don't care. |
related discussion #104 |
DLPack’s vision of a common in-memory tensor format that spans device and memory types is fantastic.
However, in its current form, there is no upgrade path for adding new items to either the DLTensor or DLManagedTensor structs in a way that would maintain ABI compatibility.
I would like to propose the addition of two components to the DLTensor struct. This will break current ABI compatibility, but will future-proof the design in the long run.
Additions to DLTensor
uint8/16_t version
uint64_t future_bytes
Adding the version allows the receiving library to determine whether the DLTensor can be consumed. The receiver may not have a matching version, but as long as it knows the version, it can decide whether the data can be used correctly.
Adding future_bytes allows for the addition of new options to DLTensor. One of these might be data layout, i.e. row-major or column-major (C vs. Fortran order). I will open a separate issue for this feature.
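As a rough illustration of the idea, the sketch below has a receiver inspect the version before trusting anything else. The proposal does not say where the two fields would sit, so placing them first, the `uint16_t` width, the `VersionedHeader` name, and the supported-version set are all assumptions.

```python
import ctypes

# Versions this hypothetical receiver knows how to interpret.
SUPPORTED_VERSIONS = {1}


class VersionedHeader(ctypes.Structure):
    # Assumed layout: the two proposed fields placed at the start.
    _fields_ = [("version", ctypes.c_uint16),
                ("future_bytes", ctypes.c_uint64)]


def can_read(address):
    """Look only at the leading version field before touching the rest."""
    header = VersionedHeader.from_address(address)
    return header.version in SUPPORTED_VERSIONS


known = VersionedHeader(version=1, future_bytes=0)
unknown = VersionedHeader(version=7, future_bytes=0)

known_ok = can_read(ctypes.addressof(known))
unknown_ok = can_read(ctypes.addressof(unknown))
```

A receiver that does not recognise the version can refuse the tensor cleanly instead of misreading fields it was not compiled for.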