Why do we prefer protocols in type stubs? #899

Closed
NeilGirdhar opened this issue Feb 6, 2025 · 10 comments

@NeilGirdhar

This is a continuation of a long conversation from #589.

@NeilGirdhar
Author

NeilGirdhar commented Feb 6, 2025

@jorenham writes:

Inheritance hurts performance, and the additional import that this would require also would. Array libraries are usually heavily optimized, so this is not a realistic ask. And I'm not sure whether it would even be possible in case of e.g. numpy.ndarray, which is written in C

Isn't the performance impact of an extra class in the MRO extremely minor? In particular:

  • nearly all called methods will be found on the array class itself, and
  • many array libraries support jitting (which means that the calls are a one-time cost).

Generally, code should be written to be intuitive rather than to squeeze out a few nanoseconds.

But if you insist that every nanosecond counts, you can inherit without any performance impact by doing:

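from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers: Array appears as a base class.
    class SomeClientArray(..., Array): ...
else:
    # Executed at runtime: no extra base class in the MRO.
    class SomeClientArray(...): ...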

would even be possible in case of e.g. numpy.ndarray, which is written in C

I imagine that it's possible to add a base class, but even if not, they only need to add it in their type stubs.
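A minimal sketch of the TYPE_CHECKING pattern above, to show that the runtime class really does end up without the extra base or the extra import. The names `_NativeArray` and `hypothetical_array_stubs` are made up for illustration and are not part of any real package:

```python
from typing import TYPE_CHECKING


class _NativeArray:
    """Hypothetical stand-in for a library's real array implementation."""

    def __init__(self, data):
        self.data = list(data)


if TYPE_CHECKING:
    # Only type checkers evaluate this branch, so the import never runs.
    # `hypothetical_array_stubs` and `Array` are assumed names.
    from hypothetical_array_stubs import Array

    class SomeClientArray(_NativeArray, Array): ...
else:
    # At runtime the class has no extra base and needs no extra import.
    class SomeClientArray(_NativeArray): ...


# Inspect the runtime MRO: the stub base class is absent.
mro_names = [cls.__name__ for cls in SomeClientArray.__mro__]
```

Here `mro_names` contains only `SomeClientArray`, `_NativeArray`, and `object`, so the stubs package would be needed only by whoever runs the type checker.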

@jorenham

jorenham commented Feb 6, 2025

from #589 (comment)

It's not performance, it's that a required dependency is simply unacceptable for most libraries. NumPy has zero runtime dependencies, and it's highly unlikely that any will be added over the next few years - and this kind of thing certainly doesn't meet the bar. Other libraries may have similar constraints, plus there's potential issues with requiring different minimum versions of the same dependency.

@NeilGirdhar
Author

NeilGirdhar commented Feb 6, 2025

@rgommers writes

It's not performance, it's that a required dependency is simply unacceptable for most libraries. NumPy has zero runtime dependencies, and it's highly unlikely that any will be added over the next few years - and this kind of thing certainly doesn't meet the bar. Other libraries may have similar constraints, plus there's potential issues with requiring different minimum versions of the same dependency.

Doesn't having one set of stubs create more problems?

At some point, you will eventually make an incompatible change to the methods on Array. Let's call the versions before and after the change "Stubs 1.0" and "Stubs 2.0". Suppose there are "API clients" in matching versions, say Jax 1.0 and Jax 2.0, each compatible with the corresponding stubs, and likewise a library released as Library 1.0 and Library 2.0.

Suppose only Library 1.0 exists, and a user wants to use it. They need to import the correct Stubs 1.0, and they need to hold back Jax to the correct version 1.0.

And vice versa: suppose that they want to use Jax 2.0, then they need to constrain on Stubs 2.0 and Library 2.0.

In short, I think you can't sidestep that somewhere you will need to make the Array API stub version a dependency.

As for why the situation is different with NumPy: NumPy gets away with not having a stub dependency because it includes the stubs itself. A library written for the Array API doesn't have this luxury because it may not depend on any clients (Jax, etc.) or on NumPy. It can have no dependencies at all. Nevertheless, that library does depend on some stub version (1.0 or 2.0), and user code needs to know which one in order to depend on the right version of Jax.


This argument is based on the idea that you will eventually make some change to the interface (adding a method and so on). Is it realistic to suggest that you will never change any stubs forever?

@NeilGirdhar
Author

You'll also get a type-error in this case if somewhere in your codebase you try to assign your array type to the array protocol, e.g.


I don't think that works, since the user would have to do the check. Is every user of the Array API supposed to add an assert_type to their code? I don't think that's reasonable. (That assertion will fail if you use Stubs 2.0 with Jax 1.0, which is why the user needs to do it: they're the one who imported the incompatible versions.)
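For concreteness, here is a rough sketch of the kind of protocol check being discussed. `ArrayLike` is a hypothetical, heavily reduced protocol (not the real Array API protocol), and `CompliantArray` and `IncompleteArray` are invented classes; only `__array_namespace__` is taken from the Array API standard:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ArrayLike(Protocol):
    """Hypothetical, heavily reduced stand-in for an array protocol."""

    def __add__(self, other, /): ...
    def __array_namespace__(self, /, *, api_version=None): ...


class CompliantArray:
    """Matches the protocol structurally, without inheriting from it."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other, /):
        return CompliantArray(a + b for a, b in zip(self.data, other.data))

    def __array_namespace__(self, /, *, api_version=None):
        return None  # placeholder; a real library returns its namespace


class IncompleteArray:
    """Lacks __array_namespace__, so it does not match the protocol."""

    def __add__(self, other, /):
        return self


# Static check: a type checker would reject this assignment if
# CompliantArray stopped matching ArrayLike.
_check: ArrayLike = CompliantArray([1, 2])

# Runtime check: only tests that the members exist, not their signatures.
compliant = isinstance(CompliantArray([1]), ArrayLike)
incomplete = isinstance(IncompleteArray(), ArrayLike)
```

The point of contention is where a line like `_check` has to live: in the array library's own test suite, or in every user's code.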

@jorenham

jorenham commented Feb 6, 2025

Are you talking about https://github.com/data-apis/array-api-typing/ here, or about the "stubs" in https://github.com/data-apis/array-api/tree/main/src/array_api_stubs, which are mostly meant for documentation?

@NeilGirdhar
Author

NeilGirdhar commented Feb 6, 2025

@ntessore wrote:

Isn't this exactly the same either way?

With inheritance, the inheriting class (JaxArray(Array)) will raise a type error at definition time.

The only difference being an assert_type(MyArray, ArrayApiArray)?

The user would have to do the assert_type. I think it's very weird to expect users to verify that the libraries they've imported are compatible.

it allows the user to take responsibility when using an implementation that is not (yet) fully compliant, instead of putting it solely in the hands of the implementation to declare that it is compliant.

If it's not yet fully compliant, then the stubs will simply be invisible to them since the protocol won't match. I don't see why anyone would release (e.g., Jax) stubs before they're compliant.

It also means implementations can gradually transition to becoming Array API compliant, and they get told exactly where they currently aren't.

You get told where you aren't compliant by type errors. And I guess I still don't see why you would want to gradually become compliant rather than just releasing stubs when you're done.

@NeilGirdhar
Author

Are you talking about https://github.com/data-apis/array-api-typing/ here, or about the "stubs" in https://github.com/data-apis/array-api/tree/main/src/array_api_stubs, which are mostly meant for documentation?

I mean the stubs in the pull request that I linked.

@jorenham

jorenham commented Feb 6, 2025

Are you talking about data-apis/array-api-typing here, or about the "stubs" in main/src/array_api_stubs, which are mostly meant for documentation?

I mean the stubs in the pull request that I linked.

Those won't be published, and are primarily there for the generated sphinx docs (#857 (comment)), see #857 and #863

@NeilGirdhar
Author

NeilGirdhar commented Feb 6, 2025

Those won't be published, and are primarily there for the generated sphinx docs,

Oh! My mistake. I'll follow #863 with interest 😄

Shall I close this issue then, or is there anything left to say? (I'll close it in a couple days if no more discussion occurs.)

@jorenham

jorenham commented Feb 6, 2025

Oh! My mistake. I'll follow #863 with interest 😄

No worries; I've also made that mistake in the past.


Shall I close this issue then, or is there anything left to say? (I'll close it in a couple days if no more discussion occurs.)

In case anyone wants to talk about this more, then we should probably do that at https://github.com/data-apis/array-api-typing/, so you can close this as far as I'm concerned.
