Commit bc99292

Add post announcing the v2023 Array API Standard release
PR-URL: #26
2 parents a053a70 + 15ce912 commit bc99292

1 file changed

+275
-0
lines changed
@@ -0,0 +1,275 @@
+++
date = "2024-04-08T08:00:00+00:00"
author = "Athan Reines"
title = "2023 release of the Array API Standard"
tags = ["APIs", "standard", "consortium", "arrays", "community"]
categories = ["Consortium", "Standardization"]
description = "The 2023 revision of the array API standard has been finalized and is ready for adoption by conforming array libraries."
draft = false
weight = 30
+++

Another year, another revision of the Array API Standard! We're proud to
announce the release of the 2023 revision of the Array API Standard. As was the
case for the [2022 revision](https://data-apis.org/blog/array_api_v2022_release/),
this release required extensive discussion and collaboration among array
libraries and their downstream stakeholders as we continued reaching consensus
on unified API design and behavior. We're particularly excited to share that
this year marked a significant milestone in our efforts to facilitate array
interoperation within the PyData ecosystem, as we witnessed accelerated
adoption of the standard, especially among downstream libraries, such as
[SciPy](https://docs.scipy.org/doc/scipy//dev/api-dev/array_api.html) and
[scikit-learn](https://scikit-learn.org/stable/modules/array_api.html).

## Brief Background

For those who are not yet familiar with the Consortium and the Array API
Standard, a bit of background. Our aim is to standardize the fundamental
building blocks of scientific computation: multi-dimensional arrays (a.k.a.
tensors). The PyData ecosystem has a rich set of libraries for working with
arrays, including NumPy, CuPy, Dask, PyTorch, JAX, TensorFlow, oneAPI, and
beyond. Historically, interoperation among array libraries has been challenging
due to divergent API designs and subtle variation in behavior such that code
written for one array library cannot be readily ported to another array
library. To address these challenges, the [Consortium for Python Data API
Standards](https://data-apis.org/blog/announcing_the_consortium/) was
established to facilitate coordination among array and dataframe library
maintainers, sponsoring organizations, and key stakeholders and to provide a
transparent and inclusive process--with input from the broader Python
community--for standardizing array API design.

Soon after formation of the Consortium in May 2020, we released an [initial
draft](https://data-apis.org/blog/array_api_standard_release/) of the array API
specification and sought input from the broader PyData ecosystem during an
extended community review period. Throughout 2021, we engaged in a tight
feedback loop with array API adopters to refine and improve the initial draft
specification.

During this time, we reached three key milestones. First, we introduced a data
interchange protocol based on [DLPack](https://github.com/dmlc/dlpack) to
facilitate zero-copy memory exchange between array libraries. Second, we
standardized a core set of API designs for array creation, mutation, and
element-wise computation. Third, we introduced "extensions", which are defined
as coherent sets of functionality that are commonly implemented across array
libraries, but which conforming array libraries may choose not to implement.
The first extension we included in the specification was the `linalg`
extension, which defines a set of linear algebra APIs for computing
eigenvalues, performing singular value decomposition, solving a system of
linear equations, and other linear algebra operations.

Building on the success of the 2021 revision of the Array API Standard, we
worked throughout 2022 on a subsequent specification revision with two key
objectives: standardize complex number support and standardize an extension for
Fast Fourier Transforms (FFTs). These efforts culminated in the [2022
revision](https://data-apis.org/blog/array_api_v2022_release/) of the Array API
Standard, along with significant advancements in tooling to support
specification adoption. Importantly, we released 1) a comprehensive portable
[test suite](https://github.com/data-apis/array-api-tests) built on Pytest and
Hypothesis for testing Array API Standard compliance and 2) an [array
compatibility layer](https://github.com/data-apis/array-api-compat) which
provides a small wrapper around existing array libraries to ensure Array API
Standard compliant behavior.

With the 2022 revision out of the way, we summarized our work to date,
publishing in _SciPy Proceedings_ the paper ["Python Array API Standard: Toward
Array Interoperability in the Scientific Python Ecosystem"](https://proceedings.scipy.org/articles/018d8c34-e9ca-7105-9366-a050cc18b214).
Needless to say, it was a busy three years!

## 2023 Revision

Not wanting to rest on our laurels, immediately after tagging the 2022 release
we got busy working on the [2023 revision](https://github.com/data-apis/array-api/blob/91ff864decaef09a7fcca28a4b65de3c5f765d5f/CHANGELOG.md#v202312)
with a singular goal: eliminate any and all barriers to adoption. While achieving
buy-in from array libraries across the ecosystem marked a significant achievement,
what is critical for the long-term success of this collective effort is driving
adoption among downstream libraries, such as SciPy, scikit-learn, and others,
in order to achieve our stated goal of facilitating interoperability among
array libraries.

To this end, we solicited feedback from downstream adopters regarding missing
APIs, pain points, and general blind spots. During our discussions, we made
three key observations. First, for a small subset of APIs, the behavior
required by the standard did not match the reality on the ground, and we needed
to revise the standard in order to ensure array libraries and their consumers
could both achieve compliance **and** maintain backward compatibility. Second,
we noticed a common set of operations which downstream adopters kept needing
and for which they were implementing inefficient workarounds, thus making these
operations excellent candidates for standardization. And lastly, we found that
downstream adopters needed robust and portable mechanisms for inspecting
library and device capabilities.

### Breaking Changes

To address our first observation, we made two breaking changes to the 2022
revision of the standard. First, we revised the guidance for type promotion in
`prod`, `sum`, and `linalg.trace` such that, by default, input arrays having
floating-point data types are not upcast to higher precision. The previous
guidance reflected the concern that summation of large arrays having low
precision could easily lead to overflow. While this concern is certainly valid
for arrays having integer data types (e.g., `int8` and `int16`), it is less
of a concern for floating-point data types, which can typically handle a larger
range of values and have a natural overflow value in infinity.
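
The difference between the two overflow modes can be seen without any array library. The following is a small standard-library sketch, not the standard's `sum`: `wrap_int8`, `sum_int8`, and `sum_float` are hypothetical helper names, and the integer case assumes two's-complement wrap-around, which is one common behavior rather than something the text above specifies.

```python
import math


def wrap_int8(value: int) -> int:
    """Mimic storing a value in 8 bits with two's-complement wrap-around."""
    return (value + 128) % 256 - 128


def sum_int8(values):
    """Accumulate in a simulated int8 accumulator (assumed wrapping behavior)."""
    total = 0
    for v in values:
        total = wrap_int8(total + v)
    return total


def sum_float(values):
    """Accumulate in a Python float (IEEE 754 double precision)."""
    total = 0.0
    for v in values:
        total += v
    return total


int_result = sum_int8([100, 100])          # true sum 200 does not fit in int8
float_result = sum_float([1e308, 1e308])   # true sum exceeds the largest double

print(int_result)                 # -56: a plausible-looking but wrong value
print(math.isinf(float_result))   # True: the overflow is visible as infinity
```

In this sketch the integer result is silently wrong, whereas the floating-point result saturates at infinity and is easy to detect, which is the asymmetry the revised guidance relies on.
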

Second, we revised the guidance for portable input and output data types in FFT
APIs. One of the specification's overriding design principles is requiring
users to be explicit about their intent. In the 2022 revision, we failed to
fully adhere to this principle in the FFT APIs, leading to ambiguity of
acceptable return types and the potential for undesired automatic upcasting of
real-valued arrays to complex-valued arrays. We thus sought to correct this
deficiency and subsequently backported the changes to the 2022 revision.

### New Additions

To address our second observation, we identified and standardized several new
APIs to ensure portable behavior among conforming array libraries.

- `clip`: clamps each element of an array to a specified range.
- `copysign`: composes a floating-point value from a magnitude and sign.
- `cumulative_sum`: calculates the cumulative sum.
- `hypot`: computes the square root of the sum of squares.
- `maximum`: computes the maximum value for each element of an array relative
  to the respective element in another array.
- `minimum`: computes the minimum value for each element of an array relative
  to the respective element in another array.
- `moveaxis`: moves array axes to new positions.
- `repeat`: repeats each element of an array a specified number of times.
- `searchsorted`: finds insertion positions such that sorted order would be
  preserved.
- `signbit`: determines whether the sign bit is set for each element in an
  array.
- `tile`: constructs an array by tiling another array.
- `unstack`: splits an array into a sequence of arrays along a given axis.
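
To make the intended semantics of a few of these functions concrete, here is a pure-Python sketch over one-dimensional lists. These helpers are illustrative stand-ins only: they are not the standard's signatures (which operate on arrays and accept further arguments such as `axis`), and the left-biased insertion used for `searchsorted` is an assumption about the default.

```python
import bisect
import itertools
import math


def clip(values, lo, hi):
    """Clamp each element to the closed range [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def cumulative_sum(values):
    """Running total of the elements."""
    return list(itertools.accumulate(values))


def searchsorted(sorted_values, queries):
    """Leftmost insertion index that keeps sorted_values sorted."""
    return [bisect.bisect_left(sorted_values, q) for q in queries]


def repeat(values, n):
    """Repeat each element n times in place."""
    return [v for v in values for _ in range(n)]


def tile(values, n):
    """Repeat the whole sequence n times."""
    return list(values) * n


print(clip([-2, 0, 5, 9], 0, 5))               # [0, 0, 5, 5]
print(cumulative_sum([1, 2, 3, 4]))            # [1, 3, 6, 10]
print(searchsorted([1, 3, 5, 7], [0, 4, 7]))   # [0, 2, 3]
print(repeat([1, 2], 2))                       # [1, 1, 2, 2]
print(tile([1, 2], 2))                         # [1, 2, 1, 2]
print(math.copysign(3.0, -0.0))                # -3.0
print(math.hypot(3.0, 4.0))                    # 5.0
```

Note the difference between `repeat` and `tile` in the output: the former duplicates elements, while the latter duplicates the sequence as a whole.
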

### Inspection APIs

To address our third observation, we recognized that downstream library
adopters needed more robust mechanisms for determining library and associated
device capabilities. For libraries such as SciPy and scikit-learn that want to
support array objects from multiple libraries, having a set of standardized
top-level APIs is not sufficient. In order to devise concise mitigation
strategies and gracefully handle varying hardware capabilities, having a means
for reliably ascertaining device heterogeneity is critical. Accordingly, we
worked to standardize inspection APIs to allow answering the following
questions:

- does a library support boolean indexing and data-dependent output shapes?
- how can one portably obtain a library's list of supported devices?
- what is a library's default device?
- what data types does a library support?
- what are a library's default data types?
- what data types does a specific device support?

After considerable discussion and coordination among array libraries and
downstream stakeholders, we coalesced around an inspection API namespace

```python
info = xp.__array_namespace_info__()
```

with the following initial set of APIs:

- `capabilities`: returns a dictionary of array library capabilities.
- `default_device`: returns the default device.
- `default_dtypes`: returns a dictionary containing default data types.
- `dtypes`: returns a dictionary containing supported data types specific to
  a given device.
- `devices`: returns a list of supported devices.
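
As a rough illustration of how a downstream library might use this namespace, the sketch below gathers the inspection results into a plain dictionary. The function name `summarize_namespace` is hypothetical, and the method names and the `device` keyword follow the published 2023.12 specification as best understood here (for example, `default_dtypes`); check them against the library you target.

```python
def summarize_namespace(xp):
    """Collect inspection results from an array namespace into a dict.

    `xp` is assumed to expose `__array_namespace_info__()` as described
    in the 2023 revision of the standard.
    """
    info = xp.__array_namespace_info__()
    device = info.default_device()
    return {
        "capabilities": info.capabilities(),
        "devices": info.devices(),
        "default_device": device,
        "default_dtypes": info.default_dtypes(device=device),
        "dtypes": info.dtypes(device=device),
    }


if __name__ == "__main__":
    # Optional demo: only runs if an installed NumPy exposes the inspection API.
    try:
        import numpy as np
    except ImportError:
        np = None
    if np is not None and hasattr(np, "__array_namespace_info__"):
        for key, value in summarize_namespace(np).items():
            print(f"{key}: {value}")
```

A consumer could branch on the returned `capabilities` dictionary, for instance to avoid boolean indexing when a library reports that it is unsupported.
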

While these APIs may seem trivial on their surface, the reality is that array
libraries have often lacked easy and portable programmatic access to data type
and device information. We thus consider this outcome significant progress, and
we're particularly eager to hear from downstream library authors what other
capabilities they would find useful to query.

## Facilitating Array API Adoption

As mentioned above, 2023 was all about adoption, and adoption requires buy-in
from both array libraries and the downstream consumers of those libraries.
Adoption thus faces two key challenges. First, to facilitate development, array
libraries need a robust mechanism for determining whether they are
specification compliant. Second, while array libraries work to become fully
specification compliant, downstream libraries need to be able to target a
stable compatibility layer in order to smooth over subtle differences in array
library behavior.

### Test Suite

To address the first challenge, we've continued to develop a comprehensive
portable [test suite](https://github.com/data-apis/array-api-tests) built on
Pytest and Hypothesis for testing Array API Standard compliance. In addition to
the 2022 revision, the test suite has been updated to support the most recent
2023 revision.

### Compatibility Layer

To address the second challenge, we've continued work on an [array
compatibility layer](https://github.com/data-apis/array-api-compat) which
provides a small wrapper around existing array libraries to ensure Array API
Standard compliant behavior. We're proud to announce that, in addition to
support for NumPy, CuPy, and PyTorch, we've added support for
[Dask](https://github.com/data-apis/array-api-compat/pull/76) and
[JAX](https://github.com/data-apis/array-api-compat/pull/84).

To get started, install from [PyPI](https://pypi.org/project/array-api-compat/)

```bash
pip install array-api-compat
```

and take it for a spin! If you encounter any issues, please be sure to let us
know over on the library issue [tracker](https://github.com/data-apis/array-api-compat/issues).
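
For a sense of what library-agnostic code can look like once the package is installed, here is a minimal sketch. The function `standardize` and its optional `xp` parameter are hypothetical names introduced for illustration; the only compatibility-layer call used is `array_api_compat.array_namespace`, and `mean` and `std` are assumed to be available in the returned namespace, as the standard specifies.

```python
def standardize(x, xp=None):
    """Return (x - mean(x)) / std(x) using only standardized functions.

    If no namespace is passed, it is looked up from the array itself via
    the compatibility layer, so the same code can serve several libraries.
    """
    if xp is None:
        # Imported lazily so that callers who pass `xp` explicitly
        # do not need the package installed.
        from array_api_compat import array_namespace

        xp = array_namespace(x)
    return (x - xp.mean(x)) / xp.std(x)
```

Calling `standardize(x)` with a NumPy array or a PyTorch tensor should dispatch to the matching wrapped namespace, so the function body never names a specific array library.
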

## Adoption Milestones

Array libraries, such as NumPy, CuPy, PyTorch, JAX, and oneAPI, have continued
work toward achieving full API compliance, which is a significant milestone in
and of itself. But it's all for naught if array library consumers are not able
to reap the benefits of standardization. Fortunately, we've seen
significant uptake of the Array API Standard among downstream libraries. In
particular, both [SciPy](https://docs.scipy.org/doc/scipy//dev/api-dev/array_api.html)
and [scikit-learn](https://scikit-learn.org/stable/modules/array_api.html) have
added experimental support, thus enabling support for both CPU and GPU tensors
and marking a big win for end users. For the curious reader, we discussed some
of the performance benefits in our recent [paper](https://proceedings.scipy.org/articles/018d8c34-e9ca-7105-9366-a050cc18b214)
published in _SciPy Proceedings_ (2023).

### NumPy

One development that is especially noteworthy is the adoption of the Array API
Standard in the main namespace of [NumPy 2.0](https://numpy.org/devdocs/release/2.0.0-notes.html).
When we originally formed the Consortium and began the work of standardization,
we didn't know exactly how array libraries would prefer to adopt an eventual
array API standard. Would they adopt it in their main namespace? Or would they
prefer to avoid potentially breaking backward compatibility and implement in a
strictly compliant sub-namespace?

We wrote the specification with both possibilities in mind. NumPy and its kin
went down the sub-namespace path, while libraries such as PyTorch opted for
their main namespace. Well, after a few years of experimentation, the NumPy
community decided that they liked the standard so much that relegating a
strictly compliant implementation to a sub-namespace was not enough, and
subsequently sought to apply the API design principles not just to standardized
APIs in their main namespace, but across all of NumPy. This is a significant
win for portability, and we're excited for the benefits NumPy 2.0 will bring to
downstream libraries and the PyData ecosystem at large.

## The Road Ahead

Phew! That's a lot, and you've made it this far! So what's in store for 2024?!
Glad you asked. Nothing too different from the year before. We're planning on
staying the course, focusing on adoption, and continuing to address the gaps
and pain points identified by downstream libraries.

In addition to normal specification work, we're particularly keen on developing
more robust tools for specification compliance and monitoring. Based on
feedback we've received from downstream libraries, there's still a lack of
transparency around which APIs are supported and what the potential edge
cases are. We have some ideas for how to increase visibility and will have more
to share in the months to come.

Long story short, we're excited for the year ahead, and we'd love to get your
feedback! To provide feedback on the Array API Standard, please open issues or
pull requests on <https://github.com/data-apis/array-api>, and come participate
in our public [discussions](https://github.com/data-apis/array-api/discussions).

Cheers!