|
| 1 | ++++ |
| 2 | +date = "2024-04-08T08:00:00+00:00" |
| 3 | +author = "Athan Reines" |
| 4 | +title = "2023 release of the Array API Standard" |
| 5 | +tags = ["APIs", "standard", "consortium", "arrays", "community"] |
| 6 | +categories = ["Consortium", "Standardization"] |
| 7 | +description = "The 2023 revision of the array API standard has been finalized and is ready for adoption by conforming array libraries." |
| 8 | +draft = false |
| 9 | +weight = 30 |
| 10 | ++++ |
| 11 | + |
| 12 | +Another year, another revision of the Array API Standard! We're proud to |
| 13 | +announce the release of the 2023 revision of the Array API Standard. As was the |
| 14 | +case for the [2022 revision](https://data-apis.org/blog/array_api_v2022_release/), |
| 15 | +this release required extensive discussion and collaboration among array |
| 16 | +libraries and their downstream stakeholders as we continued reaching consensus |
| 17 | +on unified API design and behavior. We're particularly excited to share that |
| 18 | +this year marked a significant milestone in our efforts to facilitate array |
| 19 | +interoperation within the PyData ecosystem, as we witnessed accelerated |
| 20 | +adoption of the standard, especially among downstream libraries, such as |
| 21 | +[SciPy](https://docs.scipy.org/doc/scipy//dev/api-dev/array_api.html) and |
| 22 | +[scikit-learn](https://scikit-learn.org/stable/modules/array_api.html). |
| 23 | + |
| 24 | +## Brief Background |
| 25 | + |
| 26 | +For those who are not yet familiar with the Consortium and the Array API |
| 27 | +Standard, a bit of background. Our aim is to standardize the fundamental |
| 28 | +building blocks of scientific computation: multi-dimensional arrays (a.k.a. |
| 29 | +tensors). The PyData ecosystem has a rich set of libraries for working with |
| 30 | +arrays, including NumPy, CuPy, Dask, PyTorch, JAX, TensorFlow, oneAPI, and |
| 31 | +beyond. Historically, interoperation among array libraries has been challenging |
| 32 | +due to divergent API designs and subtle variation in behavior such that code |
| 33 | +written for one array library cannot be readily ported to another array |
| 34 | +library. To address these challenges, the [Consortium for Python Data API |
| 35 | +Standards](https://data-apis.org/blog/announcing_the_consortium/) was |
| 36 | +established to facilitate coordination among array and dataframe library |
| 37 | +maintainers, sponsoring organizations, and key stakeholders and to provide a |
| 38 | +transparent and inclusive process--with input from the broader Python |
| 39 | +community--for standardizing array API design. |
| 40 | + |
| 41 | +Soon after formation of the Consortium in May 2020, we released an [initial |
| 42 | +draft](https://data-apis.org/blog/array_api_standard_release/) of the array API |
| 43 | +specification and sought input from the broader PyData ecosystem during an |
| 44 | +extended community review period. Throughout 2021, we engaged in a tight |
| 45 | +feedback loop with array API adopters to refine and improve the initial draft |
| 46 | +specification. |
| 47 | + |
| 48 | +During this time, we reached three key milestones. First, we introduced a data |
| 49 | +interchange protocol based on [DLPack](https://github.com/dmlc/dlpack) to |
| 50 | +facilitate zero-copy memory exchange between array libraries. Second, we |
| 51 | +standardized a core set of API designs for array creation, mutation, and |
| 52 | +element-wise computation. Third, we introduced "extensions", which are defined |
| 53 | +as coherent sets of functionality that are commonly implemented across array |
| 54 | +libraries, but which conforming array libraries may choose not to implement. |
| 55 | +The first extension we included in the specification was the `linalg` |
| 56 | +extension, which defines a set of linear algebra APIs for computing |
| 57 | +eigenvalues, performing singular value decomposition, solving a system of |
| 58 | +linear equations, and other linear algebra operations. |
| 59 | + |
| 60 | +Building on the success of the 2021 revision of the Array API Standard, we |
| 61 | +worked throughout 2022 on a subsequent specification revision with two key |
| 62 | +objectives: standardize complex number support and standardize an extension for |
| 63 | +Fast Fourier Transforms (FFTs). These efforts culminated in the [2022 |
| 64 | +revision](https://data-apis.org/blog/array_api_v2022_release/) of the Array API |
| 65 | +Standard, along with significant advancements in tooling to support |
| 66 | +specification adoption. Importantly, we released 1) a comprehensive portable |
| 67 | +[test suite](https://github.com/data-apis/array-api-tests) built on Pytest and |
| 68 | +Hypothesis for testing Array API Standard compliance and 2) an [array |
| 69 | +compatibility layer](https://github.com/data-apis/array-api-compat) which |
| 70 | +provides a small wrapper around existing array libraries to ensure Array API |
| 71 | +Standard compliant behavior. |
| 72 | + |
| 73 | +With the 2022 revision out of the way, we summarized our work to date, |
| 74 | +publishing in _SciPy Proceedings_ the paper ["Python Array API Standard: Toward |
| 75 | +Array Interoperability in the Scientific Python Ecosystem"](https://proceedings.scipy.org/articles/018d8c34-e9ca-7105-9366-a050cc18b214). |
| 76 | +Needless to say, it was a busy three years! |
| 77 | + |
| 78 | +## 2023 Revision |
| 79 | + |
| 80 | +Not wanting to rest on our laurels, immediately after tagging the 2022 release |
| 81 | +we got busy working on the [2023 revision](https://github.com/data-apis/array-api/blob/91ff864decaef09a7fcca28a4b65de3c5f765d5f/CHANGELOG.md#v202312) |
| 82 | +with a singular goal: eliminate any and all barriers to adoption. While achieving |
| 83 | +buy-in from array libraries across the ecosystem marked a significant achievement, |
| 84 | +what is critical for the long-term success of this collective effort is driving |
| 85 | +adoption among downstream libraries, such as SciPy, scikit-learn, and others, |
| 86 | +in order to achieve our stated goal of facilitating interoperability among |
| 87 | +array libraries. |
| 88 | + |
| 89 | +To this end, we solicited feedback from downstream adopters regarding missing |
| 90 | +APIs, pain points, and general blind spots. During our discussions, we made |
| 91 | +three key observations. First, for a small subset of APIs, the behavior |
| 92 | +required by the standard did not match the reality on the ground, and we needed |
| 93 | +to revise the standard in order to ensure array libraries and their consumers |
| 94 | +could both achieve compliance **and** maintain backward compatibility. Second, |
| 95 | +we noticed a common set of operations which downstream adopters kept needing |
| 96 | +and for which they were implementing inefficient workarounds, thus making these |
| 97 | +operations excellent candidates for standardization. And lastly, we found that |
| 98 | +downstream adopters needed robust and portable mechanisms for inspecting |
| 99 | +library and device capabilities. |
| 100 | + |
| 101 | +### Breaking Changes |
| 102 | + |
| 103 | +To address our first observation, we made two breaking changes to the 2022 |
| 104 | +revision of the standard. First, we revised the guidance for type promotion in |
| 105 | +`prod`, `sum`, and `linalg.trace` such that, by default, input arrays having |
| 106 | +floating-point data types are not upcast to a higher precision. The previous |
| 107 | +guidance reflected the concern that summation of large arrays having low |
| 108 | +precision could easily lead to overflow. While this concern is certainly valid |
| 109 | +for arrays having integer data types (e.g., `int8` and `int16`), this is less |
| 110 | +of a concern for floating-point data types which can typically handle a larger |
| 111 | +range of values and have a natural overflow value in infinity. |
| 112 | + |
| 113 | +Second, we revised the guidance for portable input and output data types in FFT |
| 114 | +APIs. One of the specification's overriding design principles is requiring |
| 115 | +users to be explicit about their intent. In the 2022 revision, we failed to |
| 116 | +fully adhere to this principle in the FFT APIs, leading to ambiguity of |
| 117 | +acceptable return types and the potential for undesired automatic upcasting of |
| 118 | +real-valued arrays to complex-valued arrays. We thus sought to correct this |
| 119 | +deficiency and subsequently backported the changes to the 2022 revision. |
| 120 | + |
| 121 | +### New Additions |
| 122 | + |
| 123 | +To address our second observation, we identified and standardized several new |
| 124 | +APIs to ensure portable behavior among conforming array libraries. |
| 125 | + |
| 126 | +- `clip`: clamps each element of an array to a specified range. |
| 127 | +- `copysign`: composes a floating-point value from a magnitude and sign. |
| 128 | +- `cumulative_sum`: calculates the cumulative sum. |
| 129 | +- `hypot`: computes the square root of the sum of squares. |
| 130 | +- `maximum`: computes the maximum value for each element of an array relative |
| 131 | + to the respective element in another array. |
| 132 | +- `minimum`: computes the minimum value for each element of an array relative |
| 133 | + to the respective element in another array. |
| 134 | +- `moveaxis`: moves array axes to new positions. |
| 135 | +- `repeat`: repeats each element of an array a specified number of times. |
| 136 | +- `searchsorted`: finds insertion positions such that sorted order would be |
| 137 | + preserved. |
| 138 | +- `signbit`: determines whether the sign bit is set for each element in an |
| 139 | + array. |
| 140 | +- `tile`: constructs an array by tiling another array. |
| 141 | +- `unstack`: splits an array into a sequence of arrays along a given axis. |
| 142 | + |
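To make a few of these semantics concrete, here is a minimal plain-Python sketch that operates on lists using only the standard library. It illustrates element-wise behavior only and is not how any array library implements these functions. The helper names mirror the standardized APIs, but the signatures are simplified assumptions: the standardized functions operate on arrays and accept additional arguments (such as `axis`) that are omitted here.

```python
import bisect
import itertools
import math


def clip(values, lo, hi):
    """Clamp each element to the closed interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def cumulative_sum(values):
    """Running total of the elements, in order."""
    return list(itertools.accumulate(values))


def searchsorted(sorted_values, queries):
    """Leftmost insertion indices that keep sorted_values sorted."""
    return [bisect.bisect_left(sorted_values, q) for q in queries]


def signbit(values):
    """True where the sign bit is set, including for negative zero."""
    return [math.copysign(1.0, v) < 0 for v in values]


def repeat(values, n):
    """Repeat each element n times before moving to the next element."""
    return [v for v in values for _ in range(n)]


def tile(values, n):
    """Repeat the whole sequence n times."""
    return list(values) * n


print(clip([-3, 0, 7], -1, 5))               # [-1, 0, 5]
print(cumulative_sum([1, 2, 3, 4]))          # [1, 3, 6, 10]
print(searchsorted([1, 3, 5, 7], [0, 3, 6])) # [0, 1, 3]
print(math.hypot(3.0, 4.0))                  # 5.0
print(math.copysign(2.0, -0.0))              # -2.0
print(signbit([1.0, -0.0, -2.5]))            # [False, True, True]
print(repeat([1, 2], 2))                     # [1, 1, 2, 2]
print(tile([1, 2], 2))                       # [1, 2, 1, 2]
```

The `repeat` and `tile` pair is worth a second look: both duplicate data, but `repeat` duplicates element by element while `tile` duplicates the sequence as a whole.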
| 143 | +### Inspection APIs |
| 144 | + |
| 145 | +To address our third observation, we recognized that downstream library |
| 146 | +adopters needed more robust mechanisms for determining library and associated |
| 147 | +device capabilities. For libraries such as SciPy and scikit-learn that want to |
| 148 | +support array objects from multiple libraries, having a set of standardized |
| 149 | +top-level APIs is not sufficient. In order to devise concise mitigation |
| 150 | +strategies and gracefully handle varying hardware capabilities, having a means |
| 151 | +for reliably ascertaining device heterogeneity is critical. Accordingly, we |
| 152 | +worked to standardize inspection APIs to allow answering the following |
| 153 | +questions: |
| 154 | + |
| 155 | +- does a library support boolean indexing and data-dependent output shapes? |
| 156 | +- how can one portably obtain a library's list of supported devices? |
| 157 | +- what is a library's default device? |
| 158 | +- what data types does a library support? |
| 159 | +- what are a library's default data types? |
| 160 | +- what data types does a specific device support? |
| 161 | + |
| 162 | +After considerable discussion and coordination among array libraries and |
| 163 | +downstream stakeholders, we coalesced around an inspection API namespace |
| 164 | + |
| 165 | +```python |
| 166 | +info = xp.__array_namespace_info__() |
| 167 | +``` |
| 168 | + |
| 169 | +with the following initial set of APIs: |
| 170 | + |
| 171 | +- `capabilities`: returns a dictionary of array library capabilities. |
| 172 | +- `default_device`: returns the default device. |
| 173 | +- `default_dtypes`: returns a dictionary containing default data types. |
| 174 | +- `dtypes`: returns a dictionary containing supported data types specific to |
| 175 | + a given device. |
| 176 | +- `devices`: returns a list of supported devices. |
| 177 | + |
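As a rough sketch of how a downstream library might consume this namespace, the example below queries a toy stand-in rather than a real array library, so it runs with the standard library alone. `ToyInfo`, `toy_xp`, and `summarize` are hypothetical names invented for illustration, and the returned values are made up. The method names (including `default_dtypes`) and the dictionary keys follow our reading of the published 2023.12 specification and should be checked against it before relying on them.

```python
from types import SimpleNamespace


class ToyInfo:
    """Hypothetical stand-in for the object returned by __array_namespace_info__()."""

    def capabilities(self):
        return {"boolean indexing": True, "data-dependent shapes": False}

    def default_device(self):
        return "cpu"

    def default_dtypes(self, *, device=None):
        return {
            "real floating": "float64",
            "complex floating": "complex128",
            "integral": "int64",
            "indexing": "int64",
        }

    def devices(self):
        return ["cpu"]

    def dtypes(self, *, device=None, kind=None):
        return {"float32": "float32", "float64": "float64", "int64": "int64"}


# Toy namespace: calling toy_xp.__array_namespace_info__() builds a ToyInfo.
toy_xp = SimpleNamespace(__array_namespace_info__=ToyInfo)


def summarize(xp):
    """Collect the facts a consumer might check before choosing a code path."""
    info = xp.__array_namespace_info__()
    return {
        "default_device": info.default_device(),
        "devices": info.devices(),
        "default_float": info.default_dtypes()["real floating"],
        "boolean_indexing": info.capabilities().get("boolean indexing", False),
        "has_float32": "float32" in info.dtypes(),
    }


summary = summarize(toy_xp)
print(summary)
```

A consumer could use a summary like this to decide, for example, whether to take a boolean-mask code path or fall back to an alternative that avoids data-dependent output shapes.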
| 178 | +While these APIs may seem trivial on their surface, the reality is that array |
| 179 | +libraries have often lacked easy and portable programmatic access to data type |
| 180 | +and device information. We thus consider this outcome significant progress, and |
| 181 | +we're particularly eager to hear from downstream library authors what other |
| 182 | +capabilities they would find useful to query. |
| 183 | + |
| 184 | +## Facilitating Array API Adoption |
| 185 | + |
| 186 | +As mentioned above, 2023 was all about adoption, and adoption requires buy-in |
| 187 | +from both array libraries and the downstream consumers of those libraries. |
| 188 | +Adoption thus faces two key challenges. First, to facilitate development, array |
| 189 | +libraries need a robust mechanism for determining whether they are |
| 190 | +specification compliant. Second, while array libraries work to become fully |
| 191 | +specification compliant, downstream libraries need to be able to target a |
| 192 | +stable compatibility layer in order to smooth over subtle differences in array |
| 193 | +library behavior. |
| 194 | + |
| 195 | +### Test Suite |
| 196 | + |
| 197 | +To address the first challenge, we've continued to develop a comprehensive |
| 198 | +portable [test suite](https://github.com/data-apis/array-api-tests) built on |
| 199 | +Pytest and Hypothesis for testing Array API Standard compliance. In addition to |
| 200 | +the 2022 revision, the test suite has been updated to support the most recent |
| 201 | +2023 revision. |
| 202 | + |
| 203 | +### Compatibility Layer |
| 204 | + |
| 205 | +To address the second challenge, we've continued work on an [array |
| 206 | +compatibility layer](https://github.com/data-apis/array-api-compat) which |
| 207 | +provides a small wrapper around existing array libraries to ensure Array API |
| 208 | +Standard compliant behavior. We're proud to announce that, in addition to |
| 209 | +support for NumPy, CuPy, and PyTorch, we've added support for |
| 210 | +[Dask](https://github.com/data-apis/array-api-compat/pull/76) and |
| 211 | +[JAX](https://github.com/data-apis/array-api-compat/pull/84). |
| 212 | + |
| 213 | +To get started, install from [PyPI](https://pypi.org/project/array-api-compat/) |
| 214 | + |
| 215 | +```bash |
| 216 | +pip install array-api-compat |
| 217 | +``` |
| 218 | + |
| 219 | +and take it for a spin! If you encounter any issues, please be sure to let us |
| 220 | +know over on the library issue [tracker](https://github.com/data-apis/array-api-compat/issues). |
| 221 | + |
| 222 | +## Adoption Milestones |
| 223 | + |
| 224 | +Array libraries, such as NumPy, CuPy, PyTorch, JAX, and oneAPI, have continued |
| 225 | +work toward achieving full API compliance, which is a significant milestone in |
| 226 | +and of itself. But it's all for naught if array library consumers are not able |
| 227 | +to reap the benefits of standardization. Fortunately, we've seen |
| 228 | +significant uptake of the Array API Standard among downstream libraries. In |
| 229 | +particular, both [SciPy](https://docs.scipy.org/doc/scipy//dev/api-dev/array_api.html) |
| 230 | +and [scikit-learn](https://scikit-learn.org/stable/modules/array_api.html) have |
| 231 | +added experimental support, thus enabling support for both CPU and GPU tensors |
| 232 | +and marking a big win for end users. For the curious reader, we discussed some |
| 233 | +of the performance benefits in our recent [paper](https://proceedings.scipy.org/articles/018d8c34-e9ca-7105-9366-a050cc18b214) |
| 234 | +published in _SciPy Proceedings_ (2023). |
| 235 | + |
| 236 | +### NumPy |
| 237 | + |
| 238 | +One development that is especially noteworthy is the adoption of the Array API |
| 239 | +Standard in the main namespace of [NumPy 2.0](https://numpy.org/devdocs/release/2.0.0-notes.html). |
| 240 | +When we originally formed the Consortium and began the work of standardization, |
| 241 | +we didn't know exactly how array libraries would prefer to adopt an eventual |
| 242 | +array API standard. Would they adopt it in their main namespace? Or would they |
| 243 | +prefer to avoid potentially breaking backward compatibility and implement in a |
| 244 | +strictly compliant sub-namespace? |
| 245 | + |
| 246 | +We wrote the specification with both possibilities in mind. NumPy and its kin |
| 247 | +went down the sub-namespace path, while libraries such as PyTorch opted for |
| 248 | +their main namespace. Well, after a few years of experimentation, the NumPy |
| 249 | +community decided that they liked the standard so much that relegating a |
| 250 | +strictly compliant implementation to a sub-namespace was not enough, and |
| 251 | +subsequently sought to apply the API design principles not just to standardized |
| 252 | +APIs in their main namespace, but across all of NumPy. This is a significant |
| 253 | +win for portability, and we're excited for the benefits NumPy 2.0 will bring to |
| 254 | +downstream libraries and the PyData ecosystem at large. |
| 255 | + |
| 256 | +## The Road Ahead |
| 257 | + |
| 258 | +Phew! That's a lot, and you've made it this far! So what's in store for 2024?! |
| 259 | +Glad you asked. Nothing too different from the year before. We're planning on |
| 260 | +staying the course, focusing on adoption, and continuing to address the gaps |
| 261 | +and pain points identified by downstream libraries. |
| 262 | + |
| 263 | +In addition to normal specification work, we're particularly keen on developing |
| 264 | +more robust tools for specification compliance and monitoring. Based on |
| 265 | +feedback we've received from downstream libraries, there's still a lack of |
| 266 | +transparency around which APIs are supported and what the potential edge |
| 267 | +cases are. We have some ideas for how to increase visibility and will have more to |
| 268 | +share in the months to come. |
| 269 | + |
| 270 | +Long story short, we're excited for the year ahead, and we'd love to get your |
| 271 | +feedback! To provide feedback on the Array API Standard, please open issues or |
| 272 | +pull requests on <https://github.com/data-apis/array-api>, and come participate |
| 273 | +in our public [discussions](https://github.com/data-apis/array-api/discussions). |
| 274 | + |
| 275 | +Cheers! |