Commit 4863912

docs: add design topic on copy keyword argument behavior
Closes: data-apis#886 Closes: data-apis#866
1 parent 6b8172e commit 4863912

1 file changed

+64
-42
lines changed

@@ -1,6 +1,6 @@
11
.. _copyview-mutability:
22

3-
Copy-view behaviour and mutability
3+
Copy-view behavior and mutability
44
==================================
55

66
.. admonition:: Mutating views
@@ -10,68 +10,90 @@ Copy-view behaviour and mutability
1010

1111
Strided array implementations (e.g. NumPy, PyTorch, CuPy, MXNet) typically
1212
have the concept of a "view", meaning an array containing data in memory that
13-
belongs to another array (i.e. a different "view" on the original data).
14-
Views are useful for performance reasons - not copying data to a new location
15-
saves memory and is faster than copying - but can also affect the semantics
13+
belongs to another array (i.e., a different "view" on the original data).
14+
Views are useful for performance reasons (not copying data to a new location
15+
saves memory and is faster than copying) but can also affect the semantics
1616
of code. This happens when views are combined with *mutating* operations.
17-
This simple example illustrates that:
17+
The following example is illustrative:
1818

1919
.. code-block:: python
2020
2121
    x = ones(1)
2222
    y = x[:] # `y` *may* be a view on the data of `x`
2323
    y -= 1 # if `y` is a view, this modifies `x`
2424
25-
Code as simple as the above example will not be portable between array
26-
libraries - for NumPy/PyTorch/CuPy/MXNet ``x`` will contain the value ``0``,
27-
while for TensorFlow/JAX/Dask it will contain the value ``1``. The combination
28-
of views and mutability is fundamentally problematic here if the goal is to
29-
be able to write code with unambiguous semantics.
25+
Code similar to the above example will not be portable between array
26+
libraries. For example, for NumPy, PyTorch, and CuPy, ``x`` will contain the value ``0``,
27+
while, for TensorFlow, JAX, and Dask, ``x`` will contain the value ``1``. In
28+
this case, the combination of views and mutability is fundamentally problematic
29+
if the goal is to be able to write code with unambiguous semantics.
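As a concrete illustration of the view-returning case, the following minimal sketch runs the example above with NumPy specifically (the choice of NumPy and the use of ``np.shares_memory`` are assumptions made for illustration; other libraries may behave differently, as described above):

```python
import numpy as np

x = np.ones(1)
y = x[:]   # in NumPy, basic slicing returns a view on the data of `x`
y -= 1     # in-place subtraction writes through the view into `x`

# Both names refer to the same underlying buffer.
shares = np.shares_memory(x, y)
print(shares, x)
```

In a library that returns a copy from ``x[:]``, or whose arrays are immutable, ``x`` would be left unchanged by the same three lines.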
3030

3131
Views are necessary for getting good performance out of the current strided
32-
array libraries. It is not always clear however when a library will return a
33-
view, and when it will return a copy. This API standard does not attempt to
34-
specify this - libraries can do either.
32+
array libraries. It is not always clear, however, when a library will return a
33+
view and when it will return a copy. This standard does not attempt to
34+
specify this; libraries may do either.
3535

36-
There are several types of operations that do in-place mutation of data
37-
contained in arrays. These include:
36+
There are several types of operations that may perform in-place mutation of
37+
array data. These include:
3838

39-
1. Inplace operators (e.g. ``*=``)
39+
1. In-place operators (e.g. ``*=``)
4040
2. Item assignment (e.g. ``x[0] = 1``)
4141
3. Slice assignment (e.g., ``x[:2, :] = 3``)
4242
4. The ``out=`` keyword present in some strided array libraries (e.g. ``sin(x, out=y)``)
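The four kinds of mutation listed above can be sketched in one short snippet. NumPy is used here only as an example of a strided array library that supports all four; the array shapes and values are arbitrary choices for illustration:

```python
import numpy as np

x = np.arange(6, dtype=np.float64).reshape(3, 2)

x *= 2            # 1. in-place operator
x[0, 0] = 1       # 2. item assignment
x[:2, :] = 3      # 3. slice assignment

y = np.empty_like(x)
result = np.sin(x, out=y)   # 4. `out=` keyword: the result is written into `y`
```

Note that ``out=`` is a NumPy feature shown for completeness; it is not part of this standard.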
4343

44-
Libraries like TensorFlow and JAX tend to support inplace operators, provide
44+
Libraries such as TensorFlow and JAX tend to support in-place operators, provide
4545
alternative syntax for item and slice assignment (e.g. an ``update_index``
46-
function or ``x.at[idx].set(y)``), and have no need for ``out=``.
46+
function or ``x.at[idx].set(y)``) and have no need for ``out=``.
4747

48-
A potential solution could be to make views read-only, or use copy-on-write
49-
semantics. Both are hard to implement and would present significant issues
50-
for backwards compatibility for current strided array libraries. Read-only
51-
views would also not be a full solution, given that mutating the original
52-
(base) array will also result in ambiguous semantics. Hence this API standard
53-
does not attempt to go down this route.
48+
A potential solution could be to make views read-only or implement copy-on-write
49+
semantics. Both are hard to implement and would present significant backward
50+
compatibility issues for current strided array libraries. Read-only
51+
views would also not be a full solution because mutating the original
52+
(base) array will also result in ambiguous semantics. Accordingly, this standard
53+
does not attempt to pursue this solution.
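The limitation of read-only views can be sketched with NumPy's writeable flag. This is an illustration only, assuming NumPy's ``flags.writeable`` mechanism as a stand-in for read-only views in general; it is not something this standard specifies:

```python
import numpy as np

x = np.ones(3)
v = x[:]                     # a view on the data of `x`
v.flags.writeable = False    # make the view read-only

try:
    v[0] = 0.0               # writing through the read-only view is rejected
except ValueError:
    write_blocked = True
else:
    write_blocked = False

x[0] = 5.0                   # mutating the base array is still allowed...
observed = v[0]              # ...and the change is visible through the view
```

The view is protected from direct writes, but its contents still change when the base array is mutated, so the ambiguity remains.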
5454

55-
Both inplace operators and item/slice assignment can be mapped onto
55+
Both in-place operators and item/slice assignment can be mapped onto
5656
equivalent functional expressions (e.g. ``x[idx] = val`` maps to
57-
``x.at[idx].set(val)``), and given that both inplace operators and item/slice
57+
``x.at[idx].set(val)``), and, given that both in-place operators and item/slice
5858
assignment are very widely used in both library and end user code, this
5959
standard chooses to include them.
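The mapping from item assignment to a functional expression can be sketched as follows. The helper ``set_at`` is hypothetical and exists only for this illustration (it is neither part of this standard nor of NumPy); it mimics what an expression such as ``x.at[idx].set(val)`` conveys by returning a new array instead of mutating its input:

```python
import numpy as np

def set_at(x, idx, val):
    """Hypothetical helper: functional counterpart of ``x[idx] = val``."""
    out = x.copy()    # leave the input untouched
    out[idx] = val    # mutate only the fresh copy
    return out

x = np.zeros(3)
y = set_at(x, 1, 5.0)   # `x` is unchanged; `y` holds the updated values
```

A library with immutable arrays can implement the in-place syntax in these terms, possibly eliding the copy when it can prove that the input is not used again.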
6060

61-
The situation with ``out=`` is slightly different - it's less heavily used, and
62-
easier to avoid. It's also not an optimal API, because it mixes an
61+
The situation with ``out=`` is slightly different: it's less heavily used and
62+
easier to avoid. It's also not an optimal API because it mixes an
6363
"efficiency of implementation" consideration ("you're allowed to do this
64-
inplace") with the semantics of a function ("the output _must_ be placed into
65-
this array). There are libraries that do some form of tracing or abstract
66-
interpretation over a language that does not support mutation (to make
67-
analysis easier); in those cases implementing ``out=`` with correct handling of
68-
views may even be impossible to do. There's alternatives, for example the
69-
donated arguments in JAX or working buffers in LAPACK, that allow the user to
70-
express "you _may_ overwrite this data, do whatever is fastest". Given that
71-
those alternatives aren't widely used in array libraries today, this API
72-
standard chooses to (a) leave out ``out=``, and (b) not specify another method
73-
of reusing arrays that are no longer needed as buffers.
74-
75-
This leaves the problem of the initial example - with this API standard it
76-
remains possible to write code that will not work the same for all array
77-
libraries. This is something that the user must be careful about.
64+
in-place") with the semantics of a function ("the output *must* be placed into
65+
this array"). There are libraries that do some form of tracing or abstract
66+
interpretation over a vocabulary that does not support mutation (to make
67+
analysis easier). In those cases, implementing ``out=`` with correct handling of
68+
views may even be impossible to do.
69+
70+
There are alternatives, for example, the concept of donated arguments in JAX or
71+
working buffers in LAPACK, which allow the user to express "you *may* overwrite
72+
this data; do whatever is fastest". Given that those alternatives aren't widely
73+
used in array libraries today, this standard chooses to (a) leave out ``out=``,
74+
and (b) not specify another method of reusing arrays that are no longer needed
75+
as buffers.
76+
77+
This leaves the problem of the initial example—despite the best efforts of this
78+
standard, it remains possible to write code that will not work the same for all
79+
array libraries. Users are advised to keep this in mind and to reason
80+
carefully about the potential ambiguity of their code.
81+
82+
Copy keyword argument behavior
83+
------------------------------
84+
85+
Several APIs in this standard support a ``copy`` keyword argument (e.g.,
86+
``asarray``, ``astype``, ``reshape``, and ``__dlpack__``). Typically, when a
87+
user sets ``copy=True``, the user does so in order to ensure that they are free
88+
to mutate the returned array without side-effects—namely, without mutating other
89+
views on the original (base) array. Accordingly, when ``copy=True``, unless an
90+
array library can guarantee that an array can be mutated without side-effects,
91+
conforming libraries are recommended to always perform a physical copy of the
92+
underlying array data.
93+
94+
.. note::
95+
   Typically, in order to provide such a guarantee, libraries must perform
96+
   whole-program analysis.
97+
98+
Conversely, consumers of this standard should expect that, if they set
99+
``copy=True``, they are free to use in-place operations on a returned array.
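A minimal sketch of the expectation described above, using NumPy's ``np.array(..., copy=True)`` as a stand-in for ``asarray(x, copy=True)`` in this standard (the substitution is an assumption made so that the snippet runs on a wide range of NumPy versions):

```python
import numpy as np

base = np.zeros(4)
view = base[:2]                    # a view on the first two elements of `base`

safe = np.array(view, copy=True)   # request a physical copy of the data
safe += 1                          # in-place mutation of the copy

# The copy owns its own memory, so neither `base` nor `view` is affected.
independent = not np.shares_memory(safe, base)
```

Without ``copy=True``, a library would be free to return an array that aliases ``base``, in which case the in-place addition could modify the original data.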
