.. _copyview-mutability:

- Copy-view behaviour and mutability
+ Copy-view behavior and mutability
==================================

.. admonition:: Mutating views
@@ -10,68 +10,90 @@ Copy-view behaviour and mutability

Strided array implementations (e.g. NumPy, PyTorch, CuPy, MXNet) typically
have the concept of a "view", meaning an array containing data in memory that
- belongs to another array (i.e. a different "view" on the original data).
- Views are useful for performance reasons - not copying data to a new location
- saves memory and is faster than copying - but can also affect the semantics
+ belongs to another array (i.e., a different "view" on the original data).
+ Views are useful for performance reasons—not copying data to a new location
+ saves memory and is faster than copying—but can also affect the semantics
of code. This happens when views are combined with *mutating* operations.
- This simple example illustrates that:
+ The following example is illustrative:

.. code-block:: python

   x = ones(1)
   y = x[:]  # `y` *may* be a view on the data of `x`
   y -= 1  # if `y` is a view, this modifies `x`

- Code as simple as the above example will not be portable between array
- libraries - for NumPy/PyTorch/CuPy/MXNet ``x`` will contain the value ``0``,
- while for TensorFlow/JAX/Dask it will contain the value ``1``. The combination
- of views and mutability is fundamentally problematic here if the goal is to
- be able to write code with unambiguous semantics.
+ Code similar to the above example will not be portable between array
+ libraries. For example, for NumPy, PyTorch, and CuPy, ``x`` will contain the value ``0``,
+ while, for TensorFlow, JAX, and Dask, ``x`` will contain the value ``1``. In
+ this case, the combination of views and mutability is fundamentally problematic
+ if the goal is to be able to write code with unambiguous semantics.

Views are necessary for getting good performance out of the current strided
- array libraries. It is not always clear however when a library will return a
- view, and when it will return a copy. This API standard does not attempt to
- specify this - libraries can do either.
+ array libraries. It is not always clear, however, when a library will return a
+ view and when it will return a copy. This standard does not attempt to
+ specify this—libraries may do either.

- There are several types of operations that do in-place mutation of data
- contained in arrays. These include:
+ There are several types of operations that may perform in-place mutation of
+ array data. These include:

- 1. Inplace operators (e.g. ``*=``)
+ 1. In-place operators (e.g. ``*=``)
2. Item assignment (e.g. ``x[0] = 1``)
3. Slice assignment (e.g., ``x[:2, :] = 3``)
4. The `out=` keyword present in some strided array libraries (e.g. ``sin(x, out=y)``)

- Libraries like TensorFlow and JAX tend to support inplace operators, provide
+ Libraries such as TensorFlow and JAX tend to support in-place operators by providing
alternative syntax for item and slice assignment (e.g. an ``update_index``
- function or ``x.at[idx].set(y)``), and have no need for ``out=``.
+ function or ``x.at[idx].set(y)``) and have no need for ``out=``.

- A potential solution could be to make views read-only, or use copy-on-write
- semantics. Both are hard to implement and would present significant issues
- for backwards compatibility for current strided array libraries. Read-only
- views would also not be a full solution, given that mutating the original
- (base) array will also result in ambiguous semantics. Hence this API standard
- does not attempt to go down this route.
+ A potential solution could be to make views read-only or implement copy-on-write
+ semantics. Both are hard to implement and would present significant backward
+ compatibility issues for current strided array libraries. Read-only
+ views would also not be a full solution due to the fact that mutating the original
+ (base) array will also result in ambiguous semantics. Accordingly, this standard
+ does not attempt to pursue this solution.

- Both inplace operators and item/slice assignment can be mapped onto
+ Both in-place operators and item/slice assignment can be mapped onto
equivalent functional expressions (e.g. ``x[idx] = val`` maps to
- ``x.at[idx].set(val)``), and given that both inplace operators and item/slice
+ ``x.at[idx].set(val)``), and, given that both in-place operators and item/slice
assignment are very widely used in both library and end user code, this
standard chooses to include them.

- The situation with ``out=`` is slightly different - it's less heavily used, and
- easier to avoid. It's also not an optimal API, because it mixes an
+ The situation with ``out=`` is slightly different—it's less heavily used, and
+ easier to avoid. It's also not an optimal API because it mixes an
"efficiency of implementation" consideration ("you're allowed to do this
- inplace") with the semantics of a function ("the output _must_ be placed into
- this array). There are libraries that do some form of tracing or abstract
- interpretation over a language that does not support mutation (to make
- analysis easier); in those cases implementing ``out=`` with correct handling of
- views may even be impossible to do. There's alternatives, for example the
- donated arguments in JAX or working buffers in LAPACK, that allow the user to
- express "you _may_ overwrite this data, do whatever is fastest". Given that
- those alternatives aren't widely used in array libraries today, this API
- standard chooses to (a) leave out ``out=``, and (b) not specify another method
- of reusing arrays that are no longer needed as buffers.
-
- This leaves the problem of the initial example - with this API standard it
- remains possible to write code that will not work the same for all array
- libraries. This is something that the user must be careful about.
+ in-place") with the semantics of a function ("the output _must_ be placed into
+ this array"). There are libraries that do some form of tracing or abstract
+ interpretation over a vocabulary that does not support mutation (to make
+ analysis easier). In those cases implementing ``out=`` with correct handling of
+ views may even be impossible to do.
+
+ There are alternatives. For example, the concept of donated arguments in JAX or
+ working buffers in LAPACK which allow the user to express "you _may_ overwrite
+ this data; do whatever is fastest". Given that those alternatives aren't widely
+ used in array libraries today, this standard chooses to (a) leave out ``out=``,
+ and (b) not specify another method of reusing arrays that are no longer needed
+ as buffers.
+
+ This leaves the problem of the initial example—despite the best efforts of this
+ standard, it remains possible to write code that will not work the same for all
+ array libraries. This is something that the users are advised to best keep in
+ mind and to reason carefully about the potential ambiguity of implemented code.
+
+ Copy keyword argument behavior
+ ------------------------------
+
+ Several APIs in this standard support a ``copy`` keyword argument (e.g.,
+ ``asarray``, ``astype``, ``reshape``, and ``__dlpack__``). Typically, when a
+ user sets ``copy=True``, the user does so in order to ensure that they are free
+ to mutate the returned array without side-effects—namely, without mutating other
+ views on the original (base) array. Accordingly, when ``copy=True``, unless an
+ array library can guarantee that an array can be mutated without side-effects,
+ conforming libraries are recommended to always perform a physical copy of the
+ underlying array data.
+
+ .. note::
+    Typically, in order to provide such a guarantee, libraries must perform
+    whole-program analysis.
+
+ Conversely, consumers of this standard should expect that, if they set
+ ``copy=True``, they are free to use in-place operations on a returned array.
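
The added section's recommendation can be sketched with NumPy standing in for a conforming library. This is a minimal illustration, not part of the diff. It assumes NumPy is installed, and it uses `np.array(..., copy=True)` only because that spelling works across NumPy versions; the standard itself spells the same request `asarray(x, copy=True)`. The variable names are hypothetical.

```python
import numpy as np

# Case 1: basic slicing returns a view in NumPy, so an in-place
# update through the slice also changes the original array.
x = np.ones(3)
view = x[:]
view -= 1
x_after_view_update = x.copy()  # snapshot of `x` after the update

# Case 2: an explicit copy decouples the result from its base array,
# so in-place operations on it have no side effects on `base`.
base = np.ones(3)
safe = np.array(base, copy=True)  # stand-in for `asarray(base, copy=True)`
safe -= 1

# `np.shares_memory` reports whether two arrays overlap in memory.
view_shares_memory = np.shares_memory(x, view)
copy_shares_memory = np.shares_memory(base, safe)
```

In a library where slicing does not return a view, case 1 would leave `x` unchanged. That difference is the portability problem the text describes, and case 2 is the pattern that behaves the same either way.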