cupy-xarray does not handle chunked dask arrays with NaNs #9195

Closed
yt87 opened this issue Jun 30, 2024 · 3 comments

yt87 commented Jun 30, 2024

What happened?

This bug was originally reported in xarray-contrib/cupy-xarray#52. I am repeating it here at the request of @dcherian.
I added a print line to the function as_shared_dtype in duck_array_ops.py:

    # Avoid calling array_type("cupy") repeatedly in the any check
    array_type_cupy = array_type("cupy")
    if any(isinstance(x, array_type_cupy) for x in scalars_or_arrays):
        import cupy as cp

        xp = cp
    elif xp is None:
        xp = get_array_namespace(scalars_or_arrays)
    print('=======', xp.__name__, scalars_or_arrays)

What did you expect to happen?

a is evaluated correctly: xp in as_shared_dtype is set to cupy, as expected.
b is also correct: it is a dask array with cupy.ndarray chunks. There is some black magic involved, though, since xp is set to numpy.
c is a dask array with chunks of type numpy.ndarray. This is wrong: subsequent calls to as_numpy or compute fail with the traceback shown in the bug report xarray-contrib/cupy-xarray#52.
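
A minimal sketch of why the eager and chunked cases may pick different namespaces, assuming the isinstance check quoted above only looks at the outermost object. The classes and both helper functions are hypothetical stand-ins (not xarray, dask, or cupy code) so that the sketch runs without a GPU; the `_meta` attribute is modeled on the chunk-type sample that dask arrays carry, and the plain "numpy" fallback is a simplification of the real namespace lookup.

```python
# Toy model of the namespace choice. Everything here is a hypothetical
# stand-in so the sketch runs without cupy, dask, or a GPU.


class FakeCupyArray:
    """Stand-in for cupy.ndarray."""


class FakeDaskArray:
    """Stand-in for a chunked dask array; `_meta` mimics dask's chunk-type sample."""

    def __init__(self, meta):
        self._meta = meta


def pick_namespace_top_level(values):
    """Mirror the quoted check: only the outermost type is inspected."""
    if any(isinstance(v, FakeCupyArray) for v in values):
        return "cupy"
    return "numpy"  # simplified fallback


def pick_namespace_chunk_aware(values):
    """Hypothetical variant: also inspect the chunk type of wrapped arrays."""
    for v in values:
        inner = getattr(v, "_meta", v)
        if isinstance(inner, FakeCupyArray):
            return "cupy"
    return "numpy"  # simplified fallback


# Arguments shaped like the logged calls: a NaN scalar plus an array.
eager = [float("nan"), FakeCupyArray()]
chunked = [float("nan"), FakeDaskArray(meta=FakeCupyArray())]

print(pick_namespace_top_level(eager))      # cupy
print(pick_namespace_top_level(chunked))    # numpy
print(pick_namespace_chunk_aware(chunked))  # cupy
```

In this model the chunked case falls through to numpy because the wrapper hides the cupy chunks, which matches the `======= numpy` lines in the log output. Whether a chunk-aware check is the right fix for xarray is not established here.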

Minimal Complete Verifiable Example

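import numpy as np
import xarray as xr
import cupy_xarray  # noqa: F401  (imported for its side effect: enables .as_cupy())

# a: eager cupy array containing NaN
a = xr.DataArray([1, np.nan]).as_cupy().sum(min_count=1)
print('a =', a)
# b: chunked array without NaN
b = xr.DataArray([1, 2]).chunk(dim_0=2).as_cupy().sum(min_count=1)
print('b =', b)
# c: chunked array containing NaN (the failing case)
c = xr.DataArray([1, np.nan]).chunk(dim_0=2).as_cupy().sum(min_count=1)
print('c =', c)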


MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
======= cupy [array([0., 0.]), array([ 1., nan])]
======= cupy [nan, array(1.)]
a = <xarray.DataArray ()> Size: 8B
array(1.)
b = <xarray.DataArray 'asarray-a8e500e046402975cc20971c9b97fd57' ()> Size: 8B
dask.array<sum-aggregate, shape=(), dtype=int64, chunksize=(), chunktype=cupy.ndarray>
======= numpy [dask.array<zeros_like, shape=(2,), dtype=float64, chunksize=(2,), chunktype=cupy.ndarray>, dask.array<asarray, shape=(2,), dtype=float64, chunksize=(2,), chunktype=cupy.ndarray>]
======= numpy [nan, dask.array<sum-aggregate, shape=(), dtype=float64, chunksize=(), chunktype=cupy.ndarray>]
c = <xarray.DataArray 'asarray-a913867b040846d481995bbceaf8efdf' ()> Size: 8B
dask.array<where, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:23:07) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.9.3-3-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.6.0
pandas: 2.2.2
numpy: 2.0.0
scipy: None
netCDF4: 1.7.1
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.6.2
distributed: 2024.6.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.6.0
cupy: 13.2.0
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 70.1.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.25.0
sphinx: None

yt87 added the bug and needs triage labels Jun 30, 2024

keewis (Collaborator) commented Jun 30, 2024

I think this is a duplicate of #7721. Can you confirm?

keewis removed the needs triage label Jun 30, 2024

yt87 (Author) commented Jun 30, 2024

Yes, it is. I also think that without a solution, cupy-xarray is not a viable idea.

keewis (Collaborator) commented Jul 1, 2024

then let's close this and continue on #7721

keewis closed this as not planned Jul 1, 2024