
Missing data variables on roundtripped Xarray.DataTree #624

Open
maxrjones opened this issue Jan 24, 2025 · 2 comments

@maxrjones

It appears that data variables on child nodes (set1 and set2 in the example below) are lost when an xarray DataTree is written with Icechunk and then read back. I'd be glad to look into this further to see whether it relates to upstream issues (e.g., pydata/xarray#9960), but first wanted to check whether there's a known solution.

MVCE

```python
import zarr
import icechunk
import xarray as xr

# Datasets for two child nodes and the root node
set1_data = xr.Dataset({"a": 0, "b": 1})
set2_data = xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])})
root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})

# Build the tree; None creates an empty node
root = xr.DataTree.from_dict(
    {
        "": root_data,
        "set1": set1_data,
        "set1/set1": None,
        "set1/set2": None,
        "set2": set2_data,
        "set2/set1": None,
        "set3": None,
    }
)

# Icechunk repository backed by S3
storage_config = icechunk.s3_storage(
    bucket="nasa-veda-scratch",
    prefix="icechunk-test/max/xr-datatree-roundtrip",
    region="us-west-2"
)
repo = icechunk.Repository.create(storage_config)
session = repo.writable_session("main")

# Write as Zarr v3, commit, then read back through the same session
root.to_zarr(session.store, zarr_format=3, consolidated=False)
session.commit("Commit datatree")
roundtripped = xr.open_datatree(session.store, engine="zarr")
xr.testing.assert_equal(root, roundtripped)
```
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[4], line 11
      9 session.commit("Commit datatree")
     10 roundtripped = xr.open_datatree(session.store, engine="zarr")
---> 11 xr.testing.assert_equal(root, roundtripped)

    [... skipping hidden 1 frame]

File /opt/conda/lib/python3.11/site-packages/xarray/testing/assertions.py:138, in assert_equal(a, b, check_dim_order)
    136     assert a.equals(b), formatting.diff_coords_repr(a, b, "equals")
    137 elif isinstance(a, DataTree):
--> 138     assert a.equals(b), diff_datatree_repr(a, b, "equals")
    139 else:
    140     raise TypeError(f"{type(a)} not supported by assertion comparison")

AssertionError: Left and right DataTree objects are not equal

Data at node 'set1' does not match:
    Data variables only on the left object:
        a        int64 8B 0
        b        int64 8B 1

Data at node 'set2' does not match:
    Differing dimensions:
        (x: 2) != ()
    Data variables only on the left object:
        a        (x) int64 16B 2 3
        b        (x) float64 16B 0.1 0.2
```

@jhamman
Member

jhamman commented Jan 24, 2025

Finalizing Zarr v3 support with DataTree is a known issue. Given the bug report in xarray, I'm surprised your example works as well as it does.

@maxrjones
Author

Thanks for the context! Please feel free to reach out if any help is wanted with finalizing Zarr v3 support for DataTree.
