enforce dimension ordering? #82

Open
kthyng opened this issue Sep 16, 2020 · 18 comments

Comments

@kthyng
Contributor

kthyng commented Sep 16, 2020

@dcherian I haven't gotten to integrate cf-xarray into my workflow yet, but, looking ahead, could cf-xarray be a tool for enforcing a certain dimension ordering for xarray DataArrays? I find that after calculations the ordering sometimes changes, and I haven't figured out a good way to automatically restore the dimension order [time x vertical x y-coord x x-coord]. Sometimes it matters, too. Any thoughts?

@dcherian
Contributor

dcherian commented Sep 17, 2020

The way to do this in xarray is .transpose(); you can do .cf.transpose("T", "Z", "Y", "X"), for example. Let me know if it doesn't work for some cases.

@kthyng
Contributor Author

kthyng commented Sep 17, 2020

Oh right, sorry, I should have been specific. Yes, I know transposing works, but you have to know the dimensions ahead of time to call transpose properly. I would hope for something that transposes but, given whichever of the 4 possible dimensions (T, Z, Y, X) are present, puts them in the proper order.

@dcherian
Contributor

The challenge is that you may pass a Dataset that has an X coordinate with axis: X missing (maybe attrs got cleared somehow). Right now that would raise an error, which is good IMO. If cf_xarray ignored that error then things may silently fail later.

The current solution is

ds.cf.transpose(*[dim for dim in ["T", "Z", "Y", "X"] if dim in ds.cf.get_valid_keys()])

What are you trying to do by enforcing dimension order? Maybe there's something else cf_xarray can do elsewhere.
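
The list comprehension above keeps only the axis keys that the object actually resolves, always in T, Z, Y, X order. A minimal plain-Python sketch of that filtering step follows. It assumes a plain set stands in for whatever `ds.cf.get_valid_keys()` returns, and the name `present_axes` is hypothetical, not part of cf_xarray.

```python
def present_axes(valid_keys, order=("T", "Z", "Y", "X")):
    """Return the axis keys from `order` that appear in `valid_keys`.

    The result follows the sequence of `order`, not of `valid_keys`.
    """
    valid = set(valid_keys)
    return [axis for axis in order if axis in valid]


# A variable with no vertical axis: "Z" is skipped, the rest keep T, Y, X order.
keys = {"X", "Y", "T", "longitude", "latitude"}
print(present_axes(keys))  # prints ['T', 'Y', 'X']
```

The resulting list is what would be unpacked into `.cf.transpose(*...)` in the one-liner above.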

@kthyng
Contributor Author

kthyng commented Sep 18, 2020

That would work if the cf xarray attributes work for a DataArray for ROMS output. I could add that check as needed in code. I ran across this recently when I did a calculation that changed the order of dimensions in a DataArray and then used xESMF, which assumes typical ordering. So either I would like my arrays to always be put into the proper ordering with a quick command like the one you listed after anything I run (thinking about xroms here), or all functions that operate on DataArrays should accept dimensions by name instead of assuming typical array ordering (in this example that would be a change to xESMF).

@dcherian
Contributor

That would work if the cf xarray attributes work for a DataArray for ROMS output.

If you want X, Y you may have to add some attrs when you create a dataset in xroms; or change ROMS...

(in this example that would be a change to xESMF).

This would be desirable. We would like packages that build on xarray to not depend on dimension order. Can you open an issue at https://github.com/pangeo-data/xESMF/issues?

@kthyng
Contributor Author

kthyng commented Sep 21, 2020

I'm working on adding the attributes in xroms. I have another question. I had been thinking that cf-xarray Axes were analogous to dimensions in xarray, and Coordinates analogous to xarray coords. Is this correct? It looks to me like it might not be when I try the code from above:

ds.cf.transpose(*[dim for dim in ["T", "Z", "Y", "X"] if dim in ds.cf.get_valid_keys()])

I have assigned attributes to coordinates in my ROMS output, but then the Axes are pointing to lon_rho, lat_rho, etc. I think they should point to eta_rho, xi_rho, etc, if transpose is going to work with them. However, I don't seem to be able to assign any attributes to xarray dimensions. Any ideas on this?

(in this example that would be a change to xESMF).

This would be desirable. We would like packages that build on xarray to not depend on dimension order. Can you open an issue at https://github.com/pangeo-data/xESMF/issues

Ok added to my list.

@dcherian
Contributor

You'll have to create an actual array for xi_rho. There are no values associated with xi_rho in the file, so xarray uses np.arange on the fly. See #84 (comment)

I had been thinking that cf-xarray Axes were analogous to dimensions in xarray, and Coordinates analogous to xarray coords.

cf_xarray just has special names X, Y, Z, T, latitude, longitude, vertical, time which are in the CF conventions (I think the X, Y, Z, T come from COARDS or something). For each name, cf_xarray will look for certain attributes and, if found, will replace the name with an appropriate variable name.

I agree with you: for ROMS I would set xi_* as X and eta_* as Y (after assigning arrays); lat_* and lon_* should be latitude and longitude. This should be most useful for curvilinear grids. E.g. .cf.sum("X") will work but .cf.sum("longitude") won't because sum expects a dimension name. .cf.plot(x="longitude", y="latitude") will do the right thing
(in this case, your Axes vs Coordinates distinction works)

For regular grids: xi_* and lon_* should be exactly the same with appropriate attributes for longitude and X
(in this case, your Axes vs Coordinates distinction doesn't work)
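
The lookup described in this comment (a special name such as X is replaced by whichever variable carries the matching attribute) can be modelled without xarray. This is only a sketch of the idea under simplifying assumptions: variables are plain dicts of attributes, only the `axis` attribute is checked, and `resolve_axis` is a hypothetical helper rather than cf_xarray's actual implementation.

```python
def resolve_axis(variables, axis):
    """Return names of variables whose attrs declare the given CF axis.

    `variables` maps variable name -> attrs dict; it is a stand-in for the
    coordinate variables of a Dataset.
    """
    return [name for name, attrs in variables.items() if attrs.get("axis") == axis]


# Curvilinear, ROMS-like layout: the index coordinates carry `axis`,
# while the 2D lon/lat variables carry `standard_name` instead.
roms_like = {
    "xi_rho": {"axis": "X"},
    "eta_rho": {"axis": "Y"},
    "lon_rho": {"standard_name": "longitude"},
    "lat_rho": {"standard_name": "latitude"},
}
print(resolve_axis(roms_like, "X"))  # prints ['xi_rho']
print(resolve_axis(roms_like, "Z"))  # prints []
```

With this tagging, "X" resolves to a dimension name that reductions can use, while lon_rho stays available under a separate name for plotting.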

@kthyng
Contributor Author

kthyng commented Sep 21, 2020

As I read this I realized you had addressed this before (#84) but it didn't stick in my head bc I hadn't run across it and thought about it yet. Thank you for your patience. I will try this.

@kthyng
Contributor Author

kthyng commented Sep 21, 2020

I am able to get the dataset to recognize eta_rho, xi_rho, etc, as Axes now, but not individual variables like temp. The other parts are filling in nicely, but the Axes are the sweet spot I think. What connects the Axes to a particular variable that could be breaking here?

image

@dcherian
Contributor

What connects the Axes to a particular variable that could be breaking here?

Just the dimension name: so ds.temp.cf["X"] will give you xi_rho back.

For temp and friends, you need to set the standard_name attribute: sea_water_potential_temperature in this case. http://cfconventions.org/Data/cf-standard-names/75/build/cf-standard-name-table.html
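
To illustrate "just the dimension name": a variable can be linked to its axes by looking up each of its own dimension names among the tagged coordinates. The sketch below is a plain-Python model of that link, not cf_xarray's code. The helper `axes_for_variable` is hypothetical, and the dimension names `ocean_time` and `s_rho` are illustrative assumptions.

```python
def axes_for_variable(var_dims, coord_attrs):
    """Map axis key -> dimension name for one variable.

    Only the variable's own dimension names are searched; a dimension with
    no `axis` attribute on its coordinate is left out of the result.
    """
    found = {}
    for dim in var_dims:
        axis = coord_attrs.get(dim, {}).get("axis")
        if axis is not None:
            found[axis] = dim
    return found


coord_attrs = {
    "xi_rho": {"axis": "X"},
    "eta_rho": {"axis": "Y"},
    "s_rho": {"axis": "Z"},
}
temp_dims = ("ocean_time", "s_rho", "eta_rho", "xi_rho")  # ocean_time is untagged here
print(axes_for_variable(temp_dims, coord_attrs)["X"])  # prints xi_rho
```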

@dcherian
Contributor

Sorry just saw the second cell; hmmm.... that looks wrong.

@dcherian
Contributor

Thanks @kthyng. I found the bug...

also while trying to reproduce, I found a clear way of converting xi_rho and friends to proper dataarrays


# set dimensions as X, Y
pop["nlon"] = ("nlon", np.arange(pop.sizes["nlon"]), {"axis": "X"})
pop["nlat"] = ("nlat", np.arange(pop.sizes["nlat"]), {"axis": "Y"})

dcherian added a commit that referenced this issue Sep 22, 2020
…85)

* Search through dimension names when coordinates attribute is present

See #82

* Delete user-guide.rst
@kthyng
Contributor Author

kthyng commented Sep 22, 2020

As I noted elsewhere, this worked for me, thanks!

image

also while trying to reproduce, I found a clear way of converting xi_rho and friends to proper dataarrays


# set dimensions as X, Y
pop["nlon"] = ("nlon", np.arange(pop.sizes["nlon"]), {"axis": "X"})
pop["nlat"] = ("nlat", np.arange(pop.sizes["nlat"]), {"axis": "Y"})

I tried out these lines and they are indeed a shorter way to convert a dimension to a coordinate and then set the attribute, so I switched over to that, thanks.

So with this change, I can close this issue! I am able to use

var.cf.transpose(*[dim for dim in ["T", "Z", "Y", "X"] if dim in var.cf.get_valid_keys()])

to transpose an array from ROMS without needing to know a priori whether it actually has all the dimensions. I know it is better to not require arrays to be in the correct order (will try to follow up with xESMF about that), but it will also be helpful to have this available. Thanks again.

@kthyng kthyng closed this as completed Sep 22, 2020
@kthyng kthyng reopened this Sep 22, 2020
@kthyng
Contributor Author

kthyng commented Sep 22, 2020

Sorry to reopen this, but wouldn't this be a nice convenience function for cf-xarray to have? I was just thinking about wrapping it in xroms, but why not have this package have it? Something like:

da.cf.enforce_ordering()

which returns da but in conventional order of ["T", "Z", "Y", "X"].

@dcherian
Contributor

Maybe something like .cf.force_dim_order(order=("T", "Z", "Y", "X"), error="ignore") # this is the default kwarg value?
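
One way to picture the proposed `error` keyword is below. This is a hedged plain-Python sketch of the two behaviours, not an existing cf_xarray method. The helper name `ordered_axis_dims`, the accepted values "ignore" and "raise", and the mapping used as input are all assumptions made for illustration.

```python
def ordered_axis_dims(axis_to_dim, order=("T", "Z", "Y", "X"), error="ignore"):
    """Return dimension names in `order`, based on an axis -> dimension mapping.

    error="ignore" silently skips axes that have no dimension;
    error="raise" raises KeyError for the first missing axis.
    """
    if error not in ("ignore", "raise"):
        raise ValueError(f"error must be 'ignore' or 'raise', got {error!r}")
    dims = []
    for axis in order:
        if axis in axis_to_dim:
            dims.append(axis_to_dim[axis])
        elif error == "raise":
            raise KeyError(f"no dimension is tagged with axis {axis!r}")
    return dims


# Illustrative mapping for a 3D variable with no vertical dimension.
surface_var = {"T": "ocean_time", "Y": "eta_rho", "X": "xi_rho"}
print(ordered_axis_dims(surface_var))  # prints ['ocean_time', 'eta_rho', 'xi_rho']
```

With "ignore" as the default, the call behaves like the filtered one-liner earlier in the thread; "raise" keeps the stricter behaviour for cases where missing attrs should not pass silently.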

@kthyng
Contributor Author

kthyng commented Sep 22, 2020

That would work fine for me. But, why not essentially force it with this convenience function to a specific ordering? One can always transpose to other dimension ordering if they don't want this one for some reason, but doesn't CF convention dictate this order?

@dcherian
Contributor

dcherian commented Nov 12, 2024

COARDS standardizes the description of grids composed of independent latitude, longitude, vertical, and time axes. In addition to standardizing the metadata required to identify each of these axis types COARDS restricts the axis (equivalently dimension) ordering to be longitude, latitude, vertical, and time (with longitude being the most rapidly varying dimension).

If any or all of the dimensions of a variable have the interpretations of "date or time" (T), "height or depth" (Z), "latitude" (Y), or "longitude" (X) then we recommend, but do not require (see Section 1.4, "Relationship to the COARDS Conventions" ), those dimensions to appear in the relative order T, then Z, then Y, then X in the CDL definition corresponding to the file. All other dimensions should, whenever possible, be placed to the left of the spatiotemporal dimensions.

Just noticed this today, so perhaps we could have .cf.transpose_coards() or .cf.transpose(cfxr.COARDS_ORDER) that requires X, Y, Z, T be associated with dimension names.
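
A rough sketch of what such an ordering rule could compute is below, in plain Python and independent of xarray. It follows the quoted recommendation: T, Z, Y, X in that relative order, with all other dimensions to the left. Two points are assumptions: the function `coards_transpose_order` is hypothetical, and "associated with dimension names" is read here as meaning each axis that is present must resolve to one of the variable's dimensions. The names `member` and `ocean_time` are illustrative.

```python
COARDS_ORDER = ("T", "Z", "Y", "X")


def coards_transpose_order(dims, axis_to_name):
    """Return `dims` rearranged as: other dimensions first, then T, Z, Y, X.

    Raises ValueError if an axis resolves to a name that is not one of `dims`
    (for example a 2D longitude coordinate instead of a dimension).
    """
    trailing = []
    for axis in COARDS_ORDER:
        name = axis_to_name.get(axis)
        if name is None:
            continue  # this axis is not present on the variable
        if name not in dims:
            raise ValueError(f"axis {axis!r} maps to {name!r}, which is not a dimension")
        trailing.append(name)
    # Non-spatiotemporal dimensions keep their relative order on the left.
    leading = [d for d in dims if d not in trailing]
    return tuple(leading + trailing)


dims = ("xi_rho", "member", "eta_rho", "ocean_time")
axes = {"X": "xi_rho", "Y": "eta_rho", "T": "ocean_time"}
print(coards_transpose_order(dims, axes))
# prints ('member', 'ocean_time', 'eta_rho', 'xi_rho')
```

The returned tuple names every dimension exactly once, so it could be unpacked into xarray's `DataArray.transpose(*order)`.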
