-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy rectilinear interpolator, with sparse
#6006
base: main
Are you sure you want to change the base?
Conversation
they seem to better follow the array interface
for more information, see https://pre-commit.ci
Additional note on the use of |
…is into lazy-rectilinearinterpolator
Marking it as ready for review to hear your thoughts on this. What is in here for now seems to work (see e.g. code snippet below), provided that Test code snippetimport iris
from iris.analysis import RectilinearInterpolator
LATITUDE = [16., 16.1]
LONGITUDE = 226.
def get_cube():
filename = iris.sample_data_path('E1_north_america.nc')
return iris.load_cube(filename, 'air_temperature')
def interpolate(method):
assert method in ('linear', 'nearest')
cube = get_cube()
coords = ('latitude', 'longitude')
points = (LATITUDE, LONGITUDE)
interpolator = RectilinearInterpolator(cube, coords, method, "mask")
print('Cube is lazy: ', cube.has_lazy_data())
# Cube is lazy: True
result = interpolator(points, collapse_scalar=True)
print('Result is lazy: ', result.has_lazy_data())
# Result is lazy: True
if __name__ == '__main__':
interpolate('linear')
interpolate('nearest') |
Hi @fnattino, thanks for your hard work. Just to warn you I expect several weeks' delay before we can get to this as we are accumulating a backlog of ESMValTool changes while we prepare GeoVista for SciPy 2024 and complete the mesh-focused Iris 3.10 release. |
sparse
I have worked on a different approach that makes the intepolator lazy by using a similar approach as used in the regridder, see #6084. This has the disadvantage that it requires merging the chunks along the interpolating dimensions, but it is not as "disruptive" as in it does not add new dependencies, so maybe worth to have a look at #6084 first keeping this for the longer run? In the meantime I can also try to run some benchmarks to compare the performance of |
🚀 Pull Request
Description
Enable the rectilinear interpolator to run lazily #6002.
The original implementation makes use of a CSR matrix from
scipy.sparse
to represent the weights. However, this does not seem to work well with Dask arrays, since it does not adhere (completely) to the array interface. Switching to sparse arrays fromsparse
seems to work well, allowing the sparse matrix to be wrapped in a Dask array.Would this be an acceptable change? On the long run, it looks like
sparse
will replacescipy.sparse
in the PyData ecosystem..