
dask_awkward is imported unexpectedly when pickling an awkward array #570

Open
chuyuanliu opened this issue Feb 11, 2025 · 4 comments

@chuyuanliu

I noticed that dask and dask_awkward are imported when pickling an awkward array, even though dask_awkward is only installed and not used in the current environment, e.g.

import pickle
import sys
import awkward as ak

pickle.dumps(ak.Array([1, 2, 3]))
print("dask_awkward" in sys.modules, "dask" in sys.modules)

prints True True.

This is likely triggered by this awkward pickle plugin. Given that the implementation doesn't rely on dask_awkward, would it be better to put it in a separate folder, like the dask sizeof plugin?

@martindurant
Collaborator

The pickle implementation can probably live upstream in awkward, if it needs no dask-awkward-specific operations. That would allow removing dask-awkward from the entrypoint, but you would need to coordinate with the awkward repo.

Do you know how exactly it gets imported?

Having this repo produce three packages rather than the current two is, of course, possible, but annoying! Other options, such as deferring imports in the main package init, would probably be worse.

@chuyuanliu
Author

From a naive comparison with awkward's default __reduce_ex__, this plugin differs in that it handles PlaceholderArray and uses to_layout instead of to_packed to get the layout, so it may be fine to move it upstream, but that needs confirmation from the authors.

dask_awkward is imported when the configured submodule dask_awkward.pickle is imported, because importing a submodule first runs its parent package's __init__.py. An alternative solution could be to organize src in the following structure:

src/
├── dask_awkward/
└── dask_awkward_plugins/ # or some other name
    ├── awkward_pickle.py
    └── dask_sizeof.py

This way, there would still be two packages, and there would be no need to touch __init__.py.
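The reason the layout matters is that Python always imports the parent package before a submodule. The sketch below demonstrates this with a throwaway package written to a temporary directory; `demo_pkg` and `plugin` are hypothetical names standing in for dask_awkward and its pickle submodule.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def parent_imported_with_submodule():
    """Import demo_pkg.plugin and report whether demo_pkg/__init__.py ran."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("INIT_RAN = True\n")
        (pkg / "plugin.py").write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # Only the submodule is requested here...
            importlib.import_module("demo_pkg.plugin")
            # ...but the parent package is imported as a side effect.
            return "demo_pkg" in sys.modules and sys.modules["demo_pkg"].INIT_RAN
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg.plugin", None)
            sys.modules.pop("demo_pkg", None)
```

A plugin module placed in a separate top-level package avoids this, since its parent package would be a small one with a cheap __init__.py.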

@martindurant
Collaborator

when importing the submodule dask_awkward.pickle as configured

I was wondering more what the call chain was that led to importing - maybe prospective importing here should be user-optional, or depend on whether dask_awkward was already in sys.modules. Of course, by the time any of this is seen from any execution in this repo, dask-awkward is always already in memory.
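One way to see the call chain is to record the stack at the moment a watched module is first looked up. The following is a rough sketch using a meta path finder that never resolves anything itself; `ImportTracer` and `trace_imports` are illustrative names, and the finder only fires for modules that are not already in sys.modules.

```python
import importlib
import sys
import traceback


class ImportTracer:
    """Meta path finder that records the stack on first import of watched modules."""

    def __init__(self, watched):
        self.watched = set(watched)
        self.stacks = {}

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.watched and fullname not in self.stacks:
            self.stacks[fullname] = traceback.format_stack()
        return None  # always defer to the regular finders


def trace_imports(watched, action):
    """Run `action()` and return {module name: formatted stack} for watched imports."""
    tracer = ImportTracer(watched)
    sys.meta_path.insert(0, tracer)
    try:
        action()
    finally:
        sys.meta_path.remove(tracer)
    return tracer.stacks
```

In a fresh interpreter with awkward installed, something like `trace_imports({"dask_awkward", "dask"}, lambda: pickle.dumps(ak.Array([1, 2, 3])))` should return the frames leading to each import, which can then be printed with `"".join(...)`.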

@chuyuanliu
Author

Importing on demand is possible but may require some changes upstream, e.g. checking whether entry_point.name is in sys.modules before this line. For now, it will import whatever is registered to awkward.pickle.reduce.
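A sketch of what such a check could look like is below. It is not awkward's actual loading code: the function name is hypothetical, and it tests the top-level package of the entry point's target module rather than entry_point.name, which is one possible variation on the idea.

```python
import sys
from importlib.metadata import EntryPoint


def load_if_already_imported(entry_points, modules=None):
    """Load only entry points whose top-level package is already imported.

    Hypothetical helper: `entry_points` is any iterable of EntryPoint objects,
    `modules` defaults to sys.modules. Returns {entry point name: loaded object}.
    """
    modules = sys.modules if modules is None else modules
    loaded = {}
    for ep in entry_points:
        top_level = ep.module.split(".")[0]
        if top_level in modules:
            loaded[ep.name] = ep.load()
    return loaded


if __name__ == "__main__":
    import json  # stands in for a plugin package the user has already imported

    demo = [
        EntryPoint(name="present", value="json:dumps", group="hypothetical.group"),
        EntryPoint(name="absent", value="not_a_real_pkg_xyz.mod:func", group="hypothetical.group"),
    ]
    print(sorted(load_if_already_imported(demo)))
```

With this policy, a plugin registered by an installed but unused package would be skipped, at the cost of the plugin not applying until the user imports that package.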
