Deployment of image processing, ideally somewhere maintained by others! #47
Comments
https://github.com/NERC-CEH/plankton_ml/blob/main/PIPELINES.md - this has the walkthrough of setting up the Luigi-based workflow (from NAS to object store).
https://luigi.readthedocs.io/en/stable/central_scheduler.html - docs for the central scheduler. It's more of a task manager and UI, and reminds me of Celery Flower. The expectation with Luigi is you use ...
While testing this my connection to the VM died, ...
I set up a (should be?) a small change and I might try to contribute it right now, but worried for our Luigi usage that it's been a known issue for over a year and version pinning to pre-2.0 is still the suggestion.
edit ... now at Luigi's equivalent of this issue with
edit ... seeing there's already an unmerged PR with the same set of changes I was considering, I'll try to leave a helpful comment there and then just pin sqlalchemy: spotify/luigi#3267
update - there's now a work-in-progress change to drop 3.6 support in Luigi so the above change can be eligible for merging, which is nice to see!
@rodscott @dolegi tagging you on this for the description above - our range of options for deploying a simple pipeline that reads data from the NAS, applies some processing steps and uploads the results to object storage via an API. If other projects are testing Argo Workflows for this then I'd be well up for trying - the envisaged issues are
I'm rethinking this after having seen @Kzra's recent work on https://github.com/NERC-CEH/cyto-ML (the labelling application, originally RShiny, now successfully ported to Label Studio). It uses just the image processing parts of this project (decollage plus, I hope, EXIF tagging) and wraps the rest up in shell scripts, use ...
Given that we ...
Then contributing a DVC pipeline definition (in essence a YAML file that says "run these scripts sequentially, with the option to pass data between them, and skip re-running a step when its input hasn't changed") and an Ansible playbook for setting it to run on a schedule, to that project, is probably the most useful small next step. Luigi has been good to explore, and it was great for rapid prototyping. The object store API is useful as a standalone, and for container-based workflows it definitely has its place, but here it adds complexity...
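For anyone skimming, here is a rough sketch of the "re-run only if the input changed" idea that a pipeline definition like that gives you. This is not DVC's API, just a plain-Python illustration of the behaviour; `file_digest`, `run_stage` and the JSON state file are hypothetical names made up for the example, and a real tool tracks more than this (outputs and the command itself, for instance).

```python
import hashlib
import json
import subprocess
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_stage(name, cmd, deps, state_file):
    """Run `cmd` unless every dependency has the same digest as last time.

    `name` is a hypothetical stage label, `cmd` an argument list for
    subprocess, `deps` a list of input file paths, and `state_file` a JSON
    file where digests from the previous successful run are kept.
    Returns True if the command ran, False if it was skipped.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {str(dep): file_digest(dep) for dep in deps}
    if state.get(name) == current:
        return False  # inputs unchanged since the last successful run
    subprocess.run(cmd, check=True)  # raises if the script fails
    state[name] = current  # only recorded after the command succeeds
    state_file.write_text(json.dumps(state, indent=2))
    return True
```

Calling `run_stage` once per script, in order, gives the sequential behaviour; a stage whose inputs haven't changed since its last successful run returns `False` without invoking its command.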
Range of options on this
[ ] luigid running on a development VM in the on-prem cloud with direct read access to NAS
[ ] chromadb also running locally on the same machine
[ ] Object store API in Posit or Datalabs (how do apps then authenticate?)
[ ] luigid in a container on e.g. kubernetes in the on-prem cloud, but with a means of mounting data from the NAS
[ ] object store API in a container too, it has a Dockerfile
[ ] Data from the NAS going unprocessed to an object store, and the pipelines reading from there, obviating the need to connect applications to local storage
[ ] tasks running in e.g. Airflow or Argo Workflows rather than within Luigi