Versioning dependencies for users loading as Python package #2497
… silicon Signed-off-by: Nelson Auner <[email protected]>
d340165 to 5bc7add
For more information, see https://pre-commit.ci
I've been using $ mamba env create --name pudl-dev --file devtools/environment.yml How does your
Holy bujeezus, 13,345 lines. That is a lot of dependencies. I guess it's for 4 separate platforms but wow.
Yeah but think of how epic it will make everyone's contributions statistics look! 😕
- python=3.10
- pip
- pip:
  - h3==3.7.6 # h3-py 3.7.4 on conda-forge fails, see https://github.com/scikit-build/scikit-build/pull/901
Adding this version pin here won't help folks who are just doing mamba install catalystcoop.pudl on Apple silicon though, which I think is the urgent near-term problem. Could we branch off of the commit that we used for that old release, add this version pin to our dependencies in setup.py and do a new release to PyPI & conda-forge with just that small change?
If we'd like to pin all of our dependencies and rely on conda-lock to do that work for us, how do we need to change the way we think about creating our software environment?
The conda-lock.yml file gets checked into source control and is supposed to represent a set of platform-specific packages with exact versions that, if installed, is known to work (since we'll have run all our tests within that environment and gotten the ✅), right?
How would this affect our development environment? Right now we typically do development within the environment defined by devtools/environment.yml. Would we switch to doing development in the environment defined by conda-lock.yml plus some additional packages? In development we can only install catalystcoop.pudl using pip install -e right now.
How can we:
- avoid the need to specify our dependencies in more than one place.
- ensure that we have a fully specified environment that doesn't drift over time.
- periodically update all our dependencies intentionally and check that everything is still working (and if it is, save the new set of dependencies as the default going forward).
- use the same environment for published packages, deployment (e.g. in CI, local tests with Tox, and our nightly builds) and also in the environment that we're developing in day to day?
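One way to make the "doesn't drift" goal checkable is to compare what is actually installed against a set of exact pins. The following is only a minimal sketch: the helper names `installed_versions` and `find_drift` are hypothetical, the pins are passed in as a plain dict rather than read from any real lockfile, and `importlib.metadata` only sees Python distributions, not other conda packages.

```python
from importlib import metadata


def installed_versions(names):
    """Return {name: version or None} for the given distribution names."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


def find_drift(pins, installed):
    """Return {name: (pinned, installed)} for every pin that does not match."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }


# Hypothetical usage: the pin below is the one discussed in this PR.
pins = {"h3": "3.7.6"}
for name, (want, have) in sorted(find_drift(pins, installed_versions(pins)).items()):
    print(f"{name}: pinned {want}, installed {have}")
```

A check like this could run at the start of a test session so that a drifted environment fails early instead of producing confusing test results.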
Thanks for the thoughtful review!
- Short-term: Yes, I'll take a stab at branching off of the current commit to solve the Apple silicon problem while we discuss this. (I totally missed the devtools/environment.yml folder, sorry!)
- Yes, you are totally correct that PUDL would move to using conda-lock as its source of truth for dependencies. So the test runners should boot up their environment using conda-lock, the prod system should run using conda-lock, etc.
- Development environment: I'm less puritanical here and I think the process would be installing the environment with conda-lock and then the package with pip install -e. The important part is that after you're done developing, you know that any tests and production will be done on the conda-lock.yml deps. So if your development work requires updating a package, or adding a new one, that has to be frozen into the lockfile. This will be enforced automatically when the test suite installs from the lockfile and then tries to run your code.
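The development process described above (create the environment from the lockfile, then install the package in editable mode) might look roughly like the sketch below. It only builds the command lines and does not run anything. The function name `lockfile_dev_commands` and the environment name `pudl-dev` in the usage are illustrative, and passing `--no-deps` to pip is an assumption made here so that pip cannot change the locked dependencies; it is not something this PR specifies.

```python
import shlex


def lockfile_dev_commands(env_name, lockfile="conda-lock.yml", project_dir="."):
    """Return the workflow as a list of argument lists (nothing is executed)."""
    return [
        # Create the environment from the checked-in lockfile.
        ["conda-lock", "install", "--name", env_name, lockfile],
        # Install the package itself in editable mode inside that environment.
        # --no-deps (an assumption of this sketch) keeps pip from touching
        # the dependencies that the lockfile already installed.
        ["conda", "run", "--name", env_name,
         "python", "-m", "pip", "install", "--no-deps", "-e", project_dir],
    ]


# Print the commands so they can be reviewed or pasted into a shell.
for cmd in lockfile_dev_commands("pudl-dev"):
    print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch easy to hand to `subprocess.run` later without quoting problems.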
I think one question you alluded to is "Do we need both an environment.yml and a conda-lock.yml?!"
I come from Pipfile world where every project has a Pipfile and a Pipfile.lock, so I don't have a strong reaction to two files: one that's human readable, and another (the lock file) that's really pinning down specific version dependencies. People get used to it pretty quickly.
Now, using your bullet points as a rubric:
Goal | Does conda-lock help? |
---|---|
Dependencies in only one place | Kind of. For Python packages conda-lock is superior since it pins exact version numbers |
Environment doesn't drift | Definitely |
Periodically update dependencies and test that it works | Yes, definitely. This can be done by volunteers or an automated job runner that updates dependencies, checks whether the tests pass, and then opens a PR. Updating deps is a code change and deserves the same rigor of testing |
Use same env for published packages, deployment, and dev environment | I think so. Everyone should use the environment created by conda-lock and then pip install -e for local development of the pudl code. |
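For the automated update job mentioned in the table, the PR it opens is easier to review if it states exactly which pins moved. Here is a minimal sketch of that summary step, assuming the old and new pins have already been loaded into plain `{name: version}` dicts; the names `diff_pins` and `summarize` are hypothetical, and `old-pkg` / `new-pkg` in the usage are made-up packages.

```python
def diff_pins(old, new):
    """Split two {name: version} mappings into added, removed, and changed."""
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {
        n: (old[n], new[n])
        for n in old.keys() & new.keys()
        if old[n] != new[n]
    }
    return added, removed, changed


def summarize(old, new):
    """Return one line per difference, suitable for a PR description."""
    added, removed, changed = diff_pins(old, new)
    lines = [f"- {n}: {a} -> {b}" for n, (a, b) in sorted(changed.items())]
    lines += [f"- {n}: added at {v}" for n, v in sorted(added.items())]
    lines += [f"- {n}: removed (was {v})" for n, v in sorted(removed.items())]
    return lines


# Hypothetical before/after pins for illustration only.
before = {"h3": "3.7.4", "old-pkg": "1.0"}
after = {"h3": "3.7.6", "new-pkg": "2.0"}
print("\n".join(summarize(before, after)))
```

An empty result means the update changed nothing, in which case the job can skip opening a PR.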
I think we've mostly avoided directing people to install the software directly because it is so tightly linked to the data. If you have just the software it's not that useful. You also need to have the appropriate data. With the software you can generate the data, but that's more involved than we expect anyone other than us or open source contributors to get. So we've tried to tell people how to access the data directly, or the data+software together.
Codecov Report
Patch coverage has no change and project coverage change:
Additional details and impacted files
@@ Coverage Diff @@
## dev #2497 +/- ##
=======================================
- Coverage 87.2% 87.2% -0.1%
=======================================
Files 81 81
Lines 9511 9511
=======================================
- Hits 8302 8299 -3
- Misses 1209 1212 +3
see 2 files with indirect coverage changes
PR Overview
This small PR attempts to add version control to dependencies for users installing pudl via conda or mamba.
As a side benefit, it also solves the immediate issue #2426 for Apple Silicon computers by installing h3 from pip instead of from conda-forge. Conda-forge has an earlier version of h3 that does not support Apple Silicon.
I would love feedback on these aspects:
- environment.yml? Or just conda-lock? Or both?
- README.md be changed? There wasn't a whole lot there on pure Python installs, anyways. In what situations are non-developers installing pudl locally via conda/pip?
Workflow:
- Using pure mamba and environment.yml
- Using conda-lock to create lockfiles
- Using conda-lock.yml to install from scratch 😄