-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: do not include uncommitted changes in the sdist #587
Conversation
f453adb
to
1dd2a7b
Compare
Thanks. Quick comment: could you update the sdist docs in |
|
You're right. My bad, that's what I get for trying to squeeze in a quick comment before signing off. I saw the removal of the |
Should we merge this and leave the documentation updates to a separate PR? |
I'm okay with doing the docs in a follow-up. This is ready to go, right? If so, I need to do some testing with numpy, scipy, & co. |
Yes, it is ready to go. The documentation updates are coming soon. |
Some typos: |
@dnicolodi I'd like to merge this for the next release after doing some final testing (likely tomorrow). Would you mind resolving the conflict and fixing the commit message typos pointed out by @QuLogic? |
@rgommers Sure, done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping the "uncommitted changes" feature looks fine to me. It is indeed no longer needed in numpy/scipy, and the planned pip
change to install via sdist (which was a worry 3 years ago) is permanently shelved/rejected now AFAIK.
The change to the test looks good. There are a few other changes in behavior (file permissions and ownership) in the current version of this PR that you didn't touch on in the PR description and may be unintentional.
file_stat = os.stat(path) | ||
info.mtime = member.mtime | ||
info.size = file_stat.st_size | ||
info.mode = int(oct(file_stat.st_mode)[-3:], 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are meaningful and not strictly related to uncommitted changes to files.
Example with SciPy:
$ ls -l tools/lint.* # local git repo
-rwxr-xr-x 1 rgommers rgommers 4318 19 apr 12:16 ../tools/lint.py
-rw-r--r-- 1 rgommers rgommers 1180 19 apr 12:16 ../tools/lint.toml
Compared to what I see in the sdist (screenshots because extracting files changes ownership):
Note that the file permissions changed compared to what's in the git repo, and ownership now includes my own username. The former I'm not sure about either way, that is behavior inherited from meson dist
it looks like. The latter seems undesirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change is unintended, in the sense that I trustedmeson dist
(and thus in turn git archive
) to be doing the right thing. I'm also not sure whether the previous behavior was deliberate: it seems that files generated via dist scripts do not go through the same metadata mangling, and the metadata mangling seems more an accidental omission than a normalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For file ownership, I checked git archive --format=tar.gz HEAD > tmp.tar.gz
and it sets the ownership to root
. So I guess the repacking changes it to local-username. Either way, the current behavior seems preferred, since it gives the same result when run on multiple machines.
Re file permissions: there are config settings for it, e.g. tar.umask
in https://git-scm.com/docs/git-archive#_configuration. What git-archive
does by default is probably good for reproducibility, and seems to be by design.
Having a peek in meson/mdist.py
, it doesn't pass any options like --owner=0
/--group=0
, --no-same-owner
, --no-same-permissions
to tar
. Not sure if that just never came up, or was rejected before. I can't find anything related in the Meson issue tracker so quickly.
I'm also not sure whether the previous behavior was deliberate: it seems that files generated via dist scripts do not go through the same metadata mangling
I wouldn't read too much into that, since this code was written pretty early in the project's history, and I don't think I knew about add_dist_script
at the time; use of add_dist_script
probably hadn't come up at all yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only spent a few minutes investigating the thing, and it seems that the archive metadata changes happen in Meson. I tend to think that it is a Meson bug. Unless there are subtleties I am missing, meson-python just copies over the archive members as is. The Meson behavior needs to be fixed, but in the meantime we can fix the metadata in meson-python.
What
git-archive
does by default is probably good for reproducibility, and seems to be by design.
What does git archive
do, other than setting the owner to root:root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One funny thing, is that git archive
, Meson, and meson-python seem to use three different tar formats.
It would be nice if there were a way to avoid packing the files up three times, but I'm not sure there is one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if there were a way to avoid packing the files up three times, but I'm not sure there is one.
git archive
and meson dist
are trying to do fundamentally different things as a result of dist scripts existing. It will never be possible to reuse git's output directly (at least, impossible except as a micro-optimization for a subset of cases that always excludes the creation of python dist tarballs with sdist metadata.
It would be possible to uplift the meson-python specific changes into meson dist
, of course. At the most basic level, it's "just" a kind of domain specific dist script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that reproducibility is not necessarily a requirement of
meson dist
It would be nice for meson dist
to be reproducible if the dist scripts are reproducible, following a bit more closely what git archive
does. Using the correct file metadata in the archive does not seem difficult to achieve, and it is at least a step in the right direction.
Meson uses shutil.make_archive, which isn't well known for it's high degree of configurability
shutil.make_archive()
has owner
and group
argument, that seem to fit at least part of the requirement.
For avoiding packing the files three times, I was more thinking about not having the lower layer pack them at all and have the upper layer do the packing in the way it likes, more than reusing the tarball generated by the lower layer. meson-python just needs to add a file to the archive, and this is possible without unpacking and repacking. The reason why meson-python nedds to unpack and repack the tarball is to fix the rood directory in the arvhive: the project name and version seen by Meson and meson-python are not necessarily the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that seem to fit at least part of the requirement.
"Part", eh. :D
Granted this is something that could be properly changed by giving up shutil.make_archive as a bad idea and rolling tarfile/zipfile manually.
The reason why meson-python nedds to unpack and repack the tarball is to fix the rood directory in the arvhive: the project name and version seen by Meson and meson-python are not necessarily the same.
... although they probably should be, and for dynamic version it will be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although they probably should be, and for dynamic version it will be required.
I'm not sure they should. For project that only want to produce a Python packages, specifying the version only in pyproject.toml
seems a perfectly valid thing to do, and for project that are more than a Python package, it seems reasonable to have the Python part have a different name than the whole. Introducing a restriction for them to be the same does not seem worth just to make it a bit easier to generate the sdist and it would be easier only if we fix meson dist
to emit the right archive member metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying it should be restricted, just that in a very large number of common cases they will be the same and won't need tweaking.
It is, at least, a useful micro optimization to check for (assuming no other issues).
As for versions, you can provide it in either or. But if you provide it in meson.build you can handle git versioning as well as using the version as a replacement string to insert into e.g. _version.py, so it is my (entirely personal) opinion that people should want to declare it as dynamic, because it is simply superior.
6f48e18
to
638caa3
Compare
@rgommers I fixed the ownership metadata in the sdist archive to be root:root (not only to use the 0 uid and gid as it was implicitly done before). This follows what |
f0cdb9b
to
4503aec
Compare
That sounds great! Thanks for improving the docs too. I'll try and revisit this in the morning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a new issue with this latest change. Trying this on NumPy, I get a split tree due to different package names and versions:
$ tree -L 2 .
.
├── numpy-2.1.0.dev0
│ ├── azure-pipelines.yml
...
│ ├── tools
│ └── vendored-meson
└── NumPy-2.1.0.dev0+git20240424.f8392ce
├── doc
├── numpy
└── vendored-meson
The project name is a difference in capitalization (easy to change, but still this shouldn't happen). The git version being appended in the project()
call in meson.build
is something I'm not a big fan of, but it's what people do. It should not affect the sdist contents, only the version in pyproject.toml
should be used.
I'm very surprised that this happens, especially by the path that two different paths are used. I don't see how that can happen. Let me see if I can reproduce it. |
Actually, we have tests where the project name passed to Meson's |
I can reproduce it building an sdist for numpy. But I really cannot get how it can happen. |
All the paths that are not correctly rewritten are longer than 100 characters. It looks like a Python bug to me. Investigating further. |
Yup. I'm pretty sure it is a Python |
Issue fixed and test added. This was a funny one to track down. |
The codecov CI job is stuck and I didn't find a way to kill it. |
@dnicolodi it looks like you didn't actually push your fix? |
Indeed. Pushed now. I think I was waiting to file a bug for CPython to be able to reference it in the code comment and commit message, but I haven't submitted it yet. I don't think it is important to reference it here, and I can add a reference later. Pushed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retested, everything looks good to go now, modulo a few typos. Thanks @dnicolodi!
Including uncommitted changes in the sdist was implemented in mesonbuild#58 after the discussion in mesonbuild#53. However, the current behavior for which uncommitted changes to files under version control are included but not other files, is a hard to justify surprising half measure. After careful analysis, it has been determined that none of the use cases for this feature is still valid. Removing it makes the implementation easier, and the behavior less surprising and easier to document and explain. While at it, modernize the associated test.
It is not relevant for the users and takes precious space in the documentation index, which is getting quite long.
Including uncommitted changes in the sdist was implemented in #58 after the discussion in #53. However, the current behavior for which uncommitted changes to files under version control are included but not other files, is an hard to justify surprising half measure.
After careful analysis, it has been determined that none of the use cases for this feature is still valid. Removing it makes the implementation easier, and the behavior less surprising and easier to document an explain.