ENH: do not include uncommitted changes in the sdist #587

Merged
merged 4 commits into from
May 2, 2024
26 changes: 8 additions & 18 deletions docs/contributing/release-process.rst → RELEASE.rst
@@ -2,20 +2,16 @@
..
.. SPDX-License-Identifier: MIT


.. _contributing-release-process:

***************
Release Process
***************
===============

All releases are PGP signed with one of the keys listed in the
`installation page`_. Before releasing please make sure your PGP key is listed
there, and preferably signed by one of the other key holders.
All releases are PGP signed with one of the keys listed in ``docs/about.rst``.
Before releasing please make sure your PGP key is listed there, and preferably
signed by one of the other key holders.

If your key is not signed by one of the other key holders, please make sure the
PR that added your key to the :doc:`../about` page was approved by at least one
other maintainer.
If your key is not signed by one of the other key holders, please make sure
that the PR that added your key to ``docs/about.rst`` was approved by at least
one other maintainer.

After that is done, you may release the project by following these steps:

@@ -44,7 +40,7 @@ After that is done, you may release the project by following these steps:
$ git push
$ git push --tags

#. Release to `PyPI <https://pypi.org/project/meson-python/>`_
#. Release to PyPI:

#. Build the Python artifacts:

@@ -60,9 +56,3 @@ After that is done, you may release the project by following these steps:

There is no need to GPG-sign the artifacts: PyPI no longer
supports uploading GPG signatures.

If you have any questions, please look at previous releases and/or ping the
other maintainers.


.. _installation page: installation
11 changes: 0 additions & 11 deletions docs/contributing/index.rst

This file was deleted.

15 changes: 0 additions & 15 deletions docs/explanations/design-old.rst
@@ -31,21 +31,6 @@ Python tool (pip_, `pypa/build`_, etc.) to build and install the project.
``meson-python`` will build a Python sdist (source distribution) or
wheel (binary distribution) from Meson_ project.

Source distribution (sdist)
---------------------------

The source distribution is based on ``meson dist``, so make sure all your files
are included there. In git projects, Meson_ will not include files that are not
checked into git, keep that in mind when developing. By default, all files
under version control will be included in the sdist. In order to exclude files,
use ``export-ignore`` or ``export-subst`` attributes in ``.gitattributes`` (see
the ``git-archive`` documentation for details; ``meson dist`` uses
``git-archive`` under the hood).

Local (uncommitted) changes to files that are under version control will be
included. This is often needed when applying patches, e.g. for build issues
found during packaging, to work around test failures, to amend the license for
vendored components in wheels, etc.

Binary distribution (wheels)
----------------------------
54 changes: 54 additions & 0 deletions docs/how-to-guides/sdist.rst
@@ -0,0 +1,54 @@
.. SPDX-FileCopyrightText: 2024 The meson-python developers
..
.. SPDX-License-Identifier: MIT

.. _sdist:

******************************
Creating a source distribution
******************************

A source distribution for the project can be created by executing

.. code-block:: console

$ python -m build --sdist .

in the project root folder. This will create a ``.tar.gz`` archive in the
``dist`` folder in the project root folder. This archive contains the full
contents of the latest commit in revision control with all revision control
metadata removed. Uncommitted modifications and files unknown to the revision
control system are not included.

The source distribution archive is created by adding the required metadata
files to the archive obtained by executing the ``meson dist --no-tests
--allow-dirty`` command. To generate a source distribution, ``meson-python``
must successfully configure the Meson project by running the ``meson setup``
command. Additional arguments can be passed to ``meson dist`` to alter its
behavior. Refer to the relevant `Meson documentation`__ and to the
:ref:`how-to-guides-meson-args` guide for details.

The ``meson dist`` command uses the archival tool of the underlying revision
control system for creating the archive. This implies that a source
distribution can only be created for a project versioned in a revision control
system. Meson supports the Git and Mercurial revision control systems.

Files can be excluded from the source distribution via the relevant mechanism
provided by the revision control system. When using Git as a revision control
system, it is possible to exclude files from the source distribution by setting
the ``export-ignore`` attribute. For example, adding a ``.gitattributes``
file containing

.. code-block:: none

dev/** export-ignore

would result in the ``dev`` folder being excluded from the source
distribution. Refer to the ``git archive`` documentation__ for
details. Another mechanism to alter the content of the source distribution is
offered by dist scripts. Refer to the relevant `Meson documentation`__ for
details.

__ https://mesonbuild.com/Creating-releases.html
__ https://git-scm.com/docs/git-archive#ATTRIBUTES
__ https://mesonbuild.com/Reference-manual_builtin_meson.html#mesonadd_dist_script
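
One way to check that an excluded folder did not end up in the archive is to list the sdist members. This is a minimal sketch using the standard `tarfile` module; the `members_under` helper and the in-memory example archive are hypothetical and not part of meson-python:

```python
import io
import tarfile


def members_under(fileobj, folder):
    """Return member paths, relative to the archive root folder, inside folder."""
    with tarfile.open(fileobj=fileobj, mode='r:*') as archive:
        names = archive.getnames()
    # Strip the leading '<name>-<version>/' component from each path.
    stems = [name.split('/', 1)[1] for name in names if '/' in name]
    return [stem for stem in stems if stem == folder or stem.startswith(folder + '/')]


# Build a tiny stand-in for an sdist in memory, for illustration only.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w:gz') as archive:
    for name in ('pkg-1.0.0/pyproject.toml', 'pkg-1.0.0/PKG-INFO'):
        archive.addfile(tarfile.TarInfo(name), io.BytesIO(b''))
buffer.seek(0)

# An empty list means nothing from 'dev' was packaged.
leaked = members_under(buffer, 'dev')
```

For a real sdist, the same helper could be given a file opened in binary mode from the ``dist`` folder instead of the in-memory buffer.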
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -77,6 +77,7 @@ the use of ``meson-python`` and Meson for Python packaging.
:hidden:

tutorials/introduction
how-to-guides/sdist
how-to-guides/editable-installs
how-to-guides/config-settings
how-to-guides/meson-args
@@ -100,7 +101,6 @@ the use of ``meson-python`` and Meson for Python packaging.

changelog
about
contributing/index
Discussions <https://github.com/mesonbuild/meson-python/discussions>
Source Code <https://github.com/mesonbuild/meson-python>
Issue Tracker <https://github.com/mesonbuild/meson-python/issues>
106 changes: 59 additions & 47 deletions mesonpy/__init__.py
@@ -31,7 +31,6 @@
import tarfile
import tempfile
import textwrap
import time
import typing
import warnings

@@ -865,64 +864,77 @@ def _meson_version(self) -> str:

def sdist(self, directory: Path) -> pathlib.Path:
"""Generates a sdist (source distribution) in the specified directory."""
# generate meson dist file
# Generate meson dist file.
self._run(self._meson + ['dist', '--allow-dirty', '--no-tests', '--formats', 'gztar', *self._meson_args['dist']])

# move meson dist file to output path
dist_name = f'{self._metadata.distribution_name}-{self._metadata.version}'
meson_dist_name = f'{self._meson_name}-{self._meson_version}'
meson_dist_path = pathlib.Path(self._build_dir, 'meson-dist', f'{meson_dist_name}.tar.gz')
sdist = pathlib.Path(directory, f'{dist_name}.tar.gz')
sdist_path = pathlib.Path(directory, f'{dist_name}.tar.gz')
pyproject_toml_mtime = 0

with tarfile.open(meson_dist_path, 'r:gz') as meson_dist, mesonpy._util.create_targz(sdist) as tar:
with tarfile.open(meson_dist_path, 'r:gz') as meson_dist, mesonpy._util.create_targz(sdist_path) as sdist:
for member in meson_dist.getmembers():
# calculate the file path in the source directory
assert member.name, member.name
member_parts = member.name.split('/')
if len(member_parts) <= 1:
continue
path = self._source_dir.joinpath(*member_parts[1:])

if not path.exists() and member.isfile():
# File doesn't exist in the source directory but exists in
# the Meson dist, so it is a generated file, which we need to
# include.
# See https://mesonbuild.com/Reference-manual_builtin_meson.html#mesonadd_dist_script

# MESON_DIST_ROOT could have a different base name
# than the actual sdist basename, so we need to rename here
if member.isfile():
file = meson_dist.extractfile(member.name)
member.name = str(pathlib.Path(dist_name, *member_parts[1:]).as_posix())
tar.addfile(member, file)
continue

if not path.is_file():
continue
# Reset pax extended header. The tar archive member may be
# using pax headers to store some file metadata. The pax
# headers are not reset when the metadata is modified and
# they take precedence when the member is deserialized.
# This is relevant because when rewriting the member name,
# the length of the path may shrink from being more than
# 100 characters (requiring the path to be stored in the
# pax headers) to being less than 100 characters. When this
# happens, the tar archive member is serialized with the
# shorter name in the regular header and the longer one in
# the extended pax header. The archives handled here are
# not expected to use extended pax headers other than for
# the ones required to encode file metadata. The easiest
# solution is to reset the pax extended headers.
member.pax_headers = {}

# Rewrite the path to match the sdist distribution name.
stem = member.name.split('/', 1)[1]
member.name = '/'.join((dist_name, stem))

if stem == 'pyproject.toml':
pyproject_toml_mtime = member.mtime

# Reset owner and group to root:root. This mimics what
# 'git archive' does and makes the sdist reproducible upon
# being built by different users.
member.uname = member.gname = 'root'
member.uid = member.gid = 0

sdist.addfile(member, file)

# Add 'PKG-INFO'.
member = tarfile.TarInfo(f'{dist_name}/PKG-INFO')
member.uid = member.gid = 0
member.uname = member.gname = 'root'

# Set the 'PKG-INFO' modification time to the modification time of
# 'pyproject.toml' in the archive generated by 'meson dist'. In
# turn this is the last commit time, unless touched by a dist
# script. This makes the sdist reproducible upon being built at
# different times, when dist scripts are not used, which should be
# the majority of cases.
#
# Note that support for dynamic version in project metadata allows
# the version to depend on the build time. Therefore, setting the
# 'PKG-INFO' modification time to the 'pyproject.toml'
# modification time can be seen as not strictly correct. However,
# the sdist standard does not dictate which modification time to
# use for 'PKG-INFO'. This choice allows to make the sdist
# byte-for-byte reproducible in the most common case.
member.mtime = pyproject_toml_mtime

info = tarfile.TarInfo(member.name)
file_stat = os.stat(path)
info.mtime = member.mtime
info.size = file_stat.st_size
info.mode = int(oct(file_stat.st_mode)[-3:], 8)

Contributor:

These changes are meaningful and not strictly related to uncommitted changes to files.

Example with SciPy:

$ ls -l tools/lint.*  # local git repo
-rwxr-xr-x 1 rgommers rgommers 4318 19 apr 12:16 ../tools/lint.py
-rw-r--r-- 1 rgommers rgommers 1180 19 apr 12:16 ../tools/lint.toml

Compared to what I see in the sdist (screenshots because extracting files changes ownership):

with 0.16.0:
image

with this PR:
image

Note that the file permissions changed compared to what's in the git repo, and ownership now includes my own username. I'm not sure about the former either way; it looks like that behavior is inherited from meson dist. The latter seems undesirable.

Member Author:

The change is unintended, in the sense that I trusted meson dist (and thus in turn git archive) to be doing the right thing. I'm also not sure whether the previous behavior was deliberate: it seems that files generated via dist scripts do not go through the same metadata mangling, and the metadata mangling seems more an accidental omission than a normalization.

Contributor:

For file ownership, I checked git archive --format=tar.gz HEAD > tmp.tar.gz and it sets the ownership to root. So I guess the repacking changes it to local-username. Either way, the current behavior seems preferred, since it gives the same result when run on multiple machines.
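
The ownership recorded in an archive can be read without extracting it, which sidesteps the problem that extraction changes ownership. This is a minimal sketch using the standard `tarfile` module; the helper names and the in-memory example archive are hypothetical, not part of meson-python or Meson:

```python
import io
import tarfile


def list_owners(fileobj):
    """Return (name, uname, gname, uid, gid, mode) for each archive member."""
    with tarfile.open(fileobj=fileobj, mode='r:*') as archive:
        return [
            (m.name, m.uname, m.gname, m.uid, m.gid, oct(m.mode))
            for m in archive.getmembers()
        ]


def make_example_archive():
    """Build a small in-memory tar.gz with a root-owned member, for illustration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w:gz') as archive:
        data = b'print("lint")\n'
        member = tarfile.TarInfo('pkg-1.0.0/tools/lint.py')
        member.size = len(data)
        member.mode = 0o755
        member.uname = member.gname = 'root'
        member.uid = member.gid = 0
        archive.addfile(member, io.BytesIO(data))
    buffer.seek(0)
    return buffer


owners = list_owners(make_example_archive())
```

Passing a file opened in binary mode on the output of `git archive` or on an sdist instead of the example archive would allow comparing the two directly.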

Re file permissions: there are config settings for it, e.g. tar.umask in https://git-scm.com/docs/git-archive#_configuration. What git-archive does by default is probably good for reproducibility, and seems to be by design.

Having a peek in meson/mdist.py, it doesn't pass any options like --owner=0/--group=0, --no-same-owner, --no-same-permissions to tar. Not sure if that just never came up, or was rejected before. I can't find anything related in the Meson issue tracker so quickly.

> I'm also not sure whether the previous behavior was deliberate: it seems that files generated via dist scripts do not go through the same metadata mangling

I wouldn't read too much into that, since this code was written pretty early in the project's history, and I don't think I knew about add_dist_script at the time; use of add_dist_script probably hadn't come up at all yet.

Member Author:

I only spent a few minutes investigating the thing, and it seems that the archive metadata changes happen in Meson. I tend to think that it is a Meson bug. Unless there are subtleties I am missing, meson-python just copies over the archive members as is. The Meson behavior needs to be fixed, but in the meantime we can fix the metadata in meson-python.

> What git-archive does by default is probably good for reproducibility, and seems to be by design.

What does git archive do, other than setting the owner to root:root?

Member Author:

One funny thing is that git archive, Meson, and meson-python seem to use three different tar formats.

It would be nice if there were a way to avoid packing the files up three times, but I'm not sure there is one.

Member:

> It would be nice if there were a way to avoid packing the files up three times, but I'm not sure there is one.

git archive and meson dist are trying to do fundamentally different things as a result of dist scripts existing. It will never be possible to reuse git's output directly (at least, impossible except as a micro-optimization for a subset of cases that always excludes the creation of python dist tarballs with sdist metadata).

It would be possible to uplift the meson-python specific changes into meson dist, of course. At the most basic level, it's "just" a kind of domain specific dist script.

Member Author:

> Note that reproducibility is not necessarily a requirement of meson dist

It would be nice for meson dist to be reproducible if the dist scripts are reproducible, following a bit more closely what git archive does. Using the correct file metadata in the archive does not seem difficult to achieve, and it is at least a step in the right direction.

> Meson uses shutil.make_archive, which isn't well known for its high degree of configurability

shutil.make_archive() has owner and group arguments that seem to fit at least part of the requirement.
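
For illustration, a minimal sketch of the `owner` and `group` arguments of `shutil.make_archive()`. The project layout is made up, and as far as I can tell the names are only recorded when they resolve to a uid and gid on the local system, so the resulting ownership is platform-dependent:

```python
import os
import shutil
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical staging area containing a single project folder.
    root = os.path.join(tmp, 'src')
    os.makedirs(os.path.join(root, 'pkg-1.0.0'))
    with open(os.path.join(root, 'pkg-1.0.0', 'meson.build'), 'w') as f:
        f.write("project('pkg', version: '1.0.0')\n")

    # Request root:root ownership for the archive members.
    archive_path = shutil.make_archive(
        os.path.join(tmp, 'pkg-1.0.0'), 'gztar',
        root_dir=root, base_dir='pkg-1.0.0',
        owner='root', group='root',
    )

    # Read back what was actually recorded for each member.
    with tarfile.open(archive_path) as archive:
        owners = {m.name: (m.uname, m.gname) for m in archive.getmembers()}
```

This covers ownership only; file modes and modification times would still need to be normalized separately.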

For avoiding packing the files three times, I was thinking more about not having the lower layer pack them at all and having the upper layer do the packing the way it likes, rather than reusing the tarball generated by the lower layer. meson-python just needs to add a file to the archive, and this is possible without unpacking and repacking. The reason why meson-python needs to unpack and repack the tarball is to fix the root directory in the archive: the project name and version seen by Meson and meson-python are not necessarily the same.
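
The repacking step described above can be sketched as follows. This is a simplified illustration under assumptions, not the code in this PR: the function name `repack`, the archive names, and the `PKG-INFO` payload are hypothetical.

```python
import io
import tarfile


def repack(src, dst, new_root, pkg_info):
    """Copy members from src to dst under new_root and append PKG-INFO."""
    with tarfile.open(fileobj=src, mode='r:*') as old, \
            tarfile.open(fileobj=dst, mode='w:gz') as new:
        for member in old.getmembers():
            # Skip the root directory entry itself.
            if '/' not in member.name:
                continue
            fileobj = old.extractfile(member) if member.isfile() else None
            stem = member.name.split('/', 1)[1]
            # Drop pax headers so a stale long 'path' entry cannot
            # override the rewritten, possibly shorter, name.
            member.pax_headers = {}
            member.name = f'{new_root}/{stem}'
            # Normalize ownership to root:root.
            member.uname = member.gname = 'root'
            member.uid = member.gid = 0
            new.addfile(member, fileobj)
        info = tarfile.TarInfo(f'{new_root}/PKG-INFO')
        info.size = len(pkg_info)
        info.uname = info.gname = 'root'
        new.addfile(info, io.BytesIO(pkg_info))


# Source archive whose root folder is long enough to need pax headers.
long_root = 'meson-name-' + 'x' * 100 + '-1.0.0'
source = io.BytesIO()
with tarfile.open(fileobj=source, mode='w:gz') as archive:
    data = b"[build-system]\nbuild-backend = 'mesonpy'\n"
    member = tarfile.TarInfo(f'{long_root}/pyproject.toml')
    member.size = len(data)
    archive.addfile(member, io.BytesIO(data))
source.seek(0)

target = io.BytesIO()
repack(source, target, 'dist-name-1.0.0', b'Metadata-Version: 2.1\n')
target.seek(0)
with tarfile.open(fileobj=target, mode='r:gz') as archive:
    names = archive.getnames()
```

The long root folder name in the example mirrors the case where a path shrinks from more than 100 characters to fewer, which is where stale pax headers would otherwise bring back the old name.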

Member:

> that seem to fit at least part of the requirement.

"Part", eh. :D

Granted this is something that could be properly changed by giving up shutil.make_archive as a bad idea and rolling tarfile/zipfile manually.

> The reason why meson-python needs to unpack and repack the tarball is to fix the root directory in the archive: the project name and version seen by Meson and meson-python are not necessarily the same.

... although they probably should be, and for dynamic version it will be required.

Member Author:

> although they probably should be, and for dynamic version it will be required.

I'm not sure they should. For projects that only want to produce a Python package, specifying the version only in pyproject.toml seems a perfectly valid thing to do, and for projects that are more than a Python package, it seems reasonable for the Python part to have a different name than the whole. Introducing a restriction for them to be the same does not seem worth it just to make it a bit easier to generate the sdist, and it would be easier only if we fix meson dist to emit the right archive member metadata.

Member:

I'm not saying it should be restricted, just that in a very large number of common cases they will be the same and won't need tweaking.

It is, at least, a useful micro optimization to check for (assuming no other issues).

As for versions, you can provide it in either or. But if you provide it in meson.build you can handle git versioning as well as using the version as a replacement string to insert into e.g. _version.py, so it is my (entirely personal) opinion that people should want to declare it as dynamic, because it is simply superior.


# rewrite the path if necessary, to match the sdist distribution name
if dist_name != meson_dist_name:
info.name = pathlib.Path(
dist_name,
path.relative_to(self._source_dir)
).as_posix()

with path.open('rb') as f:
tar.addfile(info, fileobj=f)

# add PKG-INFO to dist file to make it a sdist
pkginfo_info = tarfile.TarInfo(f'{dist_name}/PKG-INFO')
pkginfo_info.mtime = time.time() # type: ignore[assignment]
metadata = bytes(self._metadata.as_rfc822())
pkginfo_info.size = len(metadata)
tar.addfile(pkginfo_info, fileobj=io.BytesIO(metadata))
member.size = len(metadata)
sdist.addfile(member, io.BytesIO(metadata))

return sdist
return sdist_path

def wheel(self, directory: Path) -> pathlib.Path:
"""Generates a wheel in the specified directory."""
6 changes: 5 additions & 1 deletion mesonpy/_util.py
@@ -37,7 +37,11 @@ def create_targz(path: Path) -> Iterator[tarfile.TarFile]:
os.makedirs(os.path.dirname(path), exist_ok=True)
file = typing.cast(IO[bytes], gzip.GzipFile(
path,
mode='wb',
mode='w',
# Set the stream last modification time to 0. This mimics
# what 'git archive' does and makes the archives byte-for-byte
# reproducible.
mtime=0,
))
tar = tarfile.TarFile(
mode='w',
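
Why `mtime=0` matters: the gzip header stores a modification time, which defaults to the current time when compressing, so compressing the same bytes at different times yields different archives. A minimal sketch with the standard `gzip` module; the `compress` helper and the payload are hypothetical:

```python
import gzip
import io


def compress(data, mtime=None):
    """Gzip data into memory, optionally pinning the header timestamp."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode='w', mtime=mtime) as stream:
        stream.write(data)
    return buffer.getvalue()


first = compress(b'sdist payload', mtime=0)
second = compress(b'sdist payload', mtime=0)

# Bytes 4 to 7 of a gzip stream hold the MTIME header field (little-endian).
stored_mtime = int.from_bytes(first[4:8], 'little')
```

With the timestamp pinned, the two compressed streams compare equal regardless of when they were produced.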
5 changes: 5 additions & 0 deletions tests/packages/long-path/meson.build
@@ -0,0 +1,5 @@
# SPDX-FileCopyrightText: 2024 The meson-python developers
#
# SPDX-License-Identifier: MIT

project('very-long-project-name-that-makes-the-paths-within-the-sdist-exceed-100-characters-xxxxxxxxxxxxxxxxx', version: '1.0.0')
11 changes: 11 additions & 0 deletions tests/packages/long-path/pyproject.toml
@@ -0,0 +1,11 @@
# SPDX-FileCopyrightText: 2021 The meson-python developers
#
# SPDX-License-Identifier: MIT

[build-system]
build-backend = 'mesonpy'
requires = ['meson-python']

[project]
name = 'long-path'
dynamic = ['version']