Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

composepost: apply add-determinism pyc-zero-mtime #5019

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lukewarmtemp
Copy link
Contributor

Fixes: ostreedev/ostree#1469

Assuming that add-determinism is installed on the system, apply add-determinism to set the embedded mtime in all .pyc files to zero.

Fixes: ostreedev/ostree#1469

Assuming that add-determinism is installed on the system,
apply add-determinism to set the embedded mtime in all .pyc
files to zero.

Signed-off-by: Luke Yang <[email protected]>
Co-authored-by: Steven Presti <[email protected]>
@cgwalters
Copy link
Member

Thanks for this!

add-determinism is generally a build tool, and I think it'd be nicer to support a model where it's not shipped onto the end system.

However, right now we don't quite expose such a "multi stage" mechanism via treefile-based builds. If we did it'd be useful for a whole lot of other things too (for example, in theory one could drop dracut or even rpm out of the runtime requirements). This is what https://gitlab.com/fedora/bootc/base-images-experimental is aiming to address to a degree.

@lukewarmtemp
Copy link
Contributor Author

lukewarmtemp commented Jul 18, 2024

From what we understand, the main concern is that add-determinism would only need to used when building the base image, meaning that it would be nice if we could remove it during runtime. The hope is that whenever a "multi stage" mechanism is implemented, add-determinism would only be used during build time. However, would this also address the case when packages have been layered on top of the base image? We also need to apply add-determinism whenever ostree container commit is run since layering a new python package would still have mismatched mtimes.

As an alternative to having add-determinism on the system itself, would depending on add-determinism by importing it as a rust crate and using an API be a good option? There's discussion on this PR: keszybz/add-determinism#27 (comment) about potentially implementing an API and hashing out the details of what the caller would look like.

@cgwalters
Copy link
Member

However, would this also address the case when packages have been layered on top of the base image? We also need to apply add-determinism whenever ostree container commit is run since layering a new python package would still have mismatched mtimes.

Yes, this is a notable point.

As an alternative to having add-determinism on the system itself, would depending on add-determinism by importing it as a rust crate and using an API be a good option?

It's an option; the advantage is it keeps add-determinism as an implementation detail. It's still shipped code, but probably a good bit smaller (since we wouldn't have duplication of the common libraries).

@@ -1201,7 +1201,18 @@ fn workaround_selinux_cross_labeling_recurse(
/// This is the nearly the last code executed before we run `ostree commit`.
pub fn compose_postprocess_final(rootfs_dfd: i32, _treefile: &Treefile) -> CxxResult<()> {
let rootfs = unsafe { &crate::ffiutil::ffi_dirfd(rootfs_dfd)? };

if std::process::Command::new("add-determinism").status().expect("Failed to find add-determinism on system.").success() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be its own function like run_add_determinisim(rootfs);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this is not a great way to check for the existence of a binary. While the Rust stdlib doesn't have a single function for this I think it's trivial to do by combining

let r = std::process::Command::new("add-determinism")
.arg("--handler")
.arg("pyc-zero-mtime")
.arg("/usr")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't do what you expect or want. In an rpm-ostree build, the /usr is whatever happens to be the build environment, not the target.

In rpm-ostree we try hard to use fd-relative accesses generally via cap-std.

To do so with an external process, a typical pattern is to use e.g. cwd_dir and then pass . to the child process as the target.

@keszybz
Copy link
Contributor

keszybz commented Jul 21, 2024

However, would this also address the case when packages have been layered on top of the base image? We also need to apply add-determinism whenever ostree container commit is run since layering a new python package would still have mismatched mtimes.

Yes, this is a notable point.

Yeah, so it really sounds like it needs to be on the image.

As an alternative to having add-determinism on the system itself, would depending on add-determinism by importing it as a rust crate and using an API be a good option?

It's an option; the advantage is it keeps add-determinism as an implementation detail. It's still shipped code, but probably a good bit smaller (since we wouldn't have duplication of the common libraries).

Right. Currently, the binary is 2.3MB. I think this is totally fine for build environments, but it's a bit heavy-weight for being installed on final images. Using it as a crate is one option. Another option would be to provide a stripped-down separate binary build…

Mobbing WIP. Contains pseudo code.
@cgwalters
Copy link
Member

Currently, the binary is 2.3MB. I think this is totally fine for build environments, but it's a bit heavy-weight for being installed on final images.

rpm-ostree already drags in a lot of stuff transitively, most notably the libdnf stack, etc. The use cases we have today tend to have at least 400+MB images, commonly GB or more when you add desktops in the mix, libvirt, etc.

Was thinking about this more and I have no problem just making rpm-ostree soft depend on a separate add-determinism binary. Maybe even hard depend by default with a build-time option to disable it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ostree archive's mtime modification results in slower python execution
3 participants