Add --parents option for COPY in Containerfiles #6008

Merged
merged 1 commit into from
Mar 20, 2025

Conversation

Honny1
Member

@Honny1 Honny1 commented Feb 25, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds the --parents option for COPY in Containerfiles. It also includes an implementation of the --parents option for the buildah copy command.

How to verify it

Try using --parents as described in the Docker documentation.
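As a rough illustration of the behaviour being verified, the sketch below maps a source path to its destination when parent directories are kept. It assumes the semantics described in the Docker documentation: parents are preserved, and only the part after a "/./" pivot is kept. The helper name parentsDest is hypothetical and is not part of buildah, and using the last pivot when several are present is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// parentsDest is a hypothetical helper, not buildah code. It computes where a
// single source file would land under dest when parent directories are kept.
// If src contains a "/./" pivot, only the part after the last pivot is kept
// (choosing the last pivot is an assumption made for this sketch).
func parentsDest(src, dest string) string {
	kept := src
	if i := strings.LastIndex(src, "/./"); i >= 0 {
		kept = src[i+len("/./"):]
	}
	// path.Join cleans the result, which also drops a leading "./" in kept.
	return path.Join(dest, kept)
}

func main() {
	fmt.Println(parentsDest("./x/a.txt", "/parents"))     // /parents/x/a.txt
	fmt.Println(parentsDest("./x/./y/a.txt", "/parents")) // /parents/y/a.txt
}
```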

Which issue(s) this PR fixes:

Fixes: https://issues.redhat.com/browse/RUN-2193
Fixes: #5557

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The COPY instruction and the buildah copy command now support the --parents option.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress kind/feature Categorizes issue or PR as related to a new feature. labels Feb 25, 2025
@Honny1 Honny1 marked this pull request as ready for review February 25, 2025 16:01
@rhatdan
Member

rhatdan commented Feb 26, 2025

@nalind PTAL

Member

@nalind nalind left a comment

Not convinced we need two fields in the AddAndCopyOptions, and I'm not a fan of having fields which can be nil which, when left unset, will trigger a crash.

add.go Outdated
@@ -94,6 +94,9 @@ type AddAndCopyOptions struct {
// RetryDelay is how long to wait before retrying attempts to retrieve
// remote contents.
RetryDelay time.Duration
// Parents preserve parent directories of source content
Parents bool
ParentsPatterns map[string]string
Member

When would a caller set ParentsPatterns, and what should they put in it if they do? What happens if they leave the field set to its zero value of nil? They're already passing in a list of sources, why are they required to supply more than that to get the desired effect?

Member Author

I tried to explain what the content should be in the comment. If ParentsPatterns is a nil value, it will be ignored. The sources list is cleaned up with filepath.Join, which removes /./ from the path so it can't be used. Therefore, I decided to use a map to connect the source path and the path (pattern) with /./.

*/
}, "\n"),
contextDir: "copy-parents",
fsSkip: []string{"(dir):parents:mtime", "(dir):parents:(dir):y:mtime"},
Member

Does docker build preserve the timestamp on y, or its permissions if they're something other than 0o755?

Member Author

I have reworked this, and the current version preserves the timestamp of the copied directory.

Member Author

I don't think Docker preserves the y timestamp.

copier/copier.go Outdated
ChmodDirs *os.FileMode // set permissions on directories. no effect on archives being extracted
ChownFiles *idtools.IDPair // set ownership of files. no effect on archives being extracted
ChmodFiles *os.FileMode // set permissions on files. no effect on archives being extracted
Parents bool // maintain the sources parent directory in the destination
Member

Are the parent directories of the items being read supposed to be emitted in the tarstream when this flag is set?

Member Author

I have reworked this, and the current version writes the parent directories to the tarstream.

@Honny1 Honny1 force-pushed the copy-parents branch 3 times, most recently from 8c6ca1f to 638f410 Compare February 27, 2025 19:25
@Honny1 Honny1 requested a review from nalind February 27, 2025 19:33
@Honny1 Honny1 force-pushed the copy-parents branch 2 times, most recently from a37d19e to c3349de Compare March 3, 2025 08:21
Member

@nalind nalind left a comment

I think this really needs some unit tests in the copier package to ensure that it produces exactly the right set of outputs for a given set of inputs, with no parent directories being implicit or being output multiple times.

copier/copier.go Outdated

name := filepath.Base(queue[i])
if len(req.GetOptions.ParentsPrefixToRemove) > 0 {
name = filepath.Clean(strings.TrimPrefix(item, filepath.Join(req.Directory, req.GetOptions.ParentsPrefixToRemove)))
Member

copier/copier.go Outdated
alreadyCopied[parentPath] = struct{}{}
parentName = filepath.Dir(parentName)
parentPath = filepath.Dir(parentPath)
}
Member

Is the order in which these are being emitted such that directories are being output before their parents?

Member Author

Oh, yes, they are. That's probably wrong.
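One way to address the ordering concern is to collect the chain of parents while walking upward, then reverse it before emitting. The sketch below is not the PR's implementation. The helper parentsTopDown is hypothetical, and it also skips directories that were already recorded, so nothing is returned twice.

```go
package main

import (
	"fmt"
	"path"
)

// parentsTopDown is a hypothetical helper. It returns the parent directories
// of name, outermost first, skipping any already present in seen, and records
// the ones it returns so later calls do not emit them again.
func parentsTopDown(name string, seen map[string]struct{}) []string {
	var chain []string
	// Walk upward from the immediate parent until the relative or absolute root.
	for dir := path.Dir(name); dir != "." && dir != "/"; dir = path.Dir(dir) {
		if _, ok := seen[dir]; ok {
			continue
		}
		seen[dir] = struct{}{}
		chain = append(chain, dir)
	}
	// The walk collected innermost first; reverse so parents precede children.
	for i, j := 0, len(chain)-1; i < j; i, j = i+1, j-1 {
		chain[i], chain[j] = chain[j], chain[i]
	}
	return chain
}

func main() {
	seen := map[string]struct{}{}
	fmt.Println(parentsTopDown("a/b/c/file.txt", seen))  // [a a/b a/b/c]
	fmt.Println(parentsTopDown("a/b/d/other.txt", seen)) // [a/b/d]
}
```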

if iopts.parents {
options.ParentsPatterns = map[string]string{}
for _, pattern := range args {
options.ParentsPatterns[pattern] = pattern
Member

This doesn't appear to be doing anything different when contextdir is set, while https://github.com/containers/buildah/pull/6008/files#diff-f2e4566c6b7e38384283187aba6d7fd91e5ba8da2ffd0f849277bb76bff27fb3R561-R562 does. If it's required, I'd expect it to be required in both places, and it's something that the integration tests should be verifying works correctly.

copier/copier.go Outdated
parentName = filepath.Dir(parentName)
parentPath = filepath.Dir(parentPath)
}
return alreadyCopied, nil
Member

This kind of implies that alreadyCopied wasn't already modified in the caller's scope, but this function was passed the map itself rather than a copy, so returning it here isn't really necessary.
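The point about the map can be shown in a few lines. In Go, passing a non-nil map to a function shares its underlying storage, so inserts made by the callee are visible to the caller without returning the map. The function markCopied below is hypothetical and only mirrors the shape of the code under review.

```go
package main

import "fmt"

// markCopied is a hypothetical stand-in for the reviewed helper. It records
// path in the caller's map and deliberately returns nothing: the map argument
// refers to the same storage the caller holds.
func markCopied(alreadyCopied map[string]struct{}, path string) {
	alreadyCopied[path] = struct{}{}
}

func main() {
	alreadyCopied := map[string]struct{}{}
	markCopied(alreadyCopied, "x/y")

	_, ok := alreadyCopied["x/y"]
	fmt.Println(ok, len(alreadyCopied)) // true 1
}
```

This only holds for a map that has already been initialized. Writing to a nil map panics, so the caller still has to create it.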

copier/copier.go Outdated
@@ -1400,6 +1428,31 @@ func copierHandlerGet(bulkWriter io.Writer, req request, pm *fileutils.PatternMa
return &response{Stat: statResponse.Stat, Get: getResponse{}}, cb, nil
}

func copyParentsDirs(name string, path string, alreadyCopied map[string]struct{}, copy func(string, string, os.FileInfo) error) (map[string]struct{}, error) {
Member

Does this avoid outputting a directory that has already been output while walking the tree?

@Honny1 Honny1 force-pushed the copy-parents branch 2 times, most recently from 164e737 to d467554 Compare March 5, 2025 16:16
@Honny1
Member Author

Honny1 commented Mar 5, 2025

@nalind I removed the use of ParentsPrefix/Patterns and now use only the boolean Parents flag. I also added copier tests.

@Honny1 Honny1 force-pushed the copy-parents branch 6 times, most recently from e1d2e20 to da2d4c7 Compare March 6, 2025 23:02
@Honny1 Honny1 requested a review from nalind March 10, 2025 10:25
Member

@nalind nalind left a comment

Mostly nits, except that Get() really shouldn't be outputting a given item more than once.

@Honny1 Honny1 requested a review from nalind March 14, 2025 14:23
@Honny1
Member Author

Honny1 commented Mar 14, 2025

/packit rebuild-failed

@nalind
Member

nalind commented Mar 17, 2025

One thing I often forget when manipulating the Name field of headers in an archive is that we also have to pay attention to the Linkname for hard link entries, and possibly also symbolic links. We should have a test case where at least two of the items being copied are hard linked together, and expect them to remain hard linked in the finished image.

When an entry with Typeflag=TypeSymlink is extracted, the value is used directly, so symbolic links in archives can point anywhere on the filesystem, regardless of whether they're relative or absolute. Dangling links are okay. We probably don't need to do anything special for them.

When an entry with Typeflag=TypeLink is extracted, a hard link is made to the path computed using the directory the archive is being extracted to and the Linkname value in the entry (this is the "oldname" parameter for os.Link()) -- the path in the Name value is only used in constructing the newname parameter for os.Link(). Hard links to contents outside of the directory where the archive is being extracted, or to items not included in the archive, are not intended to be possible.

I think that's the last part of this we need to be sure we've considered and have tests for.

@Honny1 Honny1 force-pushed the copy-parents branch 2 times, most recently from 2f69989 to 23a49a8 Compare March 18, 2025 16:14
@Honny1
Member Author

Honny1 commented Mar 18, 2025

@nalind I added tests for hardlinks and symlinks.

Member

@nalind nalind left a comment

We have special logic for handling pivot points ("/./" in source paths), and for handling hard links, so both the conformance test and integration test should also include cases where they both come into play.

The tests don't automatically enforce that items that are hard linked to each other remain hard links when they're copied together, so the tests will need to check that directly by comparing the results of running something like stat -c %i against the items that are expected to be hard links. We already have tests that ensure links show up in a committed image if they're in the working container's rootfs, so if it's easier, it would be sufficient to check on that during a RUN instruction right after the COPY instruction.

It also includes an implementation of the --parents flag for the buildah copy command.

Fixes: https://issues.redhat.com/browse/RUN-2193
Fixes: containers#5557

Signed-off-by: Jan Rodák <[email protected]>
@Honny1
Member Author

Honny1 commented Mar 18, 2025

@nalind Oh, I forgot about that. Thank you. I found the mistake. It should be fine now.

@Honny1 Honny1 requested a review from nalind March 18, 2025 21:37
@Honny1
Member Author

Honny1 commented Mar 19, 2025

/packit rebuild-failed

@nalind
Member

nalind commented Mar 19, 2025

LGTM

if err := copierHandlerGetOne(parentInfo, "", parentName, parent, req.GetOptions, tw, hardlinkChecker, idMappings); err != nil {
if req.GetOptions.IgnoreUnreadable && errorIsPermission(err) {
continue
} else if errors.Is(err, os.ErrNotExist) {
Member

Nit: No need for else here.

if err != nil {
continue
return fmt.Errorf("copier: get: lstat %q: %w", parent, err)
Member

Nit: this is kind of a stutter, since the parent path will be mentioned twice in the error message.

logrus.Warningf("copier: file disappeared while reading: %q", parent)
return nil
}
return fmt.Errorf("copier: get: %q: %w", queue[i].glob, err)
Member

Stutter?

@rhatdan
Member

rhatdan commented Mar 20, 2025

Great job @Honny1. A couple of nits, and potential stutters but I will merge
/lgtm

@rhatdan
Member

rhatdan commented Mar 20, 2025

/approve

Contributor

openshift-ci bot commented Mar 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Honny1, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Honny1
Member Author

Honny1 commented Mar 20, 2025

@rhatdan Thank you. Should I fix the nits?

@openshift-merge-bot openshift-merge-bot bot merged commit 3e3baee into containers:main Mar 20, 2025
37 checks passed
Labels
approved kind/feature Categorizes issue or PR as related to a new feature. lgtm
Development

Successfully merging this pull request may close these issues.

Add --parents option for COPY in Dockerfiles
3 participants