Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: utils: copyfile: reset permissions on macos reflinks #245

Merged
merged 1 commit into from
Dec 6, 2023

Conversation

efiop
Copy link
Contributor

@efiop efiop commented Dec 6, 2023

Related #238

@efiop efiop merged commit d7236e2 into main Dec 6, 2023
@efiop efiop added the bug Something isn't working label Dec 6, 2023
@@ -189,6 +191,9 @@ def copyfile(

try:
system.reflink(src, dest)
if sys.platform == "darwin":
# NOTE: reflink on macos also clones src permissions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to be clear, what pyreflink is doing is correct, then, right? It is always preserving permissions. It is not doing any chmod operations on macOS because clonefile does that already, but pyreflink is copying permissions on Linux as ficlone does not do that.

Copy link
Member

@skshetry skshetry Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, to be a 1:1 replacement with pyreflink, it has to be:

reflink(src, dst)
if sys.platform == "linux":
    shutil.copymode(src, dst)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I would say that cloning permissions is correct, as it seems inconsistent, a bit wasteful and also kinda unfinished, since it only copies mode but not other stuff (like shutil.copystats). E.g. for us the fact that pyreflink does that on linux is a complete waste, as we need to reset permissions ourselves anyway.

So yeah, we are chmodding different perms and for different reasons. And pyreflink is just trying to be consistent between linux and macos (thouhg I'm not sure if macos clone file clones stats in addition to just mode).

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

Did you test on Linux with CoW filesystems btw?

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

@skshetry Nope, just mac for now.

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

@efiop, looks like we need to chmod for linux too. About 10 tests are failing for me when I run with:

TMPDIR=$(realpath ../tmp); mkdir $TMPDIR
pytest -nlogical

Most of those are import related and have isexec: True set. Setting chmod in both copyfile and link fix the issue for me. The files also have executable bit set.

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

For me,

reflink("pyproject.toml", "pyproject.toml2")

creates a file with executable bit set. Looks like this is a problem that I introduced with that optimization.

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

@skshetry Thanks for testing it! I'll send a fix ASAP.

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

@skshetry what filesystem do you use? I have a suspicion that it depends on the filesystem really (we tend to mixup platform and filesystem a lot when we discuss this, but it is really up to the fs on how it wants to handle it).

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

I am not sure what the fix should be. The os.open has a default mode of 077. umask is usually 022. So the file gets created with 755 perms.

One way might be to set mode=0o666 in os.open, the other would be to copymode. But I am not sure where our fix should be here.

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

@skshetry what filesystem do you use?

I use btrfs.

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

I see builtin.open() sets 0o666 by default. I think I should do the same.

https://github.com/python/cpython/blob/cc7e45cc572dd818412a649970fdee579417701f/Lib/_pyio.py#L1550

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

@skshetry We just need to normalize the perms, that's what we've done before.

Could you try with dvc-objects 1.4.8, please? I don't have linux COW fs handy right now 🙁

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

Looks like this is a problem that I introduced with that optimization.

I'm pretty sure it was me, I removed chmod that we used to have completely and only kept it for macos after that.

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

hmm, I'm a bit worried about world-readable file that we are creating with this. It could have some security implications. I see that pyreflink uses S_IRUSR | S_IWUSR before it copies the stats.

Maybe we should do the same here?

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

@skshetry We just use the umask, so the file we create is the same as what the user would create from scratch. Same as shutil.copyfile too. The normal way to go about it is to use umask anyway, so I don't think we need to do anything special here and pyreflink's approach actually looks wrong to me.

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

yeah. I'd still sleep better today if we merge this #248.

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

@skshetry So you confirm everything works fine now for you with reflink on btrfs, right?

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

yes, everything works, all the tests pass when I change tmpdir to btrfs.

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

Thank you, @skshetry ! 🙏

@skshetry
Copy link
Member

skshetry commented Dec 6, 2023

pyreflink's approach actually looks wrong to me.

This is likely because, when you are copying file with a restricted permission, you'll end up creating a world-readable file, exposing contents to others.

So the approach feels correct to me.

Regarding dvc, with dvc's cache, I am not sure it is relevant. At least it's hard to say it is.

@efiop
Copy link
Contributor Author

efiop commented Dec 6, 2023

@skshetry Didn't realise it was setting that mode before cloning. Indeed, you are correct about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants