blockdev: use 'blkid' for reading device's UUID #917

Open
wants to merge 2 commits into base: main

Conversation

nikita-dubrovskii
Contributor

Firstboot of RHCOS on IBM zKVM intermittently fails during "File System Check".
This happens because the systemd unit has the old filesystem UUID from the pristine qcow2 image,
not the regenerated one:

```
coreos-boot-edit: + lsblk -o NAME,LABEL,UUID --paths --pairs /dev/disk/by-label/boot
coreos-boot-edit: NAME="/dev/mapper/crypt_bootfs" LABEL="boot" UUID="96d15588-3596-4b3c-adca-a2ff7279ea63"
coreos-boot-edit: + blkid /dev/disk/by-label/boot
coreos-boot-edit: /dev/disk/by-label/boot: LABEL="boot" UUID="eee55c4f-c2df-47e9-a284-992e9e122a97" BLOCK_SIZE="1024" TYPE="ext4"
coreos-boot-edit: + rdcore bind-boot /sysroot /mnt/boot_partition
.....
coreos-boot-mount-generator: ++ cat /run/coreos/bootfs_uuid
coreos-boot-mount-generator: + bootdev=/dev/disk/by-uuid/96d15588-3596-4b3c-adca-a2ff7279ea63
```

Signed-off-by: Nikita Dubrovskii <[email protected]>
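
To make the mismatch in the log above easier to spot, here is a minimal sketch that pulls the `UUID="..."` value out of a line of `KEY="value"` output and compares the two readings. The helper name `uuid_from_pairs` is hypothetical and not part of this PR; the two sample lines are copied from the log.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): print the value of the UUID="..."
# pair from one line of KEY="value" output, as emitted by `lsblk --pairs` or `blkid`.
uuid_from_pairs() {
    # A leading space is prepended so the pattern also matches when UUID is the
    # first pair on the line, while still skipping keys such as PARTUUID.
    printf ' %s\n' "$1" | sed -n 's/.*[[:space:]]UUID="\([^"]*\)".*/\1/p'
}

# Sample lines copied from the log in this description.
lsblk_line='NAME="/dev/mapper/crypt_bootfs" LABEL="boot" UUID="96d15588-3596-4b3c-adca-a2ff7279ea63"'
blkid_line='/dev/disk/by-label/boot: LABEL="boot" UUID="eee55c4f-c2df-47e9-a284-992e9e122a97" BLOCK_SIZE="1024" TYPE="ext4"'

lsblk_uuid=$(uuid_from_pairs "$lsblk_line")
blkid_uuid=$(uuid_from_pairs "$blkid_line")

if [ "$lsblk_uuid" != "$blkid_uuid" ]; then
    echo "UUID mismatch: lsblk=$lsblk_uuid blkid=$blkid_uuid"
fi
```
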
@jlebon
Member

jlebon commented Jul 12, 2022

Hmm, so the bootfs UUID reported by lsblk is stale? Do we know why?

@nikita-dubrovskii
Contributor Author

> Hmm, so the bootfs UUID reported by lsblk is stale? Do we know why?

I guess the issue is somewhere between the old RHEL kernel and udev on zKVM. FCOS works just fine. Maybe I'm wrong.

@cgwalters
Member

Hmm. I think this may be that lsblk uses the kernel's cached view of things by reading from /sys, but blkid opens the block device directly.

(Comparing e.g. strace -f lsblk /dev/vda vs strace -f blkid /dev/vda in a cosa run shell)

This to me signals that the real problem is likely that we need to synchronously wait for a partprobe.
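
A rough sketch of what such a check could look like, assuming the hypothesis above holds: wait for pending udev events, then compare the UUID that `lsblk` reports with the one `blkid` reads from the device itself. The function name `uuid_is_fresh` is hypothetical, and using `blkid -p` (low-level probing) is a choice made for this sketch; the log above used plain `blkid`.

```shell
#!/bin/bash
# Hypothetical helper (not from this PR): succeed only if the UUID reported by
# lsblk agrees with the UUID that blkid reads from the device.
uuid_is_fresh() {
    local dev=$1 cached ondisk
    # Wait for the udev event queue to drain before reading any state.
    udevadm settle
    # -d: this device only, -n: no header line, -o UUID: print only the UUID column.
    cached=$(lsblk -d -n -o UUID "$dev")
    # -p: low-level probing, -s UUID -o value: print only the UUID value.
    ondisk=$(blkid -p -s UUID -o value "$dev")
    [ -n "$ondisk" ] && [ "$cached" = "$ondisk" ]
}
```

It could be called as `uuid_is_fresh /dev/disk/by-label/boot || echo "lsblk UUID is stale"`. This only detects the disagreement; it does not explain or fix whatever leaves the cached value out of date.
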

@jlebon
Member

jlebon commented Jul 14, 2022

Do we still need this now that we're using an Ignition config for the reprovisioning in coreos/fedora-coreos-config#1819?

@nikita-dubrovskii
Contributor Author

> Do we still need this now that we're using an Ignition config for the reprovisioning in coreos/fedora-coreos-config#1819?

I'd prefer to have this. I wasn't able to test Ignition+LUKS on RHCOS, because it has again switched to an old kernel (or hasn't picked up a fixed one): https://bugzilla.redhat.com/show_bug.cgi?id=2075085 . Switching to /dev/vda instead of coreos-boot-disks doesn't help much, so I'm still debugging why /dev/disk/by-*/ are partially empty after Ignition.

Member

@cgwalters cgwalters left a comment

FWIW, while I think ensuring we can know the kernel state is correct here would be better, I'm personally fine with this too.

@jlebon
Member

jlebon commented Aug 10, 2022

Right, my concern with this is that this feels like it's working around what could possibly be a deeper issue. We're fixing it for rdcore but other code (present and future) may still be using the wrong information. If Secure Execution is triggering this, let's try to find out why that is and fix it.

3 participants