Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework grub2 default static config #841

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

champtar
Copy link
Contributor

@champtar champtar commented Feb 7, 2025

Rework grub2 default static config

  • Add support for GRUB2_PASSWORD in user.cfg (partially fix https://issues.redhat.com/browse/RHEL-78299)
  • include dropins instead of sourcing them, this should help updating config in the future
  • add multiple small dropins to match classic grub2 installs

Copy link

openshift-ci bot commented Feb 7, 2025

Hi @champtar. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@champtar champtar force-pushed the grub2-password branch 2 times, most recently from 25c7a37 to 8710963 Compare February 8, 2025 02:35
@champtar champtar changed the title Add GRUB2_PASSWORD support, source custom.cfg Rework grub default static config Feb 8, 2025
@champtar champtar changed the title Rework grub default static config Rework grub2 default static config Feb 8, 2025
@champtar champtar force-pushed the grub2-password branch 3 times, most recently from faddad9 to 74ff55c Compare February 8, 2025 04:21
@champtar
Copy link
Contributor Author

Can someone with CI / FCOS knowledge chime in ? I think 10_blscfg and other configs.d are not copied in the continuous-integration/jenkins/pr-merge test, is this expected ? ie does FCOS overwrite the full folder, or we are just testing the binary and not the full rpm ?
c9s CI works ok / I can see the new files being used.

@champtar
Copy link
Contributor Author

@cgwalters any opinions on this PR ?

@champtar
Copy link
Contributor Author

Anyone ?

@champtar
Copy link
Contributor Author

@HuijingHei maybe ?

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So sorry for the delay on review here! I really value your contributions overall. You've chosen to work on some thorny problems, which is good but for this particular code we're highly subject to regressions.

Overall though, I see no issues in this so far, and probably what we need to do is just try to land it and maybe put it in Rawhide and soak for a bit.

Actually after landing here in git main it will appear in the bootc base-images-dev and we have CI coverage of that.

@cgwalters
Copy link
Member

/ok-to-test

@cgwalters
Copy link
Member

That all said it does look quite likely related that the upgrade test failed here https://jenkins-coreos-ci.apps.ocp.fedoraproject.org/job/bootupd/job/PR-841/6/artifact/tmp/console.txt

@champtar
Copy link
Contributor Author

That all said it does look quite likely related that the upgrade test failed here https://jenkins-coreos-ci.apps.ocp.fedoraproject.org/job/bootupd/job/PR-841/6/artifact/tmp/console.txt

See my previous comment #841 (comment), I bet the failing test only copy the new binary and not the new configs.d

@champtar
Copy link
Contributor Author

My ci fix is ignored, should I move it to another PR that we merge first, then rebase on top ?

@cgwalters
Copy link
Member

My ci fix is ignored, should I move it to another PR that we merge first, then rebase on top ?

ci: also install grub config and systemd unit looks correct to me and would indeed make sense to aim to land as a distinct PR.

@champtar
Copy link
Contributor Author

#879 need to be merged first then I can rebase on top

user.cfg, despite his name, is usually used to store GRUB2_PASSWORD variable:
- grub2-set-password utility overwrite the whole file
- security scanners look at the content of user.cfg
  https://github.com/ComplianceAsCode/content/blob/47fd3bcded59116ade8ea09eb396f363e37813d4/linux_os/guide/system/bootloader-grub2/uefi/grub2_uefi_password/oval/shared.xml

Copy the content of the legacy /etc/grub.d/01_users as 01_grub2_password.cfg,
and source custom.cfg instead of user.cfg for people in need of custom
configs. This gets us closer to classic grub2-mkconfig behaviour.
Both features were added 13 years ago.

feature_menuentry_id was added in
rhboot/grub2@d9bef9b

feature_all_video_module was added in
rhboot/grub2@22c7ce8
In classic install timeout setting is at the end of 00_header,
being at the end of grub-static-pre.cfg is equivalent.

This allow to overide the timeout setting using configs.d.

While at it remove the feature_timeout_style check as the
feature was added 12 years ago in
rhboot/grub2@8f236c1
This allows to add menu entries after the BLS entries,
for exemple 'UEFI Firmware Settings'.
This will make updating config easier as there will
be no need to cleanup dropins files in /boot/grub2/.
@HuijingHei
Copy link
Member

Need to fix CI first.

@champtar
Copy link
Contributor Author

Need to fix CI first.

In console.txt boot seems ok (https://jenkins-coreos-ci.apps.ocp.fedoraproject.org/job/bootupd/job/PR-841/8/artifact/tmp/console.txt), but kola qemuexec timeout (https://jenkins-coreos-ci.apps.ocp.fedoraproject.org/blue/rest/organizations/jenkins/pipelines/bootupd/branches/PR-841/runs/8/log/?start=0)
comparing with a sucessfull boot, ignition.firstboot kargs is missing, but ignition.platform.id=qemu is present

@champtar
Copy link
Contributor Author

Ok I'm actually breaking ignition because 40_coreos-ignition.cfg is now after blscfg

[2025-03-17T02:47:25.147Z] Added 01_grub2_password.cfg
[2025-03-17T02:47:25.147Z] Added 10_blscfg.cfg
[2025-03-17T02:47:25.147Z] Added 14_menu_show_once.cfg
[2025-03-17T02:47:25.147Z] Added 30_uefi-firmware.cfg
[2025-03-17T02:47:25.147Z] Added 40_coreos-ignition.cfg
[2025-03-17T02:47:25.147Z] Added 41_custom.cfg
[2025-03-17T02:47:25.147Z] Added 70_coreos-user.cfg

@champtar
Copy link
Contributor Author

Reordered the files a bit to see if it fixes, if yes I'll need to change ignition.cfg to 40_ignition.cfg
BTW 70_coreos-user.cfg is not needed for a long time I think

@champtar
Copy link
Contributor Author

Jenkins seems to be broken right now

@champtar
Copy link
Contributor Author

Opened a PR to remove 70_coreos-user.cfg as it duplicates 01_grub2_password.cfg: coreos/fedora-coreos-config#3399

@champtar
Copy link
Contributor Author

And another PR for ignition: coreos/ignition#2037

@champtar
Copy link
Contributor Author

CI is green now BTW

@champtar
Copy link
Contributor Author

Anyone aware of a repo with another copy of ignition.cfg ? (Something similar to fedora-coreos-config)
Should we just stop if file doesn't start with a number to make it easier to debug ? Or just add a warning ?

@champtar
Copy link
Contributor Author

Found https://github.com/openshift/os/blob/master/overlay.d/40grub but it just uses fedora-coreos-config

@HuijingHei
Copy link
Member

Anyone aware of a repo with another copy of ignition.cfg ? (Something similar to fedora-coreos-config) Should we just stop if file doesn't start with a number to make it easier to debug ? Or just add a warning ?

I am OK with a warning and skip, but after Rename ignition.cfg -> 40_ignition.cfg, will it be duplicated with 40_coreos-ignition.cfg and have 2 same part in the new grub.cfg?

@champtar
Copy link
Contributor Author

Anyone aware of a repo with another copy of ignition.cfg ? (Something similar to fedora-coreos-config) Should we just stop if file doesn't start with a number to make it easier to debug ? Or just add a warning ?

I am OK with a warning and skip, but after Rename ignition.cfg -> 40_ignition.cfg, will it be duplicated with 40_coreos-ignition.cfg and have 2 same part in the new grub.cfg?

ignition.cfg right now is in a separate rpm named ignition-ignition-grub, but yes we should use this package in coreos (once fixed) and remove 40_coreos-ignition.cfg

@champtar
Copy link
Contributor Author

I am OK with a warning and skip

That would also require to fix greenboot if we skip

@cgwalters
Copy link
Member

/ok-to-test

@champtar
Copy link
Contributor Author

PR for greenboot: fedora-iot/greenboot#214

@champtar
Copy link
Contributor Author

On a legacy grub install we have:

  • 00_header (load_env / ...)
  • 01_users (user.cfg / GRUB2_PASSWORD)
  • 08_fallback_counting
  • 10_linux (blscfg ++)
  • 10_reset_boot_success (goes with 08_fallback_counting)
  • 12_menu_auto_hide (goes with 08_fallback_counting)
  • 14_menu_show_once
  • 20_linux_xen
  • 20_ppc_terminfo
  • 30_os-prober
  • 30_uefi-firmware (must be after blscfg as we want to firmware menu to be last)
  • 35_fwupd (no equivalent)
  • 40_custom
  • 41_custom (source custom.cfg)

Before this PR we have

  • grub-static-pre.cfg (bootuuid.cfg / load_env / console.cfg / menuentry_id_option / all_video)
  • 40_coreos-ignition.cfg
  • greenboot.cfg
  • grub-static-post.cfg (timeout / user.cfg / blscfg)

After all the PRs we should have:

  • grub-static-pre.cfg (bootuuid.cfg / load_env / console.cfg / menuentry_id_option / all_video) + timeout
  • 01_grub2_password.cfg (user.cfg / GRUB2_PASSWORD)
  • 08_greenboot.cfg (== 08_fallback_counting + 10_reset_boot_success)
  • 40_coreos-ignition.cfg or 40_ignition.cfg
  • 50_blscfg.cfg (blscfg / == 10_linux)
  • 60_menu_show_once.cfg (== 14_menu_show_once)
  • 70_uefi-firmware.cfg (== 30_uefi-firmware)
  • 80_custom.cfg (== 41_custom)

So the ordering related to blscfg is preserved (can break ignition and uefi-firmware), user.cfg + timeout are way sooner but this match legacy grub install

How do we want to go forward with this PR ?

@champtar
Copy link
Contributor Author

champtar commented Mar 18, 2025

As no one is using ignition-ignition-grub (coreos/ignition#2037 (comment)), should we just rename to 05_ignition.cfg and same in fedora-coreos-config, so we can keep the numbering the same as in legacy grub ?

@champtar
Copy link
Contributor Author

I went ahead and opened a PR to rename to 05_ignition.cfg: coreos/fedora-coreos-config#3406

@champtar champtar marked this pull request as draft March 19, 2025 13:53
@champtar
Copy link
Contributor Author

(draft until coreos/fedora-coreos-config#3406 lands)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

google is failing me a bit. Is there a short description somewhere on the internet of what menu_show_once_timeout does?

Copy link
Contributor Author

@champtar champtar Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with some integration in the OS:

  • /usr/lib/systemd/system/reboot.target.wants/grub2-systemd-integration.service
  • /usr/libexec/grub2/systemd-integration.sh

or manually using grub2-editenv

it allow for a longer timeout before you boot to the default option
When using OOBM like iLO / iDRAC virtual console 1s can be way too short

Copy link
Member

@dustymabe dustymabe Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a comment here in the file I think would be useful:

Setting a `menu_show_once_timeout` value in grub env vars will show
the menu for a longer timeout on just one boot to allow for extra
time to interactively stop boot and interface with GRUB directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the comment in the legacy version of this should suffice:

# Force the menu to be shown once, with a timeout of ${menu_show_once_timeout}
# if requested by ${menu_show_once_timeout} being set in the env.

@dustymabe
Copy link
Member

As no one is using ignition-ignition-grub (coreos/ignition#2037 (comment)), should we just rename to 05_ignition.cfg and same in fedora-coreos-config, so we can keep the numbering the same as in legacy grub ?

let's assume the rename to 05_ignition.cfg lands.. what will the new order (like what you detailed in #841 (comment)) look like?

@champtar
Copy link
Contributor Author

As no one is using ignition-ignition-grub (coreos/ignition#2037 (comment)), should we just rename to 05_ignition.cfg and same in fedora-coreos-config, so we can keep the numbering the same as in legacy grub ?

let's assume the rename to 05_ignition.cfg lands.. what will the new order (like what you detailed in #841 (comment)) look like?

@dustymabe I would remove the last commit, so that would be:

  • grub-static-pre.cfg (bootuuid.cfg / load_env / console.cfg / menuentry_id_option / all_video) + timeout
  • 01_grub2_password.cfg (user.cfg / GRUB2_PASSWORD)
  • 05_ignition.cfg
  • 08_greenboot.cfg (== 08_fallback_counting + 10_reset_boot_success)
  • 10_blscfg.cfg (blscfg / == 10_linux)
  • 14_menu_show_once.cfg (== 14_menu_show_once)
  • 30_uefi-firmware.cfg (== 30_uefi-firmware)
  • 41_custom.cfg (== 41_custom)

@dustymabe
Copy link
Member

at the top of each of the files should we mention the legacy version of GRUB config they correspond to?

@champtar
Copy link
Contributor Author

Now waiting for coreos/fedora-coreos-config#3407

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants