DFBUGS-319: Fix rdspec and protectedpvcs condition #387

Open
wants to merge 6 commits into
base: release-4.17

Conversation

BenamarMk
Collaborator

@BenamarMk BenamarMk commented Nov 4, 2024

This PR includes critical fixes for issues that occasionally caused relocation of CephFS workloads to stall indefinitely in the WaitForReadiness progression.

Key Changes:

  1. Fix for RDSpec List Alternation
    Addressed an issue where frequent VRG resource updates caused the RDSpec list to alternate between an empty and a non-empty list. If the list was empty during failover or relocation, PVC restore was skipped, leaving the recovery incomplete.

  2. Fix for ProtectedPVC PVsRestored Condition
    In certain edge cases, ProtectedPVCs were failing to add the PVsRestored condition permanently, which caused the relocate process to get stuck in the WaitForReadiness progression. This fix ensures the condition is consistently applied, preventing the relocation from stalling.

  3. Refactor of ManifestWork Creation Function
    The utility function that creates ManifestWork has been refactored to return the last operation result (created, updated, or none) alongside any errors. This change allows tracking of whether a ManifestWork resource was newly created, updated, or left unchanged.

JIRA: https://issues.redhat.com//browse/DFBUGS-319

Benamar Mekhissi added 6 commits November 4, 2024 07:36
Fix an issue where the VRG resource was frequently updated, causing the RDSpec
to alternate between an empty and non-empty list. This behavior directly impacted
failover and relocation. If the list was empty during these actions, PVC restore
was skipped, leading to incomplete recovery.

Signed-off-by: Benamar Mekhissi <[email protected]>
(cherry picked from commit a974756)
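
For readers following along, here is a minimal sketch of one way to keep a stored list from being overwritten by a transiently empty one. It is not the code from this commit: the `RDSpec` type is a simplified stand-in, and `preserveRDSpecs` is a hypothetical helper that assumes an empty freshly computed list is transient.

```go
package main

import "fmt"

// RDSpec is a hypothetical, simplified stand-in for a replication
// destination spec; the real type in the repository has more fields.
type RDSpec struct {
	PVCName string
}

// preserveRDSpecs returns the list that should be written to the VRG.
// Assumption: an empty freshly computed list is transient, so the
// previously stored list is kept instead of being overwritten.
func preserveRDSpecs(existing, computed []RDSpec) []RDSpec {
	if len(computed) == 0 {
		return existing
	}
	return computed
}

func main() {
	stored := []RDSpec{{PVCName: "data-pvc"}}

	// An update that computes no RDSpecs leaves the stored list in place.
	fmt.Println(len(preserveRDSpecs(stored, nil)))

	// An update that computes a list replaces the stored one.
	fresh := []RDSpec{{PVCName: "a"}, {PVCName: "b"}}
	fmt.Println(len(preserveRDSpecs(stored, fresh)))
}
```

With a guard like this, a failover or relocation that runs between two updates would always see the last non-empty list.
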
This commit modifies the utility function that creates the ManifestWork to return
an additional value indicating the last operation result alongside the error. The
result can be one of three values: created, updated, or none. This change is
needed to track whether the ManifestWork resource was newly created, updated, or
left unchanged.

Signed-off-by: Benamar Mekhissi <[email protected]>
(cherry picked from commit c46cc59)
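
The shape of the return value described above can be sketched as follows. This is an illustration only: `OperationResult`, `ManifestWork`, `fakeHub`, and `createOrUpdate` are hypothetical names, and an in-memory map stands in for the hub cluster API.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// OperationResult reports what createOrUpdate did. The three values mirror
// the ones named in the commit message; the type itself is hypothetical.
type OperationResult string

const (
	OperationResultNone    OperationResult = "none"
	OperationResultCreated OperationResult = "created"
	OperationResultUpdated OperationResult = "updated"
)

// ManifestWork is a simplified, hypothetical stand-in for the real resource.
type ManifestWork struct {
	Name    string
	Payload map[string]string
}

// fakeHub is an in-memory substitute for the hub cluster API.
type fakeHub map[string]ManifestWork

// createOrUpdate stores mw and returns the operation result with the error.
func createOrUpdate(hub fakeHub, mw ManifestWork) (OperationResult, error) {
	if mw.Name == "" {
		return OperationResultNone, errors.New("manifestwork name must not be empty")
	}

	current, found := hub[mw.Name]

	switch {
	case !found:
		hub[mw.Name] = mw
		return OperationResultCreated, nil
	case reflect.DeepEqual(current, mw):
		// Nothing changed, so nothing is written.
		return OperationResultNone, nil
	default:
		hub[mw.Name] = mw
		return OperationResultUpdated, nil
	}
}

func main() {
	hub := fakeHub{}
	steps := []ManifestWork{
		{Name: "vrg-mw", Payload: map[string]string{"state": "secondary"}},
		{Name: "vrg-mw", Payload: map[string]string{"state": "secondary"}},
		{Name: "vrg-mw", Payload: map[string]string{"state": "primary"}},
	}

	for _, mw := range steps {
		result, err := createOrUpdate(hub, mw)
		fmt.Println(result, err)
	}
}
```

Callers can branch on the result, for example to skip follow-up work when nothing changed.
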
Signed-off-by: Benamar Mekhissi <[email protected]>
(cherry picked from commit fcf6be9)
In certain edge cases, ProtectedPVCs may fail to add the PVsRestored condition
permanently, causing the relocate process to get stuck in the WaitForReadiness
progression.

Signed-off-by: Benamar Mekhissi <[email protected]>
(cherry picked from commit d7f0b8f)
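
To make the stall concrete, the sketch below shows why a single ProtectedPVC without the condition blocks readiness, and how an idempotent setter avoids that. The types and the helpers `setRestored` and `allRestored` are hypothetical simplifications, not the code changed in this commit.

```go
package main

import "fmt"

// condPVsRestored is the condition type named in the commit message.
const condPVsRestored = "PVsRestored"

// Condition and ProtectedPVC are simplified, hypothetical stand-ins for the
// status types used by the VRG.
type Condition struct {
	Type   string
	Status bool
}

type ProtectedPVC struct {
	Name       string
	Conditions []Condition
}

// setRestored adds the PVsRestored condition, or marks it true if it is
// already present. Calling it repeatedly never duplicates the condition.
func setRestored(pvc *ProtectedPVC) {
	for i := range pvc.Conditions {
		if pvc.Conditions[i].Type == condPVsRestored {
			pvc.Conditions[i].Status = true
			return
		}
	}

	pvc.Conditions = append(pvc.Conditions, Condition{Type: condPVsRestored, Status: true})
}

// allRestored reports whether every PVC carries a true PVsRestored condition.
// One PVC without it keeps the whole check false, which is the stall.
func allRestored(pvcs []ProtectedPVC) bool {
	for _, pvc := range pvcs {
		restored := false
		for _, c := range pvc.Conditions {
			if c.Type == condPVsRestored && c.Status {
				restored = true
			}
		}
		if !restored {
			return false
		}
	}
	return true
}

func main() {
	pvcs := []ProtectedPVC{{Name: "pvc-a"}, {Name: "pvc-b"}}

	setRestored(&pvcs[0])
	fmt.Println(allRestored(pvcs)) // pvc-b still lacks the condition

	setRestored(&pvcs[1])
	fmt.Println(allRestored(pvcs))
}
```
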
Signed-off-by: Benamar Mekhissi <[email protected]>
(cherry picked from commit aae3695)
When ensuring the VRG ManifestWork, the process now begins by retrieving the VRG
from an existing ManifestWork, if available, and updating it as needed. If the
ManifestWork does not exist, it will be created. This update-instead-of-create
approach avoids overwriting other fields unintentionally and ensures consistency
by always starting from a base VRG state.

Signed-off-by: Benamar Mekhissi <[email protected]>
(cherry picked from commit b05e435)
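
The update-instead-of-create flow described in this commit can be sketched roughly as below. The `VRG` type, the `manifestWorks` map, and `ensureVRG` are hypothetical simplifications; the real code works with ManifestWork resources on the hub.

```go
package main

import "fmt"

// VRG is a simplified, hypothetical stand-in for the real resource.
type VRG struct {
	ReplicationState string
	RDSpecs          []string
}

// manifestWorks maps a ManifestWork name to the VRG it carries. It is an
// in-memory substitute for looking the ManifestWork up on the hub.
type manifestWorks map[string]VRG

// ensureVRG starts from the VRG in the existing ManifestWork, if any, applies
// only the requested change, and stores the result. The boolean reports
// whether the ManifestWork had to be created.
func ensureVRG(works manifestWorks, name string, mutate func(*VRG)) (VRG, bool) {
	vrg, found := works[name] // zero-value VRG when no ManifestWork exists
	mutate(&vrg)
	works[name] = vrg

	return vrg, !found
}

func main() {
	works := manifestWorks{
		"vrg-mw": {ReplicationState: "secondary", RDSpecs: []string{"data-pvc"}},
	}

	// Only the replication state is changed; RDSpecs carry over untouched.
	vrg, created := ensureVRG(works, "vrg-mw", func(v *VRG) {
		v.ReplicationState = "primary"
	})
	fmt.Println(vrg.ReplicationState, vrg.RDSpecs, created)
}
```

Because the mutation starts from the stored VRG, fields the caller does not touch, such as the RDSpec list, are not reset.
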

openshift-ci bot commented Nov 4, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BenamarMk

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented Nov 4, 2024

@BenamarMk: This pull request references Bugzilla bug 2319334, which is invalid:

  • expected the bug to target the "ODF 4.17.0" release, but it targets "ODF 4.18.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 2319334: Fix rdspec and protectedpvcs condition

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@BenamarMk BenamarMk changed the title Bug 2319334: Fix rdspec and protectedpvcs condition Bug 2321510: Fix rdspec and protectedpvcs condition Nov 4, 2024

openshift-ci bot commented Nov 4, 2024

@BenamarMk: An error was encountered updating to the POST state for bug 2321510 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. code 109: You are not permitted to edit bugs in product Red Hat OpenShift Data Foundation.

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

@BenamarMk BenamarMk changed the title Bug 2321510: Fix rdspec and protectedpvcs condition DFBug 319: Fix rdspec and protectedpvcs condition Nov 4, 2024

openshift-ci bot commented Nov 4, 2024

@BenamarMk: This pull request references Bugzilla bug 319, which is invalid:

  • expected the bug to be open, but it isn't
  • expected the bug to target the "ODF 4.17.0" release, but it targets "---" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is CLOSED (NOTABUG) instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

@BenamarMk BenamarMk changed the title DFBug 319: Fix rdspec and protectedpvcs condition DFBUGS-319: Fix rdspec and protectedpvcs condition Nov 4, 2024

openshift-ci bot commented Nov 4, 2024

@BenamarMk: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

openshift-ci-robot commented Nov 4, 2024

@BenamarMk: This pull request references [Jira Issue DFBUGS-319](https://issues.redhat.com//browse/DFBUGS-319), which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@BenamarMk BenamarMk changed the title DFBUGS-319: Fix rdspec and protectedpvcs condition DFBUGS 319: Fix rdspec and protectedpvcs condition Nov 4, 2024

openshift-ci bot commented Nov 4, 2024

@BenamarMk: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

@openshift-ci-robot

@BenamarMk: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

@BenamarMk BenamarMk changed the title DFBUGS 319: Fix rdspec and protectedpvcs condition DFBUGS-319: Fix rdspec and protectedpvcs condition Nov 4, 2024

openshift-ci bot commented Nov 4, 2024

@BenamarMk: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

openshift-ci-robot commented Nov 4, 2024

@BenamarMk: This pull request references [Jira Issue DFBUGS-319](https://issues.redhat.com//browse/DFBUGS-319), which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@BenamarMk BenamarMk changed the title DFBUGS-319: Fix rdspec and protectedpvcs condition JIRA: https://issues.redhat.com//browse/DFBUGS-319: Fix rdspec and protectedpvcs condition Nov 4, 2024

openshift-ci bot commented Nov 4, 2024

@BenamarMk: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

@BenamarMk BenamarMk changed the title JIRA: https://issues.redhat.com//browse/DFBUGS-319: Fix rdspec and protectedpvcs condition DFBUGS-319: Fix rdspec and protectedpvcs condition Nov 4, 2024

openshift-ci bot commented Nov 4, 2024

@BenamarMk: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

openshift-ci-robot commented Nov 4, 2024

@BenamarMk: This pull request references [Jira Issue DFBUGS-319](https://issues.redhat.com//browse/DFBUGS-319), which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

openshift-ci bot commented Nov 4, 2024

@BenamarMk: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

openshift-ci-robot commented Nov 4, 2024

@BenamarMk: This pull request references [Jira Issue DFBUGS-319](https://issues.redhat.com//browse/DFBUGS-319), which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

openshift-ci bot commented Nov 4, 2024

@BenamarMk: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

openshift-ci-robot commented Nov 4, 2024

@BenamarMk: This pull request references [Jira Issue DFBUGS-319](https://issues.redhat.com//browse/DFBUGS-319), which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@kseegerrh

/jira refresh

openshift-ci-robot commented Nov 5, 2024

@kseegerrh: This pull request references [Jira Issue DFBUGS-319](https://issues.redhat.com//browse/DFBUGS-319), which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@kseegerrh

/jira refresh

openshift-ci-robot commented Nov 5, 2024

@kseegerrh: This pull request references [Jira Issue DFBUGS-319](https://issues.redhat.com//browse/DFBUGS-319), which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@BenamarMk
Collaborator Author

/jira refresh

openshift-ci-robot commented Nov 8, 2024

@BenamarMk: This pull request references [Jira Issue DFBUGS-319](https://issues.redhat.com//browse/DFBUGS-319), which is invalid:

  • expected the bug to target the "odf-4.17" version, but no target version was set
  • expected dependent [Jira Issue DFBUGS-309](https://issues.redhat.com//browse/DFBUGS-309) to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is POST instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.
