[release-4.18] OCPBUGS-49687: Add Readiness Probe to Router Status Tests #29513

openshift-cherrypick-robot · 2025-01-31T01:41:35Z

This is an automated cherry-pick of #29395

/assign gcs278

Previously, the router was configured without a readiness probe, resulting in racy startup conditions during router status stress tests. Routers were marked as ready immediately upon starting, causing the waitForReadyReplicaSet function to proceed prematurely. This allowed the next step, route creation, to occur before the routers had fully initialized. This often led the first two routers to fight over the route status while the third router was still starting. As a result, the third router missed these early status contentions, leading to more writes to the route status than expected. Adding the readiness probe also revealed that HAProxy was failing to start due to insufficient permissions. The anyuid SCC was added to the router's service account to resolve the issue.

openshift-ci-robot · 2025-01-31T01:41:45Z

@openshift-cherrypick-robot: Jira Issue OCPBUGS-44238 has been cloned as Jira Issue OCPBUGS-49687. Will retitle bug to link to clone.
/retitle [release-4.18] OCPBUGS-49687: Add Readiness Probe to Router Status Tests

In response to this:

This is an automated cherry-pick of #29395

/assign gcs278

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2025-01-31T01:42:20Z

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-49687, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.18.0) matches configured target version for branch (4.18.0)
bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
release note text is set and does not match the template
dependent bug Jira Issue OCPBUGS-44238 is in the state ON_QA, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
dependent Jira Issue OCPBUGS-44238 targets the "4.19.0" version, which is one of the valid target versions: 4.19.0
bug has dependents

Requesting review from QA contact:
/cc @lihongan

The bug has been updated to refer to the pull request using the external bug tracker.

Miciah · 2025-01-31T02:32:47Z

This change only modifies test code, so it is low risk. Furthermore, the test is currently erroneously making component readiness red.

/label backport-risk-assessed

openshift-ci · 2025-01-31T02:32:57Z

@Miciah: Can not set label backport-risk-assessed: Must be member in one of these teams: [openshift-staff-engineers]

In response to this:

This change only modifies test code, so it is low risk. Furthermore, the test is currently erroneously making component readiness red.

/label backport-risk-assessed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Miciah · 2025-01-31T02:35:27Z

Clean cherry-pick.
/approve
/lgtm

openshift-ci · 2025-01-31T02:35:47Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah, openshift-cherrypick-robot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~test/extended/router/OWNERS~~ [Miciah]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gcs278 · 2025-01-31T13:41:20Z

It's failing E2E with:

{  fail [github.com/openshift/origin/test/extended/router/stress.go:292]: Unexpected error:
    <*errors.errorString | 0xc0017d3450>: 
    replicaset "router-add-condition" never became ready
    {
        s: "replicaset \"router-add-condition\" never became ready",
    }
occurred

I'll try to figure out why it worked in 4.19, but not in 4.18:
/hold

gcs278 · 2025-01-31T14:43:11Z

Ah, this is my own fault. I made the test update depend on the default cert being updated to SHA256: openshift/router#646. So that needs to be backported too in order for this to be merged; otherwise you get this:

[ALERT]    (12) : config : parsing [/var/lib/haproxy/conf/haproxy.config:129] : 'bind unix@/var/lib/haproxy/run/haproxy-sni.sock' in section 'frontend' : unable to load SSL certificate into SSL Context '/var/lib/haproxy/conf/default_pub_keys.pem': ca md too weak.

I think it's easier (and cleaner) to just backport openshift/router#646 instead of altering this cherry-pick to specify a SHA256 default cert explicitly.

gcs278 · 2025-01-31T14:50:49Z

Depends on openshift/router#648 to be merged

gcs278 · 2025-01-31T15:00:06Z

/hold cancel

gcs278 · 2025-01-31T15:00:36Z

Wrong PR 🫤

/hold

gcs278 · 2025-01-31T17:18:30Z

releasing the hold as openshift/router#648 is expected to merge soon.

/hold cancel

openshift-ci-robot · 2025-02-07T23:57:22Z

/retest-required

Remaining retests: 0 against base HEAD 0c7bed9 and 2 for PR HEAD f2eadcb in total

neisw · 2025-02-10T13:45:34Z

/retest-required

jluhrsen · 2025-02-10T18:36:08Z

/skip

neisw · 2025-02-10T18:47:45Z

Seems like an awful lot of disruption in aws-ovn-edge-zones

Though I see the same outside of this pr #29427

jluhrsen · 2025-02-10T18:53:20Z

> Seems like an awful lot of disruption in aws-ovn-edge-zones
>
> Though I see the same outside of this pr #29427

was just about to comment the same thing. dug through the last 8 or so of these jobs and some have this massive disruption and some have none. but to the point, some jobs with the disruption are on different PRs so guessing it's not related to this PR at least.

openshift-ci-robot · 2025-02-10T20:00:04Z

/retest-required

Remaining retests: 0 against base HEAD 0c7bed9 and 2 for PR HEAD f2eadcb in total

openshift-ci-robot · 2025-02-10T22:38:28Z

/retest-required

Remaining retests: 0 against base HEAD 0c7bed9 and 2 for PR HEAD f2eadcb in total

candita · 2025-02-11T00:42:23Z

/retest-required

candita · 2025-02-11T17:35:43Z

The e2e-aws-ovn-edge-zones test is perma-failing due to an issue with the metrics-api-new-connections service, which never comes up.

time="2025-02-11T02:19:58Z" level=error msg="disruption sample failed: error running request: 503 Service Unavailable: error trying to reach service: context deadline exceeded\n" auditID=ca08b173-35cd-4da3-9430-05d99bc7741a backend=metrics-api-new-connections this-instance="{Disruption map[backend-disruption-name:metrics-api-new-connections connection:new disruption:openshift-tests]}" type=new
I0211 02:19:58.471851 313 disruption_backend_sampler.go:654] reason/DisruptionBegan request-audit-id/ca08b173-35cd-4da3-9430-05d99bc7741a backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests stopped responding to GET requests over new connections: error running request: 503 Service Unavailable: error trying to reach service: context deadline exceeded

neisw · 2025-02-11T18:22:23Z

It appears to have passed on 2/1 but not since. I don't see other PRs hitting this when reviewing the history. The most recent pass outside this PR looks like 2/4.

openshift-ci-robot · 2025-02-11T18:34:41Z

/retest-required

Remaining retests: 0 against base HEAD 0c7bed9 and 2 for PR HEAD f2eadcb in total

jluhrsen · 2025-02-11T18:43:36Z

and it wasn't a problem in the 4.19 version of this PR, FWIW.

neisw · 2025-02-11T19:04:09Z

I opened a noop 4.18 pr and kicked off pull-ci-openshift-origin-release-4.18-e2e-aws-ovn-edge-zones, curious to see what the results are.

openshift-ci-robot · 2025-02-11T22:03:31Z

/retest-required

Remaining retests: 0 against base HEAD 0c7bed9 and 2 for PR HEAD f2eadcb in total

candita · 2025-02-11T22:54:17Z

@Miciah The test of e2e-aws-ovn-edge-zones in #29537 succeeded, so maybe now it will succeed here. Or, is it possible we need to wait for openshift/router#648 to be present in a new router build?

openshift-ci · 2025-02-12T00:43:07Z

@openshift-cherrypick-robot: all tests passed!

openshift-ci-robot · 2025-02-12T00:46:17Z

@openshift-cherrypick-robot: Jira Issue OCPBUGS-49687: All pull requests linked via external trackers have merged:

openshift/origin#29513

Jira Issue OCPBUGS-49687 has been moved to the MODIFIED state.

openshift-cherrypick-robot assigned gcs278 Jan 31, 2025

openshift-cherrypick-robot mentioned this pull request Jan 31, 2025

OCPBUGS-44238: Add Readiness Probe to Router Status Tests #29395

Merged

openshift-ci bot changed the title ~~[release-4.18] OCPBUGS-44238: Add Readiness Probe to Router Status Tests~~ [release-4.18] OCPBUGS-49687: Add Readiness Probe to Router Status Tests Jan 31, 2025

openshift-ci bot requested review from gcs278 and miheer January 31, 2025 01:42

openshift-ci bot requested a review from lihongan January 31, 2025 01:42

openshift-ci bot assigned Miciah Jan 31, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 31, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 31, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 31, 2025

gcs278 mentioned this pull request Jan 31, 2025

OCPBUGS-47761: Update default_pub_keys.pem to use SHA256 openshift/router#646

Merged

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 31, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 31, 2025

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 31, 2025

jupierce added backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. labels Feb 7, 2025

openshift-ci bot assigned adambkaplan, dgoodwin, gangwgr, jadhaj, kasturinarra, Moebasim, pamoedom and prabhapa Feb 7, 2025

jupierce added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Feb 7, 2025

jupierce added the staff-eng-approved Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead). label Feb 10, 2025

openshift-merge-bot bot merged commit f89d72e into openshift:release-4.18 Feb 12, 2025
29 checks passed
