[8.17](backport #42714) [metricbeat] Refactor kubernetes bearer token authentication #42784

Merged 2 commits on Feb 20, 2025

mergify[bot]
Contributor

@mergify mergify bot commented Feb 19, 2025

Proposed commit message

[metricbeat] Refactor kubernetes bearer token authentication

Instead of doing retries on 401 errors, use a mechanism from client-go which simply reloads the token periodically in the background.

Also, don't stop logging errors after the first 401. These errors, if present, need to be addressed by the cluster operator, so we should make them more prominent.

We have a report of the current mechanism running into race conditions in some OpenShift clusters. The exact root cause is unknown, but this change should address it.
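
The approach described above (attach a bearer token that is re-read from its file once the cached copy is stale, and never retry on 401) can be illustrated with a minimal, stdlib-only Go sketch. This is not the client-go mechanism the PR adopts and not metricbeat's code: the names fileTokenSource, bearerRoundTripper, maxAge and demo, the one-minute reload interval, and the token values are all assumptions made for illustration. To stay deterministic, the sketch reloads lazily on the next request using an injectable clock rather than running a background goroutine.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
	"sync"
	"time"
)

// fileTokenSource (hypothetical) caches a token file and re-reads it once stale.
type fileTokenSource struct {
	path     string
	maxAge   time.Duration
	now      func() time.Time // injectable clock, so the demo needs no sleeping
	mu       sync.Mutex
	token    string
	loadedAt time.Time
}

// Token returns the cached token, reloading it from disk when it is stale.
func (s *fileTokenSource) Token() (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.token != "" && s.now().Sub(s.loadedAt) < s.maxAge {
		return s.token, nil
	}
	raw, err := os.ReadFile(s.path)
	if err != nil {
		return "", fmt.Errorf("reading token file: %w", err)
	}
	s.token, s.loadedAt = strings.TrimSpace(string(raw)), s.now()
	return s.token, nil
}

// bearerRoundTripper adds the current token; a 401 is returned to the caller, never retried.
type bearerRoundTripper struct {
	source *fileTokenSource
	next   http.RoundTripper
}

func (rt *bearerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	token, err := rt.source.Token()
	if err != nil {
		return nil, err
	}
	clone := req.Clone(req.Context()) // a RoundTripper must not mutate the caller's request
	clone.Header.Set("Authorization", "Bearer "+token)
	return rt.next.RoundTrip(clone)
}

// demo rotates the token file between requests and returns the headers a test server saw.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "token-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "token")
	var mu sync.Mutex
	var seen []string
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		seen = append(seen, r.Header.Get("Authorization"))
	}))
	defer srv.Close()
	clock := time.Unix(0, 0)
	src := &fileTokenSource{path: path, maxAge: time.Minute, now: func() time.Time { return clock }}
	client := &http.Client{Transport: &bearerRoundTripper{source: src, next: http.DefaultTransport}}
	// The file is rotated to token-v2 before request 2; the clock only moves
	// past maxAge before request 3.
	for i, tok := range []string{"token-v1", "token-v2", "token-v2"} {
		if err := os.WriteFile(path, []byte(tok+"\n"), 0o600); err != nil {
			return nil, err
		}
		if i == 2 {
			clock = clock.Add(2 * time.Minute)
		}
		resp, err := client.Get(srv.URL)
		if err != nil {
			return nil, err
		}
		resp.Body.Close()
	}
	mu.Lock()
	defer mu.Unlock()
	return seen, nil
}

func main() {
	fmt.Println(demo())
}
```

In this sketch the first two requests carry token-v1 (the second is served from the still-fresh cache even though the file has already been rotated) and the third carries token-v2. Because the round tripper never retries, any 401 reaches the caller on every request, where it can be logged each time.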

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

After this change, we will log an error every time we get a 401 from the API Server or kubelet, whereas up until now we'd only log the first one.

How to test this PR locally

  1. Build a local metricbeat docker image using mage package.
  2. Start a kind cluster.
  3. Upload the docker image to the kind cluster.
  4. Install metricbeat in the cluster using the official manifests.
  5. Wait for an hour until the auth token gets rotated.
  6. Look at records coming from the kubernetes module for the non-state metricsets in Kibana:

[Screenshot: Screenshot_20250217_120534]

Related issues


This is an automatic backport of pull request #42714 done by [Mergify](https://mergify.com).

(cherry picked from commit c61c0fe)
@mergify mergify bot requested a review from a team as a code owner February 19, 2025 13:09
@mergify mergify bot added the backport label Feb 19, 2025
@mergify mergify bot requested a review from a team as a code owner February 19, 2025 13:09
@mergify mergify bot removed the request for review from a team February 19, 2025 13:09
@mergify mergify bot requested review from leehinman and khushijain21 February 19, 2025 13:09
@botelastic botelastic bot added the needs_team label Feb 19, 2025
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane label Feb 19, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@botelastic botelastic bot removed the needs_team label Feb 19, 2025
@pierrehilbert pierrehilbert added the Team:obs-ds-hosted-services and needs_team labels Feb 19, 2025
@elasticmachine
Collaborator

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team label Feb 19, 2025
@pierrehilbert pierrehilbert requested review from swiatekm and removed request for leehinman and khushijain21 February 19, 2025 14:34
@swiatekm swiatekm enabled auto-merge (squash) February 19, 2025 18:15
@swiatekm swiatekm merged commit 9a53e3e into 8.17 Feb 20, 2025
39 checks passed
@swiatekm swiatekm deleted the mergify/bp/8.17/pr-42714 branch February 20, 2025 16:50
Labels
backport, Team:Elastic-Agent-Control-Plane, Team:obs-ds-hosted-services