manifests/0000_90_kube-controller-manager-operator_05_alerts: Template console links in alert descriptions #837

Open
wants to merge 1 commit into
base: master
from

wking
Member

@wking wking commented Apr 11, 2025

Prometheus alerts support Go templating, and this pull uses that to provide more context like "which namespace?", "which PodDisruptionBudget?", "where can I find that PDB in the in-cluster web console?", and "what oc command would I run to see garbage-collection sync logs?". This should make understanding the context of the alert more straightforward, without the responder having to dip into labels and guess.
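
As a rough illustration of the templating described above, the following Go sketch expands an annotation that references `$labels`. It uses only the standard `text/template` package. The `alertData` type, the `expand` helper, and the example label values are hypothetical, and the assumption that `$labels` is bound by prepending a definition to the annotation text is a simplification, not a statement of how Prometheus is implemented.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alertData is a hypothetical stand-in for the data exposed to an
// alert annotation template.
type alertData struct {
	Labels map[string]string
}

// expand renders an annotation under this sketch's assumption that
// $labels is bound by a definition prepended before parsing.
func expand(annotation string, data alertData) (string, error) {
	const defs = "{{ $labels := .Labels }}"
	tmpl, err := template.New("annotation").Parse(defs + annotation)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	annotation := "The {{ $labels.poddisruptionbudget }} pod disruption budget " +
		"in the {{ $labels.namespace }} namespace is at the maximum allowed disruption."
	text, err := expand(annotation, alertData{Labels: map[string]string{
		"poddisruptionbudget": "example-pdb", // illustrative value
		"namespace":           "example-ns",  // illustrative value
	}})
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

With the illustrative labels above, the responder would see the PDB name and namespace inline in the description instead of having to look them up in the alert's labels.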

@openshift-ci openshift-ci bot requested review from deads2k and ingvagabund April 11, 2025 16:59
Contributor

openshift-ci bot commented Apr 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
Once this PR has been reviewed and has the lgtm label, please assign atiratree for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…e console links in alert descriptions

Prometheus alerts support Go templating [1], and this commit uses that
to provide more context like "which namespace?", "which
PodDisruptionBudget?", "where can I find that PDB in the in-cluster
web console?", and "what 'oc' command would I run to see
garbage-collection sync logs?".  This should make understanding the
context of the alert more straightforward, without the responder having
to dip into labels and guess.

Using |- for trimmed, block style strings avoids YAML parsers choking
on the "for more details: ..." colon with "mapping values are not
allowed in this context" and similar.

[1]: https://prometheus.io/docs/prometheus/latest/configuration/template_reference/
@wking wking force-pushed the template-alert-descriptions branch from 4a9d9e4 to daae216 Compare April 11, 2025 19:54
@@ -25,7 +25,8 @@ spec:
 - alert: PodDisruptionBudgetAtLimit
 annotations:
 summary: The pod disruption budget is preventing further disruption to pods.
-description: The pod disruption budget is at the minimum disruptions allowed level. The number of current healthy pods is equal to the desired healthy pods.
+description: |-
+The {{ $labels.poddisruptionbudget }} pod disruption budget in the {{ $labels.namespace}} namespace is at the maximum allowed disruption. The number of current healthy pods is equal to the desired healthy pods.{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url))) 0}} For more information refer to {{ label "url" (first $console_url) }}/k8s/ns/{{ $labels.namespace }}/poddisruptionbudgets/{{ $labels.poddisruptionbudget }}{{ end }}{{ end }}
Member Author

Hmm, not sure what to do with the unit failure:

: TestYamlCorrectness	0s
{=== RUN   TestYamlCorrectness
    assets_test.go:2 ...  === RUN   TestYamlCorrectness
    assets_test.go:2 ...}

the test-case's stdout includes:

=== RUN   TestYamlCorrectness
    assets_test.go:20: Unexpected error reading manifests from ../../manifests/: failed to render "0000_90_kube-controller-manager-operator_05_alerts.yaml": template: 0000_90_kube-controller-manager-operator_05_alerts.yaml:29: undefined variable "$labels"

I guess that's this assets.New call, going through assetFromTemplate and renderFile to this template.New. I'm not clear on why this operator feels like these manifests should be Go templates. Maybe we can pivot to using ManifestsFromFiles.

Contributor

openshift-ci bot commented Apr 11, 2025

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/unit | daae216 | link | true | /test unit |
| ci/prow/okd-scos-e2e-aws-ovn | daae216 | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
