Commit 4a9d9e4

manifests/0000_90_kube-controller-manager-operator_05_alerts: Template console links in alert descriptions
Prometheus alerts support Go templating [1], and this commit uses that to provide more context, such as "which namespace?", "which PodDisruptionBudget?", "where can I find that PDB in the in-cluster web console?", and "what 'oc' command would I run to see garbage-collection sync logs?". This should make understanding the context of the alert more straightforward, without the responder having to dip into labels and guess.

[1]: https://prometheus.io/docs/prometheus/latest/configuration/template_reference/
1 parent 75f30cd commit 4a9d9e4
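
The templating described in the commit message can be tried outside Prometheus with Go's text/template package. The sketch below renders the new PodDisruptionBudgetLimit description from this commit. It rests on several assumptions: `query`, `first`, and `label` are simplified stand-ins for the Prometheus template functions of the same names, not their real implementations; `$labels` and `$value` are bound by a prefix roughly the way Prometheus exposes them; and the console URL, namespace, and PDB name are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// sample stands in for one query result; only its labels matter here.
type sample struct {
	Labels map[string]string
}

// alertData carries what the template reaches through $labels and $value.
type alertData struct {
	Labels map[string]string
	Value  float64
}

// description is the PodDisruptionBudgetLimit annotation added by the commit.
const description = `The {{ $labels.poddisruptionbudget }} pod disruption budget in the {{ $labels.namespace }} namespace exceeds the maximum allowed disruption and is not satisfied. The number of current healthy pods is {{ $value }} less than the desired healthy pods.{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url))) 0}} For more information refer to {{ label "url" (first $console_url) }}/k8s/ns/{{ $labels.namespace }}/poddisruptionbudgets/{{ $labels.poddisruptionbudget }}{{ end }}{{ end }}`

// render executes an alert template with stubbed helper functions.
// An empty consoleURL simulates a cluster without a console_url metric.
func render(text string, data alertData, consoleURL string) (string, error) {
	funcs := template.FuncMap{
		// Stub: returns one sample for "console_url", otherwise nothing.
		"query": func(q string) []sample {
			if q != "console_url" || consoleURL == "" {
				return nil
			}
			return []sample{{Labels: map[string]string{"url": consoleURL}}}
		},
		// Stub: first element of a result, or an error when it is empty.
		"first": func(v []sample) (sample, error) {
			if len(v) == 0 {
				return sample{}, errors.New("first: empty query result")
			}
			return v[0], nil
		},
		// Stub: label lookup on a sample.
		"label": func(name string, s sample) string { return s.Labels[name] },
	}
	// Assumption: bind $labels and $value before the annotation text.
	prefix := "{{ $labels := .Labels }}{{ $value := .Value }}"
	tmpl, err := template.New("description").Funcs(funcs).Parse(prefix + text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	data := alertData{
		Labels: map[string]string{"namespace": "demo", "poddisruptionbudget": "web"},
		Value:  2,
	}
	// Render once with a console URL and once without.
	for _, consoleURL := range []string{"https://console.example.com", ""} {
		text, err := render(description, data, consoleURL)
		if err != nil {
			fmt.Println("render failed:", err)
			return
		}
		fmt.Println(text)
	}
}
```

The `with` guard is what keeps the template safe when the `console_url` query returns nothing: an empty result is falsy, so the body (including the `first` call) is skipped and the description simply ends without a link.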

1 file changed: +6 -5 lines changed

Diff for: manifests/0000_90_kube-controller-manager-operator_05_alerts.yaml

@@ -25,8 +25,9 @@ spec:
     - alert: PodDisruptionBudgetAtLimit
       annotations:
         summary: The pod disruption budget is preventing further disruption to pods.
-        description: The pod disruption budget is at the minimum disruptions allowed level. The number of current healthy pods is equal to the desired healthy pods.
-        runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/PodDisruptionBudgetAtLimit.md
+        description: The {{ $labels.poddisruptionbudget }} pod disruption budget in the {{ $labels.namespace}} namespace is at the maximum allowed disruption. The number of current healthy pods is equal to the desired healthy pods.{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url))) 0}} For more information refer to {{ label "url" (first $console_url) }}/k8s/ns/{{ $labels.namespace }}/poddisruptionbudgets/{{ $labels.poddisruptionbudget }}{{ end }}{{ end }}
+
+        runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/PodDisruptionBudgetAtLimit.md
       expr: |
         max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy and on (namespace, poddisruptionbudget) kube_poddisruptionbudget_status_expected_pods > 0)
       for: 60m
@@ -35,17 +36,17 @@ spec:
     - alert: PodDisruptionBudgetLimit
       annotations:
         summary: The pod disruption budget registers insufficient amount of pods.
-        description: The pod disruption budget is below the minimum disruptions allowed level and is not satisfied. The number of current healthy pods is less than the desired healthy pods.
+        description: The {{ $labels.poddisruptionbudget }} pod disruption budget in the {{ $labels.namespace }} namespace exceeds the maximum allowed disruption and is not satisfied. The number of current healthy pods is {{ $value }} less than the desired healthy pods.{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url))) 0}} For more information refer to {{ label "url" (first $console_url) }}/k8s/ns/{{ $labels.namespace }}/poddisruptionbudgets/{{ $labels.poddisruptionbudget }}{{ end }}{{ end }}
         runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/PodDisruptionBudgetLimit.md
       expr: |
-        max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy)
+        max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_desired_healthy - kube_poddisruptionbudget_status_current_healthy) > 0
       for: 15m
       labels:
         severity: critical
     - alert: GarbageCollectorSyncFailed
       annotations:
         summary: There was a problem with syncing the resources for garbage collection.
-        description: Garbage Collector had a problem with syncing and monitoring the available resources. Please see KubeControllerManager logs for more details.
+        description: Garbage Collector had a problem with syncing and monitoring the available resources. Please see KubeControllerManager logs for more details: 'oc -n {{ $labels.namespace }} logs -c {{ $labels.container }} {{ $labels.pod }}'{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url))) 0}} For more information refer to {{ label "url" (first $console_url) }}/k8s/ns/{{ $labels.namespace }}/pods/{{ $labels.pod }}/logs?container={{ $labels.container }} {{ end }}{{ end }}.
         runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/GarbageCollectorSyncFailed.md
       expr: |
         rate(garbagecollector_controller_resources_sync_error_total{}[5m]) > 0
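
The PodDisruptionBudgetLimit expr change in the diff above matters for the new "{{ $value }} less than" wording. A PromQL comparison between two vectors acts as a filter that keeps the left-hand sample values, so the old expression would have yielded the current healthy count as the alert value, while the new subtraction yields the shortfall. The sketch below models that difference in plain Go under simplifying assumptions: one series per PDB (so `max by` is a no-op), exact label matching, and hypothetical function names and sample data.

```go
package main

import "fmt"

// pdb identifies a series by the labels the alert aggregates on.
type pdb struct {
	Namespace string
	Name      string
}

// oldExprValues models `current < desired`: the comparison filters the
// left-hand vector, so matching series keep the *current* healthy count.
func oldExprValues(current, desired map[pdb]float64) map[pdb]float64 {
	out := map[pdb]float64{}
	for key, c := range current {
		if d, ok := desired[key]; ok && c < d {
			out[key] = c
		}
	}
	return out
}

// newExprValues models `(desired - current) > 0`: the subtraction runs
// first, so matching series carry the *shortfall* as their value.
func newExprValues(current, desired map[pdb]float64) map[pdb]float64 {
	out := map[pdb]float64{}
	for key, d := range desired {
		if c, ok := current[key]; ok && d-c > 0 {
			out[key] = d - c
		}
	}
	return out
}

func main() {
	// Hypothetical data: "web" is short by two pods, "db" is satisfied.
	web := pdb{Namespace: "demo", Name: "web"}
	db := pdb{Namespace: "demo", Name: "db"}
	current := map[pdb]float64{web: 1, db: 3}
	desired := map[pdb]float64{web: 3, db: 3}

	fmt.Println("old $value for web:", oldExprValues(current, desired)[web])
	fmt.Println("new $value for web:", newExprValues(current, desired)[web])
}
```

Both forms fire for the same set of PDBs; only the value attached to the alert differs, which is what lets the description report how many pods are missing.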
