
Commit 1964784

Add style guide for alert naming, labels, annotations
Signed-off-by: beorn7 <[email protected]>
1 parent 5c39d48 commit 1964784

1 file changed (+54, -0 lines)


Diff for: README.md

@@ -151,3 +151,57 @@ $ jsonnet -J vendor -S -e 'std.manifestYamlDoc((import "mixin.libsonnet").promet
$ jsonnet -J vendor -S -e 'std.manifestYamlDoc((import "mixin.libsonnet").prometheusRules)' >files/rules.yml
$ jsonnet -J vendor -m files/dashboards -e '(import "mixin.libsonnet").grafanaDashboards'
```

## Guidelines for alert names, labels, and annotations

Prometheus alerts deliberately allow users to define their own schema for
names, labels, and annotations. The following style guide is recommended for
alerts in monitoring mixins. Following it makes it easier to create
notification templates that work for all mixins and to customize mixin alerts
in a unified fashion.

The alert **name** is a terse description of the alerting condition, written in
camel case without whitespace and starting with a capital letter. The first
component of the name should be shared by all alerts of a mixin (or by a
group of related alerts within a larger mixin). Examples:
`NodeFilesystemAlmostOutOfFiles` (from the [node-exporter
mixin](https://github.com/prometheus/node_exporter/tree/master/docs/node-mixin)),
`PrometheusNotificationQueueRunningFull` (from the [Prometheus
mixin](https://github.com/prometheus/prometheus/blob/master/documentation/prometheus-mixin)).
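
The naming rule is simple enough to check mechanically. The following Python sketch is a hypothetical helper, not part of any mixin tooling. It assumes the shared first component is passed in explicitly as `prefix` (e.g. `Node` or `Prometheus`), since the guide does not say how to derive it, and it approximates "camel case without whitespace" as a capital letter followed by letters and digits only.

```python
import re

# Approximation of the guide's rule: starts with a capital letter, then only
# letters and digits (no whitespace, underscores, or hyphens).
CAMEL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def check_alert_name(name: str, prefix: str) -> list[str]:
    """Return a list of style problems for an alert name (empty if none).

    `prefix` is the first name component shared by all alerts of the mixin
    (a hypothetical parameter chosen for this illustration).
    """
    problems = []
    if not CAMEL_CASE.match(name):
        problems.append(f"{name!r} is not camel case starting with a capital letter")
    # The prefix must be a whole first component, i.e. followed by a new
    # capitalized component rather than more lowercase letters.
    rest = name[len(prefix):]
    if not (name.startswith(prefix) and rest[:1].isupper()):
        problems.append(f"{name!r} does not start with the shared component {prefix!r}")
    return problems


print(check_alert_name("NodeFilesystemAlmostOutOfFiles", "Node"))  # → []
print(check_alert_name("node_filesystem_full", "Node"))
```

The second call reports two problems: the name is snake case, and it does not begin with the shared `Node` component.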

To mark the severity of an alert, use a **label** called `severity` with one of
the following label values:
- `critical` for alerts that require immediate action. For a production system,
  those alerts will usually hit a pager.
- `warning` for alerts that require action eventually, but not urgently enough
  to wake someone up or to require them to immediately interrupt what they are
  working on. A typical routing target for those alerts is some kind of ticket
  queueing or bug tracking system.
- `info` for alerts that do not require any action by themselves but mark
  something as “out of the ordinary”. Those alerts aren't usually routed
  anywhere, but can be inspected during troubleshooting.
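
The severity convention maps naturally onto a lookup table. The Python sketch below validates the `severity` label of a rule and returns a routing target. It represents a rule as a plain dict using the `alert` and `labels` keys of a Prometheus alerting rule; the receiver names `pager` and `ticket-queue` are hypothetical placeholders, and in a real deployment this routing would be configured in Alertmanager rather than in code.

```python
from typing import Optional

# Allowed `severity` values from the style guide, mapped to an example routing
# target. The receiver names are hypothetical placeholders.
SEVERITY_ROUTES: dict[str, Optional[str]] = {
    "critical": "pager",        # requires immediate action
    "warning": "ticket-queue",  # requires action eventually, not urgently
    "info": None,               # not routed; inspected during troubleshooting
}


def route_for(rule: dict) -> Optional[str]:
    """Return the (hypothetical) receiver for an alerting rule.

    Raises ValueError if the `severity` label is missing or is not one of the
    values recommended by the style guide.
    """
    severity = rule.get("labels", {}).get("severity")
    if severity not in SEVERITY_ROUTES:
        raise ValueError(
            f"alert {rule.get('alert')!r}: severity must be one of "
            f"{sorted(SEVERITY_ROUTES)}, got {severity!r}"
        )
    return SEVERITY_ROUTES[severity]


# An illustrative rule, reduced to the fields this check needs.
rule = {
    "alert": "NodeFilesystemAlmostOutOfFiles",
    "labels": {"severity": "warning"},
}
print(route_for(rule))  # → ticket-queue
```

Rejecting unknown values outright, rather than falling back to a default route, keeps a typo such as `severity: crtical` from silently downgrading a pager alert.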

An alert can have the following **annotations**:
- `summary` (mandatory): Essentially a more comprehensive and readable version
  of the alert name. Use a human-readable sentence, starting with a capital
  letter and ending with a period. Use a static string or, if dynamic expansion
  is needed, aim for it to expand into the same string for all alerts that are
  typically grouped together into one notification. That way, it can be used
  as a common “headline” for all alerts in the notification template.
  Examples: `Filesystem has less than 3% inodes left.` (for the
  `NodeFilesystemAlmostOutOfFiles` alert mentioned above) and
  `Prometheus alert notification queue predicted to run full in less than 30m.`
  (for the `PrometheusNotificationQueueRunningFull` alert mentioned above).
- `description` (mandatory): A detailed description of a single alert, with
  most of the important information templated in. The description usually
  expands into a different string for every individual alert within a
  notification, so a notification template can iterate through all the
  descriptions and format them into a list. Examples (again corresponding to
  the examples above):
  `Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.`
  and `Alert notification queue of Prometheus %(prometheusName)s is running full.`.
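
To illustrate how the two annotations divide the work, here is a Python sketch of what a notification template does with a group of alerts: it checks that both mandatory annotations are present and that `summary` is a sentence, uses the shared `summary` as the headline, and lists each alert's `description` below it. Real Alertmanager notification templates are written in Go's template language; this is only an analogue, and the function name `render_notification` and the sample alerts (devices, instances, percentages) are hypothetical.

```python
REQUIRED_ANNOTATIONS = ("summary", "description")


def render_notification(alerts: list[dict]) -> str:
    """Render a group of alerts as one headline plus a bullet list.

    Each alert is a dict with an `annotations` dict whose templates have
    already been expanded, as they are when an alert fires.
    """
    for alert in alerts:
        annotations = alert.get("annotations", {})
        missing = [k for k in REQUIRED_ANNOTATIONS if k not in annotations]
        if missing:
            raise ValueError(f"alert {alert.get('alert')!r} lacks {missing}")
        summary = annotations["summary"]
        # Summary style: a sentence starting with a capital letter, ending with a period.
        if not (summary[:1].isupper() and summary.endswith(".")):
            raise ValueError(f"summary {summary!r} is not a capitalized sentence")

    # Within a group the summaries should be identical; keep each distinct
    # value once, in first-seen order, in case they are not.
    summaries = list(dict.fromkeys(a["annotations"]["summary"] for a in alerts))
    headline = " / ".join(summaries)
    items = [f"- {a['annotations']['description']}" for a in alerts]
    return "\n".join([headline, *items])


alerts = [
    {
        "alert": "NodeFilesystemAlmostOutOfFiles",
        "annotations": {
            "summary": "Filesystem has less than 3% inodes left.",
            "description": "Filesystem on /dev/sda1 at node-a:9100 has only 2.10% available inodes left.",
        },
    },
    {
        "alert": "NodeFilesystemAlmostOutOfFiles",
        "annotations": {
            "summary": "Filesystem has less than 3% inodes left.",
            "description": "Filesystem on /dev/sdb1 at node-b:9100 has only 1.75% available inodes left.",
        },
    },
]
print(render_notification(alerts))
```

Because both alerts share the same summary, the output contains the headline once, followed by one bullet per description. If the summary had been templated with per-alert values, the headline would fragment into several strings, which is exactly what the guideline for `summary` tries to avoid.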

Note that we plan to add recommended optional annotations for a runbook link
(presumably called `runbook_url`) and a dashboard link
(`dashboard_url`). However, we still need to work out how to configure patterns
for those URLs across mixins in a useful way.
