@@ -151,3 +151,57 @@ $ jsonnet -J vendor -S -e 'std.manifestYamlDoc((import "mixin.libsonnet").promet
$ jsonnet -J vendor -S -e 'std.manifestYamlDoc((import "mixin.libsonnet").prometheusRules)' >files/rules.yml
$ jsonnet -J vendor -m files/dashboards -e '(import "mixin.libsonnet").grafanaDashboards'
```
+ ## Guidelines for alert names, labels, and annotations
+
+ Prometheus alerts deliberately allow users to define their own schema for
+ names, labels, and annotations. The following is a style guide recommended for
+ alerts in monitoring mixins. Following this guide helps create useful
+ notification templates for all mixins and customize mixin alerts in a unified
+ fashion.
+
+ The alert **name** is a terse description of the alerting condition, using
+ camel case, without whitespace, starting with a capital letter. The first
+ component of the name should be shared between all alerts of a mixin (or
+ between a group of related alerts within a larger mixin). Examples:
+ `NodeFilesystemAlmostOutOfFiles` (from the [node-exporter
+ mixin](https://github.com/prometheus/node_exporter/tree/master/docs/node-mixin)),
+ `PrometheusNotificationQueueRunningFull` (from the [Prometheus
+ mixin](https://github.com/prometheus/prometheus/blob/master/documentation/prometheus-mixin)).
+
+ To mark the severity of an alert, use a **label** called `severity` with one of
+ the following label values:
+ - `critical` for alerts that require immediate action. For a production system,
+   those alerts will usually hit a pager.
+ - `warning` for alerts that require action eventually but not urgently enough
+   to wake someone up or require them to immediately interrupt what they are
+   working on. A typical routing target for those alerts is some kind of ticket
+   queueing or bug tracking system.
+ - `info` for alerts that do not require any action by themselves but mark
+   something as “out of the ordinary”. Those alerts aren't usually routed
+   anywhere, but can be inspected during troubleshooting.
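+
+ Because every mixin uses the same `severity` label, alerts can be customized
+ uniformly. The following is a minimal sketch, not a prescribed method: it
+ assumes the mixin exposes its alerts under a `prometheusAlerts` field
+ containing `groups` with `rules`, and it drops all `info` alerts as one
+ possible customization.
+
+ ```jsonnet
+ // Assumption: mixin.libsonnet exposes `prometheusAlerts.groups[].rules[]`.
+ local mixin = import 'mixin.libsonnet';
+
+ // True if the rule carries the label `severity: info`.
+ local isInfo(rule) =
+   std.objectHas(rule, 'labels') &&
+   std.objectHas(rule.labels, 'severity') &&
+   rule.labels.severity == 'info';
+
+ mixin {
+   prometheusAlerts+:: {
+     // Rebuild every group, keeping only the rules that are not `info`.
+     groups: [
+       group {
+         rules: std.filter(function(rule) !isInfo(rule), group.rules),
+       }
+       for group in super.groups
+     ],
+   },
+ }
+ ```
+
+ The same pattern could instead rewrite a label value or add a routing label,
+ since the filter only relies on the shared `severity` convention.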
+
+ An alert can have the following **annotations**:
+ - `summary` (mandatory): Essentially a more comprehensive and readable version
+   of the alert name. Use a human-readable sentence, starting with a capital
+   letter and ending with a period. Use a static string or, if dynamic expansion
+   is needed, aim for expanding into the same string for alerts that are
+   typically grouped together into one notification. In that way, it can be used
+   as a common “headline” for all alerts in the notification template. Examples:
+   `Filesystem has less than 3% inodes left.` (for the
+   `NodeFilesystemAlmostOutOfFiles` alert mentioned above), `Prometheus alert
+   notification queue predicted to run full in less than 30m.` (for the
+   `PrometheusNotificationQueueRunningFull` alert mentioned above).
+ - `description` (mandatory): A detailed description of a single alert, with
+   most of the important information templated in. The description usually
+   expands into a different string for every individual alert within a
+   notification. A notification template can iterate through all the
+   descriptions and format them into a list. Examples (again corresponding to
+   the examples above): `Filesystem on {{ $labels.device }} at {{
+   $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes
+   left.`, `Alert notification queue of Prometheus %(prometheusName)s is running
+   full.`.
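+
+ Put together, an alert following the guidelines above might look like the
+ sketch below. The alert name and both annotations are the examples quoted
+ above. The `prometheusAlerts` field, the group name, the simplified
+ expression, the `for` duration, and the `critical` severity are assumptions
+ made for illustration and are not copied from the node-exporter mixin.
+
+ ```jsonnet
+ {
+   // Assumption: alerts live under `prometheusAlerts`, in Prometheus rule-group form.
+   prometheusAlerts+:: {
+     groups+: [
+       {
+         name: 'node-exporter',  // illustrative group name
+         rules: [
+           {
+             // Camel case, no whitespace, shared `Node` prefix.
+             alert: 'NodeFilesystemAlmostOutOfFiles',
+             // Simplified, illustrative expression.
+             expr: 'node_filesystem_files_free / node_filesystem_files * 100 < 3',
+             // `for` is a Jsonnet keyword, so the field name is quoted.
+             'for': '1h',
+             labels: {
+               severity: 'critical',  // assumed severity for this sketch
+             },
+             annotations: {
+               // Static string: identical for all alerts grouped into one notification.
+               summary: 'Filesystem has less than 3% inodes left.',
+               // Templated: expands differently for every individual alert.
+               description: 'Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.',
+             },
+           },
+         ],
+       },
+     ],
+   },
+ }
+ ```
+
+ Keeping `summary` static while templating only `description` is what lets a
+ notification template print one headline followed by a list of per-alert
+ details.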
+
+ Note that we plan to add recommended optional annotations for a runbook link
+ (presumably called `runbook_url`) and a dashboard link
+ (`dashboard_url`). However, we still need to work out how to configure patterns
+ for those URLs across mixins in a useful way.