fix: improve prometheusrules (to show more labels and fix messages) #225

Open
wants to merge 2 commits into
base: main
46 changes: 46 additions & 0 deletions charts/authentik/ci/ct-values-metrics.yaml
@@ -0,0 +1,46 @@
replicas: 1

worker:
  replicas: 1

image:
  repository: ghcr.io/goauthentik/server
  tag: 2023.10.4
  pullPolicy: IfNotPresent

ingress:
  enabled: true
  hosts:
    - host: authentik.domain.tld
      paths:
        - path: "/"
          pathType: Prefix

authentik:
  log_level: debug
  secret_key: 5up3r53cr37K3y
  postgresql:
    password: au7h3n71k
  redis:
    password: au7h3n71k

postgresql:
  enabled: false
  postgresqlPassword: au7h3n71k
  persistence:
    enabled: false

redis:
  enabled: false
  auth:
    enabled: true
    password: au7h3n71k

blueprints:
  - authentik-ci-blueprint

prometheus:
  serviceMonitor:
    create: true
  rules:
    create: true
13 changes: 6 additions & 7 deletions charts/authentik/templates/prometheusrule.yaml
@@ -127,30 +127,29 @@ spec:
         - alert: NoWorkersConnected
           labels:
             severity: critical
-          expr: max without (pid) (authentik_admin_workers) < 1
+          expr: max by (pod) (authentik_admin_workers{namespace="{{ $.Release.Namespace }}", service="{{ include "authentik.names.fullname" $ }}-metrics"}) < 1

Review comment from @wrenix (Contributor), Feb 14, 2025:

Extract the filter into a Helm template variable and reuse it everywhere (instead of copying the code):

{{- $filter := printf `namespace="%s", service="%s-metrics"` .Release.Namespace (include "authentik.names.fullname" . ) }}
          expr: max by (pod) (authentik_admin_workers{ {{ $filter }} }) < 1

Reply from a Member:

The service is authentik-server-metrics, not just authentik-metrics. In template form, that is {{ include "authentik.server.fullname" . }}-metrics.

Reply from a Contributor:

Maybe we should use service=~"^{{ include "authentik.server.fullname" . }}.*" instead.

Reply from a Member:

Probably not
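
Taken together, the suggestions in this thread could look roughly like the sketch below. It is an illustration, not part of the PR. It assumes the `authentik.server.fullname` helper named above is defined in the chart and that the root context is reachable as `$` where the rule is rendered; `$filter` is the variable name from the first comment. The format string is written as a Go raw string (backticks), since Go templates do not accept single-quoted string literals.

```yaml
{{- /* Sketch only: build the label matcher once and reuse it in every rule. */ -}}
{{- $filter := printf `namespace="%s", service="%s-metrics"` $.Release.Namespace (include "authentik.server.fullname" $) }}
        - alert: NoWorkersConnected
          labels:
            severity: critical
          # Renders to authentik_admin_workers{ namespace="...", service="...-metrics" }
          expr: max by (pod) (authentik_admin_workers{ {{ $filter }} }) < 1
          for: 10m
```

On the regex variant: Prometheus regex matchers (`=~`) are fully anchored, so a leading `^` would be redundant, and a trailing `.*` would also match any other service whose name starts with the same prefix.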

           for: 10m
           annotations:
             {{`
             summary: No workers connected
-            message: authentik instance {{ $labels.instance }}'s worker are either not running or not connected.
+            message: authentik instance {{ $labels.pod }}'s worker are either not running or not connected.
             `}}
-

         - alert: PendingMigrations
           labels:
             severity: critical
-          expr: max without (pid) (django_migrations_unapplied_total) > 0
+          expr: max by (pod) (django_migrations_unapplied_total{namespace="{{ $.Release.Namespace }}", service="{{ include "authentik.names.fullname" $ }}-metrics"}) > 0
           for: 10m
           annotations:
             {{`
             summary: Pending database migrations
-            message: authentik instance {{ $labels.instance }} has pending database migrations
+            message: authentik instance {{ $labels.pod }} has pending database migrations
             `}}

         - alert: FailedSystemTasks
           labels:
             severity: critical
-          expr: sum(increase(authentik_system_tasks{status="error"}[2h])) by (task_name, task_uid) > 0
+          expr: sum(increase(authentik_system_tasks{status="error", namespace="{{ $.Release.Namespace }}", service="{{ include "authentik.names.fullname" $ }}-metrics"}[2h])) by (task_name, task_uid) > 0
           for: 2h
           annotations:
             {{`
@@ -161,7 +160,7 @@ spec:
         - alert: DisconnectedOutposts
           labels:
             severity: critical
-          expr: sum by (outpost) (max without (pid) (authentik_outposts_connected{uid!~"specific.*"})) < 1
+          expr: max by (outpost) (authentik_outposts_connected{namespace="{{ $.Release.Namespace }}", service="{{ include "authentik.names.fullname" $ }}-metrics", uid!~"specific.*"}) < 1

Review comment from a Contributor:

Missing namespace and service to identify which Helm release is alerting ...

           for: 30m
           annotations:
             {{`
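
One possible reading of the last review comment is that aggregating with `max by (outpost)` drops the `namespace` and `service` labels from the resulting series, so a firing alert no longer says which release it belongs to. The sketch below keeps those labels in the `by` clause. It is an illustration under that assumption, not part of the PR, and it reuses the selector exactly as it appears in the diff.

```yaml
        - alert: DisconnectedOutposts
          labels:
            severity: critical
          # Listing namespace and service in by() preserves them on the alert.
          expr: max by (namespace, service, outpost) (authentik_outposts_connected{namespace="{{ $.Release.Namespace }}", service="{{ include "authentik.names.fullname" $ }}-metrics", uid!~"specific.*"}) < 1
          for: 30m
```

Because the selector already pins `namespace` and `service` to a single value each, adding them to the grouping should not change which outposts trigger the alert; it only changes which labels the alert carries.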