Fix bugs with cluster replication health panel #25

Open
wants to merge 1 commit into
base: main


solidDoWant

The query (and subsequent value mapping) for this panel has a couple of issues:

  • It shows the cluster as degraded even when there is one online, available, ready replica
  • It doesn't actually report whether the cluster is replicating successfully, only whether there are two or more online (but possibly erroring) replicas

I've updated the query and value mapping to address both of these. Here is an easier-to-read version of the updated query:

# -1 if the cluster has 0 replicas. 0 if there are replicas, and all are healthy. >= 1 is the number of unhealthy replicas.
# Value mappings:
# -1 = Unhealthy/no replicas
# 0 = Healthy
# >= 1 = Degraded
# Number of unhealthy replicas. Can be 0 if there are 0 replicas.
(
    # Total number of replicas
    max(cnpg_pg_replication_streaming_replicas{namespace=~"$namespace", pod=~"$instances"}) - 
    # Total number of replicas that can stream WALs
    sum(cnpg_pg_replication_is_wal_receiver_up{namespace=~"$namespace", pod=~"$instances"})
) + 
# -1 if the cluster has no replicas, 0 if the cluster has replicas
(
    # 0 if there are any replicas, -1 if there are not
    clamp_max(
        max(cnpg_pg_replication_streaming_replicas{namespace=~"$namespace", pod=~"$instances"}),
        1
    ) - 1  
)
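
To sanity-check the arithmetic, the expression above can be modelled in a few lines of Python. This is only an illustrative sketch: `panel_value` and `status` are hypothetical helpers, not part of the dashboard, and the two inputs stand in for the results of the `max(...)` and `sum(...)` aggregations for a single cluster.

```python
def panel_value(streaming_replicas: int, wal_receivers_up: int) -> int:
    """Mirror the arithmetic of the PromQL expression for one cluster.

    streaming_replicas: stand-in for max(cnpg_pg_replication_streaming_replicas)
    wal_receivers_up:   stand-in for sum(cnpg_pg_replication_is_wal_receiver_up)
    """
    # Replicas known to the cluster minus replicas whose WAL receiver is up.
    unhealthy = streaming_replicas - wal_receivers_up
    # clamp_max(x, 1) - 1: 0 when there is at least one replica, -1 when there are none.
    no_replica_offset = min(streaming_replicas, 1) - 1
    return unhealthy + no_replica_offset


def status(value: int) -> str:
    """Apply the value mappings listed in the PR description."""
    if value == -1:
        return "Unhealthy/no replicas"
    if value == 0:
        return "Healthy"
    if value >= 1:
        return "Degraded"
    return "Unmapped"  # assumption: values below -1 are not covered by the mappings


if __name__ == "__main__":
    for replicas, up in [(0, 0), (1, 1), (2, 2), (2, 1), (2, 0)]:
        value = panel_value(replicas, up)
        print(f"replicas={replicas} wal_up={up} -> {value} ({status(value)})")
```

Note that a single healthy replica yields 0 (Healthy), which is the first of the two issues described above.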
