- Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.13.0
- Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare metal K8s on rke2
- Are you running Postgres Operator in production? yes
- Type of issue? Bug report / feature request
We’ve observed that when etcd is under heavy load, the PostgreSQL operator fails to complete the setup of a database cluster. Based on my understanding of the code, the operator attempts to communicate with etcd five times in quick succession to designate a leader pod and initiate the database startup. However, in scenarios where we were using slower Azure disks—combined with etcd being under load—the new PostgreSQL database pod became stuck in the leader election process and never recovered.
Is there a way to increase the timeout in the operator to handle such cases? If not, can it be added?
More generally, we do not understand why leader election is retry-limited at all.
We only encounter slow disks in our test environments.
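For illustration, here is a minimal sketch (in Go, since the operator is written in Go) of what we have in mind: replacing a fixed number of quick attempts with a context-bounded retry loop using exponential backoff. All names here (`retryConfig`, `electLeader`, the option fields) are hypothetical and are not the operator's actual API; this is just a sketch of the behavior we are asking about.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// retryConfig holds hypothetical tunables; the operator's real option
// names may differ or may not exist yet.
type retryConfig struct {
	MaxAttempts    int           // 0 means retry until the context expires
	InitialBackoff time.Duration // delay after the first failure
	MaxBackoff     time.Duration // cap on the backoff between attempts
}

// electLeader retries fn with exponential backoff instead of a fixed
// burst of attempts, so a temporarily slow DCS (etcd) can still converge.
func electLeader(ctx context.Context, cfg retryConfig, fn func() error) error {
	backoff := cfg.InitialBackoff
	for attempt := 1; ; attempt++ {
		err := fn()
		if err == nil {
			return nil
		}
		if cfg.MaxAttempts > 0 && attempt >= cfg.MaxAttempts {
			return fmt.Errorf("leader election failed after %d attempts: %w", attempt, err)
		}
		// Wait for the backoff interval, but abort if the overall
		// deadline is reached first.
		select {
		case <-ctx.Done():
			return fmt.Errorf("leader election aborted: %w", ctx.Err())
		case <-time.After(backoff):
		}
		if backoff *= 2; backoff > cfg.MaxBackoff {
			backoff = cfg.MaxBackoff
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	cfg := retryConfig{MaxAttempts: 0, InitialBackoff: time.Second, MaxBackoff: 10 * time.Second}
	err := electLeader(ctx, cfg, func() error {
		return fmt.Errorf("simulated etcd timeout") // stand-in for the real DCS call
	})
	fmt.Println(err)
}
```

With `MaxAttempts` set to 0, the loop keeps retrying with capped backoff until the overall context deadline expires, so a slow etcd would merely delay leader election rather than leave the pod stuck permanently.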