Adjustable Timeout Needed for Leader Election in PostgreSQL Operator #2887

uk1988 · 2025-03-26T13:11:39Z

Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.13.0
Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s on rke2
Are you running Postgres Operator in production? yes
Type of issue? Bug report/feature request

We’ve observed that when etcd is under heavy load, the PostgreSQL operator fails to complete the setup of a database cluster. Based on my reading of the code, the operator attempts to communicate with etcd five times in quick succession to designate a leader pod and start the database. In our case, slower Azure disks combined with etcd being under load caused the new PostgreSQL database pod to get stuck in leader election, and it never recovered.

Is there a way to increase the timeout in the operator to handle such cases? If not, can it be added?
More generally, we do not understand why leader election is limited to a fixed number of retries.
We only encounter slow disks in our test environments.
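
To make the request concrete, here is a minimal sketch of the behaviour we have in mind: the number of attempts is configurable (or unlimited), and the wait between attempts grows instead of firing in quick succession. This is not the operator's actual code; `acquire_leader_lock`, `try_acquire`, and every parameter name are hypothetical and only illustrate the idea.

```python
import itertools
import time
from typing import Callable, Optional


def acquire_leader_lock(
    try_acquire: Callable[[], bool],
    max_attempts: Optional[int] = 5,   # None means "retry until it works"
    initial_delay: float = 0.5,        # seconds to wait after the first failure
    backoff_factor: float = 2.0,       # multiplier applied after each failure
    max_delay: float = 30.0,           # upper bound for a single wait
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_acquire until it succeeds or the attempt budget runs out.

    try_acquire is a hypothetical callable that talks to the key-value
    store and returns True once this pod holds the leader lock. A
    TimeoutError from a slow store counts as a failed attempt.
    """
    delay = initial_delay
    for attempt in itertools.count(1):
        try:
            if try_acquire():
                return True
        except TimeoutError:
            pass  # slow store: fall through and retry after a wait

        if max_attempts is not None and attempt >= max_attempts:
            return False

        sleep(delay)
        delay = min(delay * backoff_factor, max_delay)
    return False  # unreachable, kept for type checkers
```

With the defaults above, five failed attempts are spread over 7.5 seconds of waiting (0.5 + 1 + 2 + 4) rather than happening back to back, and setting `max_attempts=None` lets a pod keep trying until a slow etcd catches up.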
