Adjustable Timeout Needed for Leader Election in PostgreSQL Operator #2887

Open
uk1988 opened this issue Mar 26, 2025 · 0 comments

uk1988 commented Mar 26, 2025

  • Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.13.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s on rke2
  • Are you running Postgres Operator in production? yes
  • Type of issue? Bug report/feature request

We’ve observed that when etcd is under heavy load, the PostgreSQL operator fails to complete the setup of a database cluster. Based on my understanding of the code, the operator attempts to communicate with etcd five times in quick succession to designate a leader pod and initiate the database startup. However, in a scenario where we were using slower Azure disks and etcd was also under load, the new PostgreSQL database pod became stuck in the leader election process and never recovered.

Is there a way to increase the timeout in the operator to handle such cases? If not, can it be added?
More generally, we do not understand why leader election is limited to a fixed number of retries.
For context, we only encounter slow disks in our test environments.
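
To illustrate the kind of behaviour we are asking for, here is a minimal sketch of a retry loop bounded by an adjustable deadline with exponential backoff, instead of a fixed number of rapid attempts. It is not taken from the operator or from any component it deploys; `retry_until_deadline`, `RetryTimeout`, and all parameter names are hypothetical and only serve to show the idea.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryTimeout(Exception):
    """Raised when the operation did not succeed before the deadline."""


def retry_until_deadline(
    operation: Callable[[], T],
    timeout_s: float,
    base_delay_s: float = 0.5,
    max_delay_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `timeout_s` seconds have passed.

    Hypothetical helper: the wait between attempts doubles each time
    (capped at `max_delay_s`), so a slow backend is not hammered with
    back-to-back requests. `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout_s
    delay = base_delay_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation()
        except Exception as exc:  # a real implementation would catch narrower errors
            remaining = deadline - clock()
            if remaining <= 0:
                raise RetryTimeout(f"gave up after {attempts} attempts") from exc
            # Never sleep past the deadline.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

With something along these lines, the overall timeout would be the single value exposed in the configuration, and the number of attempts would follow from it rather than being hard-coded.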
