Scheduler leader election #841
Labels
- `a/reliability` (Area: relates to reliability of the service)
- `c/autoscaling/scheduler` (Component: autoscaling: k8s scheduler)
- `t/feature` (Issue type: feature, for new features or requests)
Problem description / Motivation
Similar to #762, we only run a single instance of the scheduler at a time, which means we're vulnerable to extended outages if a node goes down. A "simple" way to fix this is via leader election.
Currently, running multiple scheduler instances at once is unsound, and is unlikely to work correctly.
Feature idea(s) / DoD
Scheduler supports leader election, for high availability in case of single node failure.
Scheduler should probably also have anti-affinity with itself, so replicas land on different nodes (not sure if that's already provided by ReplicaSet / Deployment).
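For reference: Deployments / ReplicaSets don't add anti-affinity by default, so it would need to be set explicitly in the pod template. A sketch, assuming a hypothetical `app: autoscale-scheduler` pod label:

```yaml
# In the scheduler Deployment's pod template spec (label name is assumed):
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: autoscale-scheduler
        topologyKey: kubernetes.io/hostname
```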
Implementation ideas
In addition to the changes to the deployment YAML, we should also adapt the scheduler plugin in some way so that its state is discarded when it's no longer the leader. Not sure how much work that is, or how we can get that signal.
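One possible source for that signal: client-go's `leaderelection` package takes an `OnStoppedLeading` callback that fires when the lease is lost. A minimal, self-contained sketch of wiring that callback to a state reset (`schedulerState` and its fields are hypothetical stand-ins for the plugin's real state):

```go
package main

import "fmt"

// schedulerState stands in for the plugin's cached view of the cluster
// (hypothetical type; the real plugin state is more involved).
type schedulerState struct {
	reservedCPU map[string]int // illustrative: per-node reserved CPU
}

// reset discards everything learned while leading; it may be stale by
// the time this replica becomes leader again.
func (s *schedulerState) reset() {
	s.reservedCPU = make(map[string]int)
}

func main() {
	state := &schedulerState{reservedCPU: map[string]int{"node-1": 4}}

	// client-go's leaderelection.LeaderCallbacks delivers exactly this
	// signal via OnStoppedLeading; here we just invoke it directly.
	onStoppedLeading := func() { state.reset() }

	onStoppedLeading() // simulate losing the lease
	fmt.Println(len(state.reservedCPU)) // prints 0
}
```

On becoming leader again, the replica would re-list pods/VMs/nodes before resuming decisions, which is where the startup-cost concern below comes in.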
Alternatively, if the pod/VM/node listing on startup is too expensive, we can modify the plugin so that decisions made without its input are actually sound (within reason).
We also need to adapt the autoscaler-agent to handle multiple scheduler instances, or expose a connection to the current leader via a k8s Service, or something similar. Not sure if that's possible.
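The Service route is possible in principle: a Service only sends traffic to Ready pods, so if only the current leader reports Ready, the Service's endpoints always contain just the leader, and agents keep dialing one stable name. A sketch with assumed names and port:

```yaml
# Assumed label/port values; follower replicas would fail their
# readinessProbe until they win the election, keeping them out of
# the Service's endpoints.
apiVersion: v1
kind: Service
metadata:
  name: autoscale-scheduler
spec:
  selector:
    app: autoscale-scheduler
  ports:
    - name: plugin-api
      port: 10570
      targetPort: 10570
```

Whether the plugin can cheaply answer "am I the leader?" for a readiness probe is part of the open question above.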