scheduler plugin should de-prioritize newer nodes #846
Labels
c/autoscaling/scheduler
Component: autoscaling: k8s scheduler
t/feature
Issue type: feature, for new features or requests
Problem description / Motivation
Currently, the load on the scheduler is somewhat unusual: computes usually have short (but uneven) lifetimes, and varying external load produces regular usage spikes that
This load sometimes interacts with our node scoring algorithm to result in chaotic (in the mathematical sense) and cyclical fluctuations in reserved resources on the nodes. This has a single primary effect:
In particular, this happens most visibly when a node is added due to external demand: sometimes the new node is removed after demand returns to normal, but sometimes another node's usage goes down instead (though not far enough for that node to be removed).
Here's a recent example:
Discussion here: https://neondb.slack.com/archives/C03TN5G758R/p1709660933447909
Feature idea(s) / DoD
To mitigate the issues above, the scheduler plugin should de-prioritize newer nodes. This provides a consistent ordering (preventing usage from "swapping" between nodes) and explicitly prioritizes the removal of nodes that were added to satisfy immediate demand (which will have fewer long-running computes).
Implementation ideas
From the Slack thread linked above: