Control plane should scale up in parallel, not serially #12007
/assign
I think we'll still need to do the first CP with `kubeadm init`; then the rest could go in parallel.
Yeah, that makes sense.
Please consider that KCP is built under the assumption that we only create/delete one Machine at a time. If we want to implement this change, that assumption goes out the window. (My assumption is that this issue is about KCP, is that correct? KCP is not mentioned at all.)
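To make that assumption concrete, here is a minimal, hypothetical Go sketch (not KCP's actual code; every name here is invented for illustration) contrasting the one-Machine-per-reconcile model with a parallel variant:

```go
package main

import (
	"fmt"
	"sync"
)

// Machine stands in for a control-plane Machine object.
type Machine struct{ Name string }

// serialScaleUp mirrors the current assumption: at most one new Machine
// per reconcile pass, so each join finishes before the next one starts.
func serialScaleUp(desired int, current []Machine, create func(int) Machine) []Machine {
	if len(current) < desired {
		current = append(current, create(len(current)))
	}
	return current
}

// parallelScaleUp creates every missing Machine at once. Any logic that
// assumed a single in-flight Machine (scale down, rollout, remediation,
// etcd membership handling) would now see several at the same time.
func parallelScaleUp(desired int, current []Machine, create func(int) Machine) []Machine {
	missing := desired - len(current)
	if missing <= 0 {
		return current
	}
	created := make([]Machine, missing)
	var wg sync.WaitGroup
	for i := 0; i < missing; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			created[i] = create(len(current) + i)
		}(i)
	}
	wg.Wait()
	return append(current, created...)
}

func main() {
	create := func(i int) Machine { return Machine{Name: fmt.Sprintf("cp-%d", i)} }
	cp := []Machine{create(0)} // the first machine still comes from kubeadm init
	fmt.Println(parallelScaleUp(3, cp, create))
}
```

The parallel variant is exactly the behavior that would invalidate the single-in-flight-Machine assumption across all of KCP's code paths.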
@sbueringer - yep, this is about KCP. I'll work on a PoC initially so that it can aid discussion.
Just a few more notes:
Thanks for the notes and insight @sbueringer 🙇
> parallel join is generally not supported by kubeadm and YMMV.
I wasn't aware of this. Sounds like a showstopper to me.
Also, it's not documented anywhere on the k8s.io website, and our test tool https://github.com/kubernetes/kubeadm/tree/main/kinder only joins in serial, so we have no e2e tests for it. I recall users and some VMware projects doing parallel join at some point, but that was a while back.
TBH, I'm struggling a little to understand what benefit this change would bring to users. Given that init cannot happen in parallel, we already unblock workers to join immediately after init completes, so whether the 2nd and 3rd CP join sequentially or in parallel won't change the overall cluster provisioning time much. Also, in similar discussions in the past we have always ended up preferring stability over speed in KCP, e.g. #3876.

I also agree that kubeadm support is a showstopper, and that implementing this will be way trickier and riskier than you might expect, because KCP is built under the assumption that we only create/delete one Machine at a time.

Assuming we find a way forward to address the showstopper, before diving deep into KCP changes I would suggest doing a preliminary impact analysis of all the code paths in KCP (scale up, scale down, rollout, remediation, basically everything) and discussing the outcomes before creating a PR. Considering the complexity of KCP's code organization, holding such complex discussions in PR comments will probably be more time-consuming and dispersive than having a focused discussion in a design doc.
Yep, I agree with you here @fabriziopandini. The PoC was really a way to help with the investigation and to understand everything as an aid to writing a proposal, not a way to put up a PR for discussion.
The benefit is getting the control plane to its desired state sooner by bringing up the 2nd and 3rd (or even 4th and 5th) nodes in parallel. This matters most with infra providers that take a long time to provision, like MaaS (see the rough timing sketch after this comment).
That said, this does sound like a showstopper for now, especially given that there are no e2e tests. Parallel joins would have to be supported (with tests) in kubeadm before this could proceed.
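A back-of-the-envelope sketch of the claimed benefit, using purely illustrative numbers (the 20-minute provision time is an assumption, not a measurement):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Assume each control-plane machine takes 20 minutes to provision on a
	// slow infra provider (e.g. MaaS); the number is hypothetical.
	provision := 20 * time.Minute
	replicas := 3

	// Today: init for the first machine, then each remaining join runs serially.
	serial := time.Duration(replicas) * provision

	// Proposed: init first, then the remaining joins overlap in one wave.
	parallel := provision + provision

	fmt.Printf("serial: %v, parallel: %v\n", serial, parallel)
	// Output: serial: 1h0m0s, parallel: 40m0s
}
```

The gap grows with the replica count and with the provider's provisioning time, which is why this mostly benefits slow infrastructure providers.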
Thanks for all the helpful input @neolit123 @sbueringer @fabriziopandini
Doing issue triage
/kind feature
As per the comments above: this should first be sorted out in kubeadm before it becomes actionable on the CAPI side.
As there is nothing to do on this at present:
/unassign
In issue #2016, @dlipovetsky correctly suggested the opposite should be true:
In that description, he states that we would likely be able to scale up in parallel by using etcd non-voting members (learners). kubeadm has since completed adding support for that requirement, and we should look at returning to parallel scale-up using this feature.
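For background on why learners make parallel join safer, here is a small illustrative sketch (my own summary of etcd semantics, not kubeadm or etcd code):

```go
package main

import "fmt"

// quorum returns the number of voting members etcd needs to commit writes.
// Learners (non-voting members) never count here, which is why joining
// several learners in parallel leaves quorum and fault tolerance untouched;
// each learner is promoted to a voter only once its log has caught up.
func quorum(voters int) int { return voters/2 + 1 }

func main() {
	// Joining two learners to a 1-voter cluster: quorum stays 1 during the
	// parallel join, then moves up only as each learner is promoted.
	for _, voters := range []int{1, 2, 3} {
		fmt.Printf("%d voters -> quorum %d\n", voters, quorum(voters))
	}
}
```

By contrast, joining members directly as voters changes quorum at each join, which is part of why serial joins have been the safe default.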