Recreating control-plane members generates etcd errors and orphaned etcd members #3577
Labels: kind/bug, sig/cluster-management
What happened?
We experimented with rolling-update logic (related to #3540) by adding a new node to a KubeOne cluster and removing it afterwards. This resulted in etcd errors and other nodes being marked NotReady for several minutes.
Etcdserver error:

```
Error from server: etcdserver: request timed out
```

The etcd member list showed that the old node had not been removed.
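For reference, here is a minimal way to inspect the member list from a control-plane node. This is a sketch assuming the stacked etcd that kubeadm deploys, so the certificate paths under /etc/kubernetes/pki/etcd/ are assumptions, not taken from this report:

```sh
# Inspect etcd membership from a control-plane node.
# Assumes kubeadm's default stacked-etcd certificate paths.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list -w table
```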
Expected behavior
Removing a node from Kubernetes and running `kubeone apply` should clean up any orphaned etcd members.
A KubeOne warning about the member shows up, but the etcd member list still contains it. (Screenshots: KubeOne warning, member list.)
How to reproduce the issue?
```
kubectl drain <node> && kubectl delete node <node>
```

followed by `kubeone apply`.
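Spelled out, the sequence we used looks roughly like the following sketch; the node name and manifest file name are illustrative:

```sh
# 1. Add a new control-plane node to the KubeOneCluster manifest, then reconcile:
kubeone apply --manifest kubeone.yaml

# 2. Remove the node from Kubernetes (node name is a placeholder):
kubectl drain control-plane-4 --ignore-daemonsets --delete-emptydir-data
kubectl delete node control-plane-4

# 3. Reconcile again; this is the step that should remove the orphaned etcd member:
kubeone apply --manifest kubeone.yaml
```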
What KubeOne version are you using?

1.9.1 and 1.9.2 (the `kubeone version` output for both clients was attached as screenshots).
Provide your KubeOneCluster manifest here (if applicable)
What cloud provider are you running on?
OpenStack
What operating system are you running in your cluster?
Flatcar Linux
Additional information
The issue is not consistently reproducible: several runs completed without any problems, while others produced an orphaned member on the first try. We are still investigating and can only guess at the cause.
Our best guess is that removing and re-adding etcd members overloads the etcd cluster (together with the NotReady nodes), so the member deletion is never triggered and the member is left orphaned.
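Until this is fixed, a hedged workaround sketch (not something KubeOne does here) is to remove the orphaned member by hand with etcdctl. The certificate paths are the kubeadm defaults assumed earlier, and MEMBER_ID is a placeholder taken from the `member list` output:

```sh
# Find the ID of the orphaned member in the first column of
# the `etcdctl ... member list -w table` output, then remove it:
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member remove MEMBER_ID
```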
cc @toschneck