Handle Kubernetes API server failover #3008

Closed · mxey opened this issue May 8, 2018 · 11 comments · Fixed by #3522
Labels
  • good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@mxey commented May 8, 2018

Environment
Dashboard version: v1.8.3
Kubernetes version: v1.10.2
Operating system: CentOS Linux release 7.4.1708 (Core)
Steps to reproduce
  1. Have a multi-master Kubernetes cluster
  2. Run dashboard with in-cluster config
  3. Stop one of the API servers
Observed result
  • Dashboard hangs while trying to load cluster resources, until Linux eventually times out the TCP connection.
  • The dashboard pod is not killed and restarted automatically either, because the liveness probe does not exercise the Kubernetes API connection (see the sketch after this list).
  • Even after the TCP timeout and eventual reconnect, the log still repeatedly shows Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout (see Extremely dangerous logging #2723 (comment))
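
A minimal sketch of what a liveness check that exercises the API connection could look like, assuming client-go and the standard in-cluster config (the port, path, and handler here are placeholders, not the dashboard's actual endpoints):

```go
// Sketch only: a /healthz handler that pings the API server, so a liveness
// probe pointed at it fails (and kubelet restarts the pod) when the
// connection to the API server is dead.
package main

import (
	"log"
	"net/http"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("in-cluster config: %v", err)
	}
	// A short client timeout avoids hanging until the kernel's TCP timeout.
	config.Timeout = 5 * time.Second

	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatalf("creating client: %v", err)
	}

	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		// Asking for the server version exercises the API connection end to end.
		if _, err := client.Discovery().ServerVersion(); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```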
Expected result
@m3co-code

I just experienced the same issue. The dashboard was trying to synchronize in a fast loop (thousands of log entries in one second), consuming a lot of CPU.

@bryk added the good first issue label on May 25, 2018
@vdboor commented Jun 4, 2018

This happens even with a single-master cluster.

Steps to reproduce:

  • Run the dashboard (e.g. with the default configuration from the Helm chart)
  • Restart the API server (find it with kubectl get pods -n kube-system -l component=kube-apiserver)

The logging flood stops when the dashboard pod is deleted/recreated.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Sep 2, 2018
@vdboor commented Sep 3, 2018

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Sep 3, 2018
@maciaszczykm (Member)

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label on Sep 3, 2018
@floreks added the kind/bug label on Dec 13, 2018
@ninlil commented Jan 15, 2019

Same problem.

Running Azure AKS
Kubernetes versions 1.11.4 and 1.11.5
Dashboard image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0

When Microsoft restarts the managed API server (which happens occasionally), the dashboard starts logging around 450 lines every 5 minutes.

Any update on actually getting the reconnect solved?

@floreks (Member) commented Jan 15, 2019

@ninlil it will be fixed in v2. For now, the workaround is to delete the Dashboard pod after the API server restarts.

@WhoAteDaCake

Still happening in 1.10.1? I was flooded with gigabytes of logs in no time. Deleting the dashboard pod did not solve it.

@floreks (Member) commented Jun 11, 2019

Can you upload the beginning of the log? The first 30 minutes, let's say.

@spingel commented Jul 26, 2019

These are the initial log entries that we saw when we encountered the issue:

2019/07/26 02:28:58 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: unexpected object: &Secret{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Data:map[string][]byte{},Type:,StringData:map[string]string{},}

2019/07/26 02:29:00 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.

2019/07/26 02:29:00 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system

2019/07/26 02:29:00 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout

2019/07/26 02:29:02 Restarting synchronizer: kubernetes-dashboard-key-holder-kube-system.

2019/07/26 02:29:02 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system

2019/07/26 02:29:02 Synchronizer kubernetes-dashboard-key-holder-kube-system exited with error: kubernetes-dashboard-key-holder-kube-system watch ended with timeout

The last three errors repeat every 2 seconds, causing a flood of log entries.
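
For comparison, a rough sketch of how such a restart loop could be rate-limited with exponential backoff instead of retrying every ~2 seconds. It uses the wait package from k8s.io/apimachinery; runWatch is a placeholder, and this is not the dashboard's actual synchronizer code:

```go
// Sketch only: restart a failed watch with exponential backoff instead of a
// fixed 2-second loop. runWatch stands in for the synchronizer's watch.
package main

import (
	"errors"
	"log"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func runWatch() error {
	// Placeholder: block on the watch and return an error when it ends,
	// e.g. "watch ended with timeout".
	return errors.New("watch ended with timeout")
}

func main() {
	backoff := wait.Backoff{
		Duration: 2 * time.Second, // first retry after 2s
		Factor:   2.0,             // then 4s, 8s, 16s, ...
		Jitter:   0.1,
		Steps:    8, // stop retrying after 8 attempts
	}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		if werr := runWatch(); werr != nil {
			log.Printf("Restarting synchronizer after error: %v", werr)
			return false, nil // not done yet; retry after the next backoff step
		}
		return true, nil
	})
	if err != nil {
		// Retries exhausted; give up instead of flooding the log forever.
		log.Fatalf("synchronizer did not recover: %v", err)
	}
}
```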

@floreks (Member) commented Jul 26, 2019

It's no longer an issue with v2, as it forces a restart of the pod after a few retries.
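
Roughly what "forces a restart of the pod after a few retries" could look like; the names and the retry limit below are assumptions for illustration, not the dashboard's actual implementation:

```go
// Sketch only: exit the process once the synchronizer has failed several
// times in a row, so kubelet restarts the container. syncOnce and the limit
// are assumed placeholders.
package main

import (
	"errors"
	"log"
	"time"
)

const maxConsecutiveFailures = 5 // assumed limit

// syncOnce stands in for one run of the secret synchronizer's watch.
func syncOnce() error {
	return errors.New("watch ended with timeout") // placeholder
}

func main() {
	failures := 0
	for {
		if err := syncOnce(); err != nil {
			failures++
			log.Printf("Synchronizer exited with error (%d/%d): %v", failures, maxConsecutiveFailures, err)
			if failures >= maxConsecutiveFailures {
				// Exiting here lets kubelet restart the pod instead of
				// letting the loop flood the logs indefinitely.
				log.Fatal("too many consecutive synchronizer failures, exiting")
			}
			time.Sleep(2 * time.Second)
			continue
		}
		failures = 0 // reset after a successful run
	}
}
```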
