Commit 2601161

author
technicianted
committed
Initial commit
1 parent 47ab2a9 commit 2601161

16 files changed

+511
-27
lines changed

Diff for: .vscode/settings.json

+3
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
{
2+
"makefile.configureOnOpen": false
3+
}

Diff for: README.md

+25-5
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,16 @@ $ # build docker image
1616
$ make docker-build
1717
```
1818

19+
### Running
20+
The simplest way is to use the Helm chart. Edit `configs/_policies.yaml` to define your staggering policies, then install the chart:
21+
```bash
22+
$ helm upgrade \
23+
--install \
24+
--namespace stagger \
25+
--create-namespace \
26+
stagger helm/stagger
27+
```
28+
1929
### Example
2030
Consider the following example where we want to stagger access to image pulls such that something like [spegel](https://github.com/spegel-org/spegel) gets a chance to seed the images. We want to control staggering per image, not as a whole for cache population and seeding:
2131
```yaml
@@ -71,11 +81,6 @@ spec:
7181
7282
In some situations where a staggering policy spans multiple pods controlled by different Kubernetes controllers, we may want to bypass staggering for a certain set of these pods due to subtle startup dependencies. To do that, policies include `BypassLabelSelector`, a label selector that, when matched, exempts the pod from this policy while still counting it against pacing.
7383

74-
### How it works
75-
Stagger works by monitoring pods via an admission controller. Each new pod is evaluated against the defined policies. Once it is associated with one, that policy's pacer is consulted to see if the pod should be allowed to start. If it is not, a special `nodeSelector` is added to block its scheduling.
76-
77-
Next, a reconciler controller monitors pod events and status changes. With each change of a staggered pod, its corresponding pacer is consulted. If the pod is allowed to start, it is evicted and then recreated, at which point the admission controller lets it be scheduled.
78-
7984
### Job controllers special handling
8085
Special handling is needed for pods created by a Job controller. By default, Job controllers do not differentiate between an evicted pod and a failed one. Since we use pod eviction to reschedule the pod, Job specs need to be changed to inject the following failure policy:
8186
```yaml
@@ -87,3 +92,18 @@ Special handling is needed for pods created by a Job controller. By default, Job
8792
```
8893

8994
**Note: if your Job spec already has a `DisruptionTarget` policy with `action` not set to `Ignore`, stagger will issue a warning and will not apply policies**
95+
96+
### FAQ
97+
* Can a single stagger group span multiple controllers?
98+
Yes. It can even span multiple namespaces.
99+
100+
* How are pods prevented from starting up (staggered)?
101+
Stagger works by monitoring pods via an admission controller. Each new pod is evaluated against the defined policies. Once it is associated with one, that policy's pacer is consulted to see if the pod should be allowed to start. If it is not, the pod's container specs are replaced with stub specs that keep the same resources. Further, an init container is appended that blocks the startup of the pod. Once the reconciler allows it, the pod is evicted and recreated with its original specs.
102+
103+
Next, a reconciler controller monitors pod events and status changes. With each change of a staggered pod, its corresponding pacer is consulted. If the pod is allowed to start, it is evicted and then recreated, at which point the admission controller lets it be scheduled.
104+
105+
* Why not use the `scale` subresource?
106+
One of the important design objectives is to be controller agnostic and to be able to stagger across multiple controllers. Using the `scale` subresource as the staggering mechanism would impose many restrictions. For example, the owning controller must support `scale`. Also, other controllers such as HPA may already be controlling the `scale` subresource and would conflict with staggering.
107+
108+
* What about gang scheduling?
109+
Gang scheduling blocks the scheduling of a group of pods until all requested resources are ready. Stagger handles this by using an init container to block the startup of the pod such that gang schedulers are not affected.
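
The admission step described in the FAQ above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `pacer` interface, `burstPacer`, the `pod` type, and `admit` are hypothetical names invented here, since the real pacer and admission APIs are not part of this commit.

```go
package main

// pacer is a hypothetical interface; the real pacer API is not shown in this commit.
type pacer interface {
	// Allow reports whether one more pod may start right now.
	Allow() bool
}

// burstPacer is a toy pacer that allows a fixed number of starts and then refuses.
type burstPacer struct{ remaining int }

func (p *burstPacer) Allow() bool {
	if p.remaining <= 0 {
		return false
	}
	p.remaining--
	return true
}

// pod is a hypothetical, minimal pod model holding only container images.
type pod struct {
	Images     []string // images of the main containers
	InitImages []string // images of the init containers
}

// admit mimics the admission decision: if the pacer allows the pod, it is left
// untouched; otherwise every image is swapped for the stub image and one extra
// blocking init container is appended. It reports whether the pod was blocked.
func admit(p *pod, pc pacer, stubImage string) (blocked bool) {
	if pc.Allow() {
		return false
	}
	for i := range p.Images {
		p.Images[i] = stubImage
	}
	for i := range p.InitImages {
		p.InitImages[i] = stubImage
	}
	// The appended init container is what holds the pod back from starting.
	p.InitImages = append(p.InitImages, stubImage)
	return true
}
```

With a `burstPacer{remaining: 1}`, the first pod passed to `admit` goes through unchanged and the second comes back stubbed, which is the state the reconciler would later resolve by evicting the pod.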

Diff for: cmd/stagger/cmd/container.go

+33
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
// Copyright (c) stagger team and contributors. All rights reserved.
2+
// Licensed under the MIT license. See LICENSE file in the project root for details.
3+
package cmd
4+
5+
import (
6+
"os"
7+
"os/signal"
8+
"stagger/pkg/version"
9+
"syscall"
10+
11+
"github.com/spf13/cobra"
12+
)
13+
14+
var containerCMD = &cobra.Command{
15+
Use: "container",
16+
Short: "stub for staggered pod containers",
17+
Run: runContainer,
18+
}
19+
20+
func init() {
21+
RootCMD.AddCommand(containerCMD)
22+
}
23+
24+
func runContainer(command *cobra.Command, args []string) {
25+
logger := SetupTelemetryAndLogging()
26+
logger.Info("starting stagger container", "version", version.Build, "options", options)
27+
28+
sigs := make(chan os.Signal, 1)
29+
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
30+
<-sigs
31+
32+
os.Exit(0)
33+
}

Diff for: cmd/stagger/cmd/initcontainer.go

+23
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
// Copyright (c) stagger team and contributors. All rights reserved.
2+
// Licensed under the MIT license. See LICENSE file in the project root for details.
3+
package cmd
4+
5+
import (
6+
"os"
7+
8+
"github.com/spf13/cobra"
9+
)
10+
11+
var initContainerCMD = &cobra.Command{
12+
Use: "initcontainer",
13+
Short: "stub for staggered pod init containers",
14+
Run: runInitContainer,
15+
}
16+
17+
func init() {
18+
RootCMD.AddCommand(initContainerCMD)
19+
}
20+
21+
func runInitContainer(command *cobra.Command, args []string) {
22+
os.Exit(0)
23+
}

Diff for: cmd/stagger/cmd/root.go

+2-4
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,7 @@
1-
package cmd
2-
3-
//
41
// Copyright (c) stagger team and contributors. All rights reserved.
52
// Licensed under the MIT license. See LICENSE file in the project root for details.
6-
//
3+
package cmd
4+
75
import (
86
"net/http"
97

Diff for: cmd/stagger/cmd/service.go

+2-4
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,7 @@
1-
package cmd
2-
3-
//
41
// Copyright (c) stagger team and contributors. All rights reserved.
52
// Licensed under the MIT license. See LICENSE file in the project root for details.
6-
//
3+
package cmd
4+
75
import (
86
"os"
97
"os/signal"

Diff for: go.mod

+1
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ require (
2424
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8
2525
sigs.k8s.io/controller-runtime v0.19.0
2626
sigs.k8s.io/e2e-framework v0.4.0
27+
volcano.sh/apis v1.10.0
2728
)
2829

2930
require (

Diff for: go.sum

+4-2
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,8 @@ github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1
1414
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
1515
github.com/emicklei/go-restful/v3 v3.11.0 h1:rAQeMHw1c7zTmncogyy8VvRZwtkmkZ4FxERmMY4rD+g=
1616
github.com/emicklei/go-restful/v3 v3.11.0/go.mod h1:6n3XBCmQQb25CM2LCACGz8ukIrRry+4bhvbpWn3mrbc=
17-
github.com/evanphx/json-patch v0.5.2 h1:xVCHIVMUu1wtM/VkR9jVZ45N3FhZfYMMYGorLCR8P3k=
18-
github.com/evanphx/json-patch v0.5.2/go.mod h1:ZWS5hhDbVDyob71nXKNL0+PWn6ToqBHMikGIFbs31qQ=
17+
github.com/evanphx/json-patch v4.12.0+incompatible h1:4onqiflcdA9EOZ4RxV643DvftH5pOlLGNtQ5lPWQu84=
18+
github.com/evanphx/json-patch v4.12.0+incompatible/go.mod h1:50XU6AFN0ol/bzJsmQLiYLvXMP4fmwYFNcr97nuDLSk=
1919
github.com/evanphx/json-patch/v5 v5.9.0 h1:kcBlZQbplgElYIlo/n1hJbls2z/1awpXxpRi0/FOJfg=
2020
github.com/evanphx/json-patch/v5 v5.9.0/go.mod h1:VNkHZ/282BpEyt/tObQO8s5CMPmYYq14uClGH4abBuQ=
2121
github.com/foxcpp/go-mockdns v1.1.0 h1:jI0rD8M0wuYAxL7r/ynTrCQQq0BVqfB99Vgk7DlmewI=
@@ -279,3 +279,5 @@ sigs.k8s.io/structured-merge-diff/v4 v4.4.1 h1:150L+0vs/8DA78h1u02ooW1/fFq/Lwr+s
279279
sigs.k8s.io/structured-merge-diff/v4 v4.4.1/go.mod h1:N8hJocpFajUSSeSJ9bOZ77VzejKZaXsTtZo4/u7Io08=
280280
sigs.k8s.io/yaml v1.4.0 h1:Mk1wCc2gy/F0THH0TAp1QYyJNzRm2KCLy3o5ASXVI5E=
281281
sigs.k8s.io/yaml v1.4.0/go.mod h1:Ejl7/uTz7PSA4eKMyQCUTnhZYNmLIl+5c2lQPGR2BPY=
282+
volcano.sh/apis v1.10.0 h1:Z9eLwibQmhpFmYGLWxjsTWwsYeTEKvvjFcLptmP2qxE=
283+
volcano.sh/apis v1.10.0/go.mod h1:z8hhFZ2qcUMR1JIjVYmBqL98CVaXNzsQAcqKiytQW9s=

Diff for: pkg/blocker/stubpod.go

+63
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
// Copyright (c) stagger team and contributors. All rights reserved.
2+
// Licensed under the MIT license. See LICENSE file in the project root for details.
3+
package blocker
4+
5+
import (
6+
"fmt"
7+
"stagger/pkg/blocker/types"
8+
9+
"github.com/go-logr/logr"
10+
corev1 "k8s.io/api/core/v1"
11+
)
12+
13+
var _ types.PodBlocker = &stubPod{}
14+
15+
type stubPod struct {
16+
containerImage string
17+
}
18+
19+
func NewStubPod(containerImage string) types.PodBlocker {
20+
return &stubPod{
21+
containerImage: containerImage,
22+
}
23+
}
24+
25+
func (b *stubPod) Block(podSpec *corev1.PodSpec, logger logr.Logger) error {
26+
for i, container := range podSpec.InitContainers {
27+
container.Image = b.containerImage
28+
container.Command = nil
29+
container.Args = []string{"initcontainer"}
30+
container.VolumeMounts = nil
31+
podSpec.InitContainers[i] = container
32+
}
33+
// these will never start, just stubs to prevent image pulls
34+
for i, container := range podSpec.Containers {
35+
container.Image = b.containerImage
36+
container.Command = nil
37+
container.Args = nil
38+
container.VolumeMounts = nil
39+
podSpec.Containers[i] = container
40+
}
41+
podSpec.InitContainers = append(podSpec.InitContainers, corev1.Container{
42+
Name: "stagger",
43+
Image: b.containerImage,
44+
Args: []string{"container"},
45+
})
46+
47+
return nil
48+
}
49+
50+
func (b *stubPod) Unblock(podSpec *corev1.PodSpec, logger logr.Logger) error {
51+
return fmt.Errorf("not supported")
52+
}
53+
54+
func (b *stubPod) IsBlocked(podSpec *corev1.PodSpec) bool {
55+
for _, container := range podSpec.InitContainers {
56+
if container.Name == "stagger" &&
57+
container.Image == b.containerImage {
58+
return true
59+
}
60+
}
61+
62+
return false
63+
}
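
One detail of `Block` worth spelling out is the write-back `podSpec.Containers[i] = container`. In Go, the value variable of a `range` loop over a slice of structs is a copy, so changes to it are lost unless they are stored back by index. The sketch below shows this with a hypothetical, trimmed-down `container` type standing in for `corev1.Container`; the function names are also made up for illustration.

```go
package main

// container is a hypothetical, trimmed-down stand-in for corev1.Container.
type container struct {
	Name  string
	Image string
}

// stubWithoutWriteBack changes only the per-iteration copy, so the slice
// elements keep their original images.
func stubWithoutWriteBack(cs []container, image string) {
	for _, c := range cs {
		c.Image = image // modifies the copy only
	}
}

// stubWithWriteBack follows the pattern used in Block: modify the copy,
// then store it back into the slice by index.
func stubWithWriteBack(cs []container, image string) {
	for i, c := range cs {
		c.Image = image
		cs[i] = c
	}
}
```

Only the second variant changes the caller's slice; names and any other fields survive because the whole modified copy is stored back.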

Diff for: pkg/blocker/stubpod_test.go

+60
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
// Copyright (c) stagger team and contributors. All rights reserved.
2+
// Licensed under the MIT license. See LICENSE file in the project root for details.
3+
package blocker
4+
5+
import (
6+
"testing"
7+
8+
"github.com/go-logr/zapr"
9+
"github.com/stretchr/testify/require"
10+
"go.uber.org/zap"
11+
corev1 "k8s.io/api/core/v1"
12+
)
13+
14+
func TestStubPodSuccess(t *testing.T) {
15+
zlog, _ := zap.NewDevelopment()
16+
logger := zapr.NewLogger(zlog)
17+
18+
pod := corev1.Pod{
19+
Spec: corev1.PodSpec{
20+
InitContainers: []corev1.Container{
21+
{
22+
Name: "init1",
23+
Image: "image1",
24+
Command: []string{"command1"},
25+
Args: []string{"args1"},
26+
},
27+
},
28+
Containers: []corev1.Container{
29+
{
30+
Name: "container1",
31+
Image: "image1",
32+
Command: []string{"command1"},
33+
Args: []string{"args1"},
34+
},
35+
},
36+
},
37+
}
38+
39+
blocker := NewStubPod("staggerimage")
40+
err := blocker.Block(&pod.Spec, logger)
41+
require.NoError(t, err)
42+
require.Len(t, pod.Spec.InitContainers, 2)
43+
initContainer := pod.Spec.InitContainers[0]
44+
require.Nil(t, initContainer.Command)
45+
require.Equal(t, "staggerimage", initContainer.Image)
46+
initContainer = pod.Spec.InitContainers[1]
47+
require.Equal(t, "staggerimage", initContainer.Image)
48+
49+
require.Len(t, pod.Spec.Containers, 1)
50+
container := pod.Spec.Containers[0]
51+
require.Nil(t, container.Command)
52+
require.Equal(t, "staggerimage", container.Image)
53+
54+
blocked := blocker.IsBlocked(&pod.Spec)
55+
require.True(t, blocked)
56+
57+
pod.Spec.InitContainers[1].Image = "someotherimage"
58+
blocked = blocker.IsBlocked(&pod.Spec)
59+
require.False(t, blocked)
60+
}

Diff for: pkg/cmd/cmd.go

+12-5
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,11 @@ func NewCMDWithManager(mgr manager.Manager, options Options, logger logr.Logger)
3131
return nil, err
3232
}
3333

34-
podGroupClassifier, err := NewPodgroupClassifier(mgr, logger)
34+
blocker, err := NewBlocker(options)
35+
if err != nil {
36+
return nil, err
37+
}
38+
podGroupClassifier, err := NewPodgroupClassifier(mgr, blocker, logger)
3539
if err != nil {
3640
return nil, err
3741
}
@@ -59,6 +63,7 @@ func NewCMDWithManager(mgr manager.Manager, options Options, logger logr.Logger)
5963
options,
6064
matchPredicate,
6165
mgr,
66+
blocker,
6267
classifier,
6368
podGroupClassifier,
6469
recorderFactory,
@@ -113,10 +118,12 @@ func (c *CMD) Start(logger logr.Logger) error {
113118
func (c *CMD) Stop(logger logr.Logger) error {
114119
logger.Info("shutting down")
115120

116-
c.shutdownContextCancel()
117-
logger.Info("waiting for shutdown")
118-
<-c.shutdownCompleteChan
119-
logger.Info("shutdown completed")
121+
if c.shutdownCompleteChan != nil {
122+
c.shutdownContextCancel()
123+
logger.Info("waiting for shutdown")
124+
<-c.shutdownCompleteChan
125+
logger.Info("shutdown completed")
126+
}
120127

121128
return nil
122129
}
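
The nil check added to `Stop` matters because receiving from a nil channel blocks forever in Go. If `shutdownCompleteChan` is only created in `Start` (an assumption, since `Start` is not shown in this hunk), calling `Stop` on a `CMD` that never started would hang without the guard. A minimal sketch of the same pattern, using a hypothetical `runner` type rather than the real `CMD`:

```go
package main

import "context"

// runner is a hypothetical stand-in for the CMD type in the diff.
type runner struct {
	cancel context.CancelFunc
	done   chan struct{} // nil until Start is called
}

// Start launches a background goroutine that exits once the context is
// cancelled and then signals completion by closing done.
func (r *runner) Start() {
	ctx, cancel := context.WithCancel(context.Background())
	r.cancel = cancel
	r.done = make(chan struct{})
	go func(done chan struct{}) {
		<-ctx.Done()
		close(done)
	}(r.done)
}

// Stop is safe to call whether or not Start ran. It reports whether there
// was anything to shut down.
func (r *runner) Stop() bool {
	if r.done == nil {
		// Never started: skip the wait, since <-r.done would block forever.
		return false
	}
	r.cancel()
	<-r.done
	return true
}
```

Calling `Stop` on a zero-value `runner` returns immediately, while calling it after `Start` cancels the context and waits for the goroutine to finish.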

Diff for: pkg/cmd/factories.go

+15-3
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ import (
66
"fmt"
77
"net/http"
88
"stagger/pkg/blocker"
9+
blockertypes "stagger/pkg/blocker/types"
910
"stagger/pkg/config/types"
1011
"stagger/pkg/controller"
1112
controllertypes "stagger/pkg/controller/types"
@@ -105,8 +106,10 @@ func NewGroupClassifier(policies []StaggeringPolicy, logger logr.Logger) (contro
105106
return classifier, nil
106107
}
107108

108-
func NewPodgroupClassifier(mgr manager.Manager, logger logr.Logger) (controllertypes.PodGroupStandingClassifier, error) {
109-
return controller.NewPodGroupStandingClassifier(mgr.GetClient(), blocker.NewNodeSelectorPodBlocker()), nil
109+
func NewPodgroupClassifier(mgr manager.Manager, blocker blockertypes.PodBlocker, logger logr.Logger) (controllertypes.PodGroupStandingClassifier, error) {
110+
return controller.NewPodGroupStandingClassifier(
111+
mgr.GetClient(),
112+
blocker), nil
110113
}
111114

112115
func NewRecorderFactory(logger logr.Logger) (controllertypes.ObjectRecorderFactory, error) {
@@ -117,6 +120,7 @@ func RegisterAdmissionController(
117120
options Options,
118121
matchPredicate predicate.Predicate,
119122
mgr manager.Manager,
123+
blocker blockertypes.PodBlocker,
120124
classifier controllertypes.PodClassifier,
121125
podGroupClassifier controllertypes.PodGroupStandingClassifier,
122126
recorderFactory controllertypes.ObjectRecorderFactory,
@@ -141,7 +145,7 @@ func RegisterAdmissionController(
141145
classifier,
142146
podGroupClassifier,
143147
recorderFactory,
144-
blocker.NewNodeSelectorPodBlocker(),
148+
blocker,
145149
flightTracker,
146150
options.BypassFailure,
147151
options.EnableLabel,
@@ -226,3 +230,11 @@ func CreateKubernetesConfig(opts KubernetesOptions) (*rest.Config, error) {
226230

227231
return config, err
228232
}
233+
234+
func NewBlocker(opts Options) (blockertypes.PodBlocker, error) {
235+
if len(opts.StaggerContainerImage) == 0 {
236+
return nil, fmt.Errorf("stagger container image must be specified")
237+
}
238+
239+
return blocker.NewStubPod(opts.StaggerContainerImage), nil
240+
}

Diff for: pkg/cmd/options.go

+2
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ type Options struct {
3030
KubernetesOptions
3131

3232
StaggeringConfigPath string `cliArgName:"staggering-config-path" cliArgDescription:"path to staggering config yaml file" cliArgGroup:"Staggering"`
33+
StaggerContainerImage string `cliArgName:"staggering-container-image" cliArgDescription:"stagger container image to use for stub pods" cliArgGroup:"Staggering"`
3334
BypassFailure bool `cliArgName:"staggering-bypass-errors" cliArgDescription:"do not block admission on errors" cliArgGroup:"Staggering"`
3435
EnableLabel string `cliArgName:"staggering-enable-label" cliArgDescription:"pod label to enable staggering behavior" cliArgGroup:"Staggering"`
3536
MaxFlightDuration time.Duration `cliArgName:"staggering-max-pod-flight-duration" cliArgDescription:"maximum time to wait for a pod from admission to reconciliation after which it is assumed committed" cliArgGroup:"Staggering"`
@@ -57,6 +58,7 @@ func NewLeaderElectionOptions() LeaderElectionOptions {
5758
func NewOptions() Options {
5859
return Options{
5960
KubernetesOptions: NewKubernetesOptions(),
61+
StaggerContainerImage: "technicianted/stagger",
6062
BypassFailure: true,
6163
EnableLabel: controller.DefaultEnableLabel,
6264
MaxFlightDuration: 1000 * time.Millisecond,
