| Phase 1 - Out-Of-Tree SWAP with WASP | 2024 | Tech-Preview | Beta |
+| Phase 2 - Transition to Kubernetes SWAP. Limited to CNV-only users | mid-end 2025 | GA | GA |
+| Phase 3 - Kubernetes SWAP for all OpenShift users | TBD | GA | GA |
 
 Because [Kubernetes SWAP] is currently in Beta and is only expected to GA within
 Kubernetes releases 1.33-1.35 (discussion about its GA criterias are still ongoing).
 this proposal is taking a three-phased approach in order to meet the timeline requirements.
 
 * **Phase 1** - OpenShift Virtualization will provide an out-of-tree
-solution to enable higher workload density and swap-based eviction mechanisms.
+solution (WASP) to enable higher workload density and swap-based eviction mechanisms.
 * **Phase 2** - OpenShift Virtualization will transition to [Kubernetes SWAP] (in-tree).
-OpenShift will allow using SWAP only for CNV users, that is,
-whenever OpenShift Virtualization is installed on the cluster.
+OpenShift will [allow](#swap-ga-for-cnv-users-only) SWAP to be configured only if OpenShift Virtualization is installed on the cluster.
 In this phase, WASP will be dropped in favor of GAed Kubernetes mechanisms.
 * **Phase 3** - OpenShift will GA SWAP for every user, even if OpenShift Virtualization
 is not installed on the cluster.
@@ -160,7 +157,7 @@ virtual machine in a cluster.
 
 a. The cluster admin is adding the `failOnSwap=false` flag to the
 kubelet configuration via a `KubeletConfig` CR, in order to ensure
-that the kubelet will start once swap has been rolled out.
+that the kubelets will start once swap has been rolled out.
 a. The cluster admin is calculating the amount of swap space to
 provision based on the amount of physical ram and overcommitment
 ratio
@@ -171,16 +168,16 @@ virtual machine in a cluster.
 4. The cluster admin is configuring OpenShift Virtualization for higher
 workload density via
 
-a. the OpenShift Virtualization Console "Settings" page
-b. or `HCO` API
+a. the OpenShift Virtualization Console "Settings" page
+b. [or `HCO` API](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/cluster-configuration.md#configure-higher-workload-density)
 
 The cluster is now set up for higher workload density.
 
 In phase 3, deploying the WASP agent will not be needed.
 
 #### Workflow: Leveraging higher workload density
 
-1. The VM Owner is creating a regular virtual machine and is launching it.
+1. The VM Owner is creating a regular virtual machine and is launching it. The VM Owner must specify only the guest memory size in the VM spec, not memory requests.
 
 ### API Extensions
 
@@ -229,7 +226,7 @@ The design is driven by the following guiding principles:
 An OCI Hook to enable swap by setting the containers cgroup
 `memory.swap.max=max`.
 
-* **Technology Preview**
+* **Tech Preview**
 * Uses `UnlimitedSwap`.
 * **General Availability**
 * Limited to burstable QoS class pods.
@@ -242,7 +239,7 @@ For more info, refer to the upstream documentation on how to calculate
 ###### Provisioning swap
 
 Provisioning of swap is left to the cluster administrator.
-The hook itself is not making any assumption where the swap is located.
+The OCI hook itself is not making any assumption where the swap is located.
 
 As long as there is no additional tooling available, the recommendation
 is to use `MachineConfig` objects to provision swap on nodes.
@@ -263,9 +260,9 @@ Without system services such as `kubelet` or `crio`, any container will
 not be able to run well.
 
 Thus, in order to protect the `system.slice` and ensure that the nodes
-infrastructure health is prioritized over workload health, the agent is
+infrastructure health is prioritized over workload health, the WASP agent is
 reconfiguring the `system.slice` and setting `memory.swap.max=0` to
-prevent any system service within from swapping.
+prevent any system service from swapping.
 
 ###### Preventing SWAP traffic I/O saturation
 
@@ -274,7 +271,7 @@ potentially preventing other processes from performing I/O.
 
 In order to ensure that system services are able to perform I/O, the
 agent is configuring `io.latency=50` for the `system.slice` in order
-to ensure that it's I/O requests are prioritized over any other slice.
+to ensure that its I/O requests are prioritized over any other slice.
 This is, because by default, no other slice is configured to have
 `io.latency` set.
 
@@ -393,8 +390,19 @@ None.
 
 ## Test Plan
 
-Add e2e tests for the WASP agent repository for regression testing against
-OpenShift.
+The cluster under test has worker nodes with identical amounts of RAM and identical disk sizes.
+Memory overcommit is configured to 200%. There should be enough free space on the disk
+to create the required file-based swap, e.g. 8G of RAM and 200% overcommit require
+at least 8G free space on the root disk.
+
+* Fill the cluster with dormant VMs until each worker node is overcommitted.
+* Test the following scenarios:
+* Node drain
+* VM live-migration
+* Cluster upgrade
+* The expectation is to see that nodes are stable
+as well as the workloads.
+
 
 ## Graduation Criteria
 
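The sizing example in the Test Plan hunk above (8G of RAM at 200% overcommit needs at least 8G of swap) can be sketched as a small calculation. This is only an illustration: the formula `RAM * (overcommit - 100) / 100` is an assumption inferred from that single example, and the function name `required_swap_gib` is hypothetical. The upstream documentation referenced by the proposal remains the authority on how to size swap.

```python
def required_swap_gib(ram_gib: float, overcommit_percent: float) -> float:
    """Return the swap size (GiB) so that RAM + swap covers the overcommitted memory.

    Assumption (not stated as a formula in the proposal): swap has to back
    only the part of guest memory that exceeds physical RAM.
    """
    if ram_gib <= 0:
        raise ValueError("ram_gib must be positive")
    if overcommit_percent < 100:
        raise ValueError("overcommit_percent must be at least 100")
    return ram_gib * (overcommit_percent - 100) / 100


if __name__ == "__main__":
    # The Test Plan example: 8G of RAM with 200% overcommit.
    print(required_swap_gib(8, 200))
```

Under this assumption, an overcommit of 100% needs no swap at all, which matches the intuition that swap only backs memory beyond physical RAM.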
@@ -433,48 +441,14 @@ object and the `openshift-cnv` namespace exist.
 
 ## Upgrade / Downgrade Strategy
 
-If applicable, how will the component be upgraded and downgraded? Make sure this
-is in the test plan.
-
-Consider the following in developing an upgrade/downgrade strategy for this
-enhancement:
-- What changes (in invocations, configurations, API use, etc.) is an existing
-cluster required to make on upgrade in order to keep previous behavior?
-- What changes (in invocations, configurations, API use, etc.) is an existing
-cluster required to make on upgrade in order to make use of the enhancement?
+At the OpenShift level, no specific action is needed, since all of the APIs used
+by the WASP agent deliverables are stable (DaemonSet, OCI Hook, MachineConfig, KubeletConfig).
 
 Upgrade expectations:
-- Each component should remain available for user requests and
-workloads during upgrades. Ensure the components leverage best practices in handling [voluntary
-disruption](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/). Any exception to
-this should be identified and discussed here.
-- Micro version upgrades - users should be able to skip forward versions within a
-minor release stream without being required to pass through intermediate
-versions - i.e. `x.y.N->x.y.N+2` should work without requiring `x.y.N->x.y.N+1`
-as an intermediate step.
-- Minor version upgrades - you only need to support `x.N->x.N+1` upgrade
-steps. So, for example, it is acceptable to require a user running 4.3 to
-upgrade to 4.5 with a `4.3->4.4` step followed by a `4.4->4.5` step.
-- While an upgrade is in progress, new component versions should
-continue to operate correctly in concert with older component
-versions (aka "version skew"). For example, if a node is down, and
-an operator is rolling out a daemonset, the old and new daemonset
-pods must continue to work correctly even while the cluster remains
-in this partially upgraded state for some time.
+- WASP from CNV-X.Y must work with OCP-X.Y as well as OCP-X.(Y+1)
 
 Downgrade expectations:
-- If an `N->N+1` upgrade fails mid-way through, or if the `N+1` cluster is
-misbehaving, it should be possible for the user to rollback to `N`. It is
-acceptable to require some documented manual steps in order to fully restore
-the downgraded cluster to its previous state. Examples of acceptable steps
-include:
-- Deleting any CVO-managed resources added by the new version. The
-CVO does not currently delete resources that no longer exist in
-the target version.
-
-* On the cgroup level WASP agent supports only cgroups v2
-* On OpenShift level no specific action needed, since all of the APIs used
-by the WASP agent deliverables are stable (DaemonSet, OCI Hook, MachineConfig, KubeletConfig)
+- WASP from CNV-X.Y must work with OCP-X.Y as well as OCP-X.(Y-1)
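
The upgrade and downgrade expectations above amount to a version-skew window of one minor release in either direction. A minimal sketch of that rule follows, assuming versions are given as `X.Y` strings. The helper names `parse_minor` and `wasp_supported_on` are hypothetical and are not part of WASP or OpenShift, and the version numbers in the usage lines are made up for illustration.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Split an 'X.Y' (or 'X.Y.Z') version string into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def wasp_supported_on(cnv_version: str, ocp_version: str) -> bool:
    """Return True if WASP from CNV-X.Y is expected to work on the given OCP version.

    Encodes the stated expectations: OCP-X.Y, OCP-X.(Y+1) after an upgrade,
    and OCP-X.(Y-1) after a downgrade.
    """
    cnv_major, cnv_minor = parse_minor(cnv_version)
    ocp_major, ocp_minor = parse_minor(ocp_version)
    return cnv_major == ocp_major and abs(ocp_minor - cnv_minor) <= 1


if __name__ == "__main__":
    # Hypothetical version numbers, for illustration only.
    print(wasp_supported_on("4.17", "4.18"))
    print(wasp_supported_on("4.17", "4.19"))
```

A check like this could serve as a guard in regression tests that exercise the cluster upgrade scenario from the Test Plan.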