Snapshottable API server cache #5017

Merged (5 commits, Feb 13, 2025)

Conversation

serathius
Copy link
Contributor

@serathius serathius commented Dec 30, 2024

@k8s-ci-robot added the cncf-cla: yes, kind/kep, sig/api-machinery, and size/L labels Dec 30, 2024
@dims (Member) commented Dec 30, 2024

cc @mengqiy @chaochn47 @shyamjvs

@wojtek-t self-assigned this Jan 10, 2025
@serathius (Contributor Author):

I ran the scalability tests to measure the overhead of cloning. The scalability tests are a good fit because they use neither pagination nor exact requests. I used kubernetes/kubernetes#126855, which clones the storage on each request. The results are good:

Overhead based on profiles collected during scalability tests:

  • Additional 7 GB of object allocations, which accounts for 0.2% of allocations (profile screenshot).
  • Additional 300 MB of memory used, which accounts for 1.3% of memory used in the scalability test (profile screenshot).

The overhead is small enough that it falls within the normal variance of memory usage during the test. There are some noticeable increases in request latency, but they are still far from the SLO and could be due to the high variance in results.

If we account for the high variance of latency in scalability tests and look only at the profile differences, we can estimate the expected overhead of keeping all store snapshots in the watch cache to be below 2% of memory.
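
For context, here is a minimal sketch of the clone-per-request idea, assuming a google/btree based store; the item type and keys below are illustrative, not the actual watch cache implementation:

```go
package main

import (
	"fmt"

	"github.com/google/btree"
)

// storedObject is a hypothetical item keyed the way the watch cache keys
// objects; the real store holds richer elements.
type storedObject struct {
	key   string
	value string
}

func (a storedObject) Less(b btree.Item) bool {
	return a.key < b.(storedObject).key
}

func main() {
	store := btree.New(32)
	for i := 0; i < 5; i++ {
		store.ReplaceOrInsert(storedObject{key: fmt.Sprintf("pods/default/pod-%d", i), value: "v1"})
	}

	// Clone is a lazy copy-on-write snapshot: both trees share nodes until
	// either side is written to, so snapshotting per request is cheap.
	snapshot := store.Clone()

	// Writes made after the clone do not affect the snapshot.
	store.ReplaceOrInsert(storedObject{key: "pods/default/pod-0", value: "v2"})

	// Serve one "page" from the snapshot: continue from a key, stop at a limit.
	limit, count := 3, 0
	snapshot.AscendGreaterOrEqual(storedObject{key: "pods/default/pod-1"}, func(i btree.Item) bool {
		o := i.(storedObject)
		fmt.Println(o.key, o.value)
		count++
		return count < limit
	})
}
```

Because the clone is copy-on-write, extra allocations are only incurred for nodes mutated after the snapshot, which is consistent with the small allocation overhead in the profiles above.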

@wojtek-t (Member):

Are you looking at LoadResponsiveness_Prometheus or LoadResponsiveness_PrometheusSimple for latencies?
If you got 170ms for delete pods in base, it's probably the former, but it also has much higher variance.
What does the comparison look like for the latter?

https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_Prometheus&Resource=pods&Scope=resource&Subresource=&Verb=DELETE
https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_PrometheusSimple&Resource=pods&Scope=resource&Subresource=&Verb=DELETE

@serathius (Contributor Author):

I looked at LoadResponsiveness_Prometheus. For PrometheusSimple the latencies match aside from some anomalies like GET services, but they also seem to have high variance in PrometheusSimple. https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_PrometheusSimple&Resource=services&Scope=resource&Subresource=&Verb=GET

@wojtek-t (Member):

I would focus on PrometheusSimple as something that is much more predictable/repeatable.
If those match, and the overhead is fairly small as you wrote (I would be interested in seeing how it looks at small scale as well), then this solution is much preferable to me (even if in the first step we only support pagination and nothing else).


Review thread on the KEP file, anchored at the "### Troubleshooting" section (the preceding line in the diff reads "No"):

Contributor:

How can we check in the field whether the response from the cache exactly matches the response from etcd?

@serathius (Contributor Author):

If we enable just paginated requests, they could be checked by making an exact request. However, the question is why you should care about this at all. Do you want to check whether the cache was corrupted? For that we should have an automated mechanism.
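
To illustrate, a minimal sketch of the kind of automated check meant here (not the actual apiserver mechanism; the item type and digest helper are hypothetical): compare a digest of the paginated response served from the cache against a digest of an exact request served from etcd.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// item is a hypothetical, minimal view of a listed object.
type item struct {
	Name            string
	ResourceVersion string
}

// listDigest hashes the (name, resourceVersion) pairs of a list so two
// responses can be compared cheaply without diffing full objects.
func listDigest(items []item) [sha256.Size]byte {
	sorted := append([]item(nil), items...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	h := sha256.New()
	for _, it := range sorted {
		fmt.Fprintf(h, "%s/%s\n", it.Name, it.ResourceVersion)
	}
	var digest [sha256.Size]byte
	copy(digest[:], h.Sum(nil))
	return digest
}

func main() {
	fromCache := []item{{Name: "pod-a", ResourceVersion: "101"}, {Name: "pod-b", ResourceVersion: "105"}}
	fromEtcd := []item{{Name: "pod-a", ResourceVersion: "101"}, {Name: "pod-b", ResourceVersion: "105"}}
	fmt.Println("responses match:", listDigest(fromCache) == listDigest(fromEtcd))
}
```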

@serathius (Contributor Author):

ping @deads2k

@serathius force-pushed the kep-4988 branch 4 times, most recently from 9e39954 to 1538972 on January 16, 2025 15:11
@serathius (Contributor Author) commented Jan 16, 2025

@wojtek-t

If those match, and the overhead as you wrote is fairly small (I would be interested in observing how it looks also on small scale), then this solution is much preferable to me (even if in the first step we will only support pagination and nothing else).

What small scale do you have in mind? To me, the scalability tests seem like a worst-case scenario: they include a large number of small objects with frequent updates, so the overhead from the B-tree structure should be at its largest relative to the size of the database.
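
Purely illustrative arithmetic (every byte count below is an assumption, not a measurement) of why small objects are the worst case for relative overhead: a fixed per-entry B-tree cost is a larger fraction of total memory when the objects themselves are small.

```go
package main

import "fmt"

// overheadFraction returns the share of total memory taken by the fixed
// per-entry cost (keys, pointers, node bookkeeping) for a given object size.
func overheadFraction(objectBytes, perEntryBytes float64) float64 {
	return perEntryBytes / (objectBytes + perEntryBytes)
}

func main() {
	const perEntry = 64.0 // assumed per-entry B-tree cost, not a measured value
	for _, size := range []float64{1_000, 10_000, 100_000} {
		fmt.Printf("object %7.0f B -> overhead %.2f%%\n", size, 100*overheadFraction(size, perEntry))
	}
}
```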

@serathius changed the title from "Pagination from cache KEP" to "Snapshottable API server cache" Jan 16, 2025
@serathius (Contributor Author):

ping @deads2k @wojtek-t

@serathius force-pushed the kep-4988 branch 2 times, most recently from f9fe2f0 to ee3e841 on February 7, 2025 22:07
@wojtek-t (Member):

What small scale do you have in mind? To me, the scalability tests seem like a worst-case scenario: they include a large number of small objects with frequent updates, so the overhead from the B-tree structure should be at its largest relative to the size of the database.

You're roughly right, with the exception that I would like to see the impact on the "high-throughput scalability test". Basically, I would like to understand whether we're regressing on the throughput that can be achieved now.

@deads2k (Contributor) left a comment:

Don't forget to add the PRR metadata.

@k8s-ci-robot added the do-not-merge/invalid-commit-message label Feb 13, 2025
@k8s-ci-robot removed the do-not-merge/invalid-commit-message label Feb 13, 2025
@deads2k (Contributor) commented Feb 13, 2025

/tide merge-method-squash
/lgtm
/approve

@k8s-ci-robot added the lgtm label Feb 13, 2025
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Feb 13, 2025
@deads2k added the tide/merge-method-squash label Feb 13, 2025
@k8s-ci-robot removed the lgtm label Feb 13, 2025
@serathius mentioned this pull request Feb 13, 2025
@deads2k (Contributor) commented Feb 13, 2025

/lgtm

@k8s-ci-robot added the lgtm label Feb 13, 2025
@k8s-ci-robot merged commit 9d3935f into kubernetes:master Feb 13, 2025
4 checks passed
@k8s-ci-robot added this to the v1.33 milestone Feb 13, 2025
yliaog pushed a commit to yliaog/enhancements that referenced this pull request Mar 7, 2025
* Snapshottable API server cache

* Address deads2k feedback for KEP-4988

* [KEP-4988] Switch to etcd fallback, update feature gates, update directory name

* [KEP-4988] Cleanup the proposal section

* [KEP-4988] Move KEP to implementable