Snapshottable API server cache #5017

Merged (5 commits, Feb 13, 2025)

Conversation

serathius
Copy link
Contributor

@serathius serathius commented Dec 30, 2024

@k8s-ci-robot added the cncf-cla: yes, kind/kep, sig/api-machinery, and size/L labels Dec 30, 2024
@dims (Member) commented Dec 30, 2024

cc @mengqiy @chaochn47 @shyamjvs

@wojtek-t self-assigned this Jan 10, 2025
@serathius (Contributor Author):

I ran the scalability tests to measure the overhead of cloning. The scalability tests are a good fit because they use neither pagination nor exact requests. I used kubernetes/kubernetes#126855, which clones the storage on each request. The results are good:

Overhead based on profiles collected during scalability tests:

  • Additional 7 GB of object allocations, which accounts for 0.2% of allocations (profile screenshot).
  • Additional 300 MB of memory used, which accounts for 1.3% of memory used in the scalability test (profile screenshot).

The overhead is small enough that it falls within the normal variance of memory usage during the test. There are some noticeable increases in request latency, but they are still far from the SLO and could be due to the high variance in results.

If we account for the high variance of latency in scalability tests and look only at the profile differences, we can estimate the expected overhead of keeping all store snapshots in the watch cache to be below 2% of memory.
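
For context, here is a minimal sketch of the clone-per-request idea, assuming a google/btree based store; the item type and keys below are illustrative, not the actual watch cache implementation:

```go
package main

import (
	"fmt"

	"github.com/google/btree"
)

// storedObject is a hypothetical item keyed the way the watch cache keys
// objects; the real store holds richer elements.
type storedObject struct {
	key   string
	value string
}

func (a storedObject) Less(b btree.Item) bool {
	return a.key < b.(storedObject).key
}

func main() {
	store := btree.New(32)
	for i := 0; i < 5; i++ {
		store.ReplaceOrInsert(storedObject{key: fmt.Sprintf("pods/default/pod-%d", i), value: "v1"})
	}

	// Clone is a lazy copy-on-write snapshot: both trees share nodes until
	// either side is written to, so snapshotting per request is cheap.
	snapshot := store.Clone()

	// Writes made after the clone do not affect the snapshot.
	store.ReplaceOrInsert(storedObject{key: "pods/default/pod-0", value: "v2"})

	// Serve one "page" from the snapshot: continue from a key, stop at a limit.
	limit, count := 3, 0
	snapshot.AscendGreaterOrEqual(storedObject{key: "pods/default/pod-1"}, func(i btree.Item) bool {
		o := i.(storedObject)
		fmt.Println(o.key, o.value)
		count++
		return count < limit
	})
}
```

Because the clone is copy-on-write, extra allocations are only incurred for nodes mutated after the snapshot, which is consistent with the small allocation overhead in the profiles above.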

@wojtek-t (Member):

Are you looking at LoadResponsiveness_Prometheus or LoadResponsiveness_PrometheusSimple for latencies?
If you got 170ms for delete pods in base, it's probably the former, but it also has much higher variance.
What does the comparison look like for the latter?

https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_Prometheus&Resource=pods&Scope=resource&Subresource=&Verb=DELETE
https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_PrometheusSimple&Resource=pods&Scope=resource&Subresource=&Verb=DELETE

@serathius (Contributor Author):

I looked at LoadResponsiveness_Prometheus. For PrometheusSimple the latencies match aside from some anomalies like GET services, but they also seem to have high variance in PrometheusSimple. https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadResponsiveness_PrometheusSimple&Resource=services&Scope=resource&Subresource=&Verb=GET

@wojtek-t (Member):

I would focus on PrometheusSimple as something that is much more predictable/repeatable.
If those match, and the overhead is fairly small as you wrote (I would be interested in seeing how it looks at small scale as well), then this solution is much preferable to me (even if in the first step we only support pagination and nothing else).


Review thread on the KEP file, anchored at the "### Troubleshooting" section (the preceding line in the diff reads "No"):

Contributor:

How can we check in the field whether the response from the cache exactly matches the response from etcd?

@serathius (Contributor Author):

If we enable just paginated requests, they could be checked by making an exact request. However, the question is why you should care about this at all. Do you want to check whether the cache was corrupted? For that we should have an automated mechanism.
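
To illustrate, a minimal sketch of the kind of automated check meant here (not the actual apiserver mechanism; the item type and digest helper are hypothetical): compare a digest of the paginated response served from the cache against a digest of an exact request served from etcd.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// item is a hypothetical, minimal view of a listed object.
type item struct {
	Name            string
	ResourceVersion string
}

// listDigest hashes the (name, resourceVersion) pairs of a list so two
// responses can be compared cheaply without diffing full objects.
func listDigest(items []item) [sha256.Size]byte {
	sorted := append([]item(nil), items...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	h := sha256.New()
	for _, it := range sorted {
		fmt.Fprintf(h, "%s/%s\n", it.Name, it.ResourceVersion)
	}
	var digest [sha256.Size]byte
	copy(digest[:], h.Sum(nil))
	return digest
}

func main() {
	fromCache := []item{{Name: "pod-a", ResourceVersion: "101"}, {Name: "pod-b", ResourceVersion: "105"}}
	fromEtcd := []item{{Name: "pod-a", ResourceVersion: "101"}, {Name: "pod-b", ResourceVersion: "105"}}
	fmt.Println("responses match:", listDigest(fromCache) == listDigest(fromEtcd))
}
```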

@serathius (Contributor Author):

ping @deads2k

@serathius force-pushed the kep-4988 branch 4 times, most recently from 9e39954 to 1538972 on January 16, 2025 15:11
@serathius (Contributor Author) commented Jan 16, 2025

@wojtek-t

If those match, and the overhead as you wrote is fairly small (I would be interested in observing how it looks also on small scale), then this solution is much preferable to me (even if in the first step we will only support pagination and nothing else).

What small scale do you have in mind? To me, the scalability tests seem like a worst-case scenario: they include a large number of small objects with frequent updates, so the overhead from the B-tree structure should be at its largest relative to the size of the database.
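
Purely illustrative arithmetic (every byte count below is an assumption, not a measurement) of why small objects are the worst case for relative overhead: a fixed per-entry B-tree cost is a larger fraction of total memory when the objects themselves are small.

```go
package main

import "fmt"

// overheadFraction returns the share of total memory taken by the fixed
// per-entry cost (keys, pointers, node bookkeeping) for a given object size.
func overheadFraction(objectBytes, perEntryBytes float64) float64 {
	return perEntryBytes / (objectBytes + perEntryBytes)
}

func main() {
	const perEntry = 64.0 // assumed per-entry B-tree cost, not a measured value
	for _, size := range []float64{1_000, 10_000, 100_000} {
		fmt.Printf("object %7.0f B -> overhead %.2f%%\n", size, 100*overheadFraction(size, perEntry))
	}
}
```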

@serathius changed the title from "Pagination from cache KEP" to "Snapshottable API server cache" Jan 16, 2025
@serathius (Contributor Author):

ping @deads2k @wojtek-t

@serathius force-pushed the kep-4988 branch 2 times, most recently from f9fe2f0 to ee3e841 on February 7, 2025 22:07
@wojtek-t (Member):

What small scale do you have in mind? To me, the scalability tests seem like a worst-case scenario: they include a large number of small objects with frequent updates, so the overhead from the B-tree structure should be at its largest relative to the size of the database.

You're roughly right, with the exception that I would like to see the impact on the "high-throughput scalability test". Basically, I would like to understand whether we're regressing on the throughput that can be achieved now.

@deads2k (Contributor) left a comment:

Don't forget to add the PRR metadata.

@k8s-ci-robot added the do-not-merge/invalid-commit-message label Feb 13, 2025
@k8s-ci-robot removed the do-not-merge/invalid-commit-message label Feb 13, 2025
@deads2k (Contributor) commented Feb 13, 2025

/tide merge-method-squash
/lgtm
/approve

@k8s-ci-robot added the lgtm label Feb 13, 2025
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Feb 13, 2025
@deads2k added the tide/merge-method-squash label Feb 13, 2025
@k8s-ci-robot removed the lgtm label Feb 13, 2025
@serathius mentioned this pull request Feb 13, 2025
@deads2k (Contributor) commented Feb 13, 2025

/lgtm

@k8s-ci-robot added the lgtm label Feb 13, 2025
@k8s-ci-robot merged commit 9d3935f into kubernetes:master Feb 13, 2025
4 checks passed
@k8s-ci-robot added this to the v1.33 milestone Feb 13, 2025
yliaog pushed a commit to yliaog/enhancements that referenced this pull request Mar 7, 2025
* Snapshottable API server cache

* Address deads2k feedback for KEP-4988

* [KEP-4988] Switch to etcd fallback, update feature gates, update directory name

* [KEP-4988] Cleanup the proposal section

* [KEP-4988] Move KEP to implementable