-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft PR: Add Steal Time support in ARM #5139
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try making the proposed change to pass the right address and see if it works.
I haven't left any comment regarding style or design to focus on functionality first. Once we get something working we can see how to make it "look better".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got confused by the KVM API as well! Passing a user pointer to the IPA to kvm is ok, but it needs to be 64B aligned.
The hint was in the KVM documentation:
-EINVAL Base address not 64 byte aligned
https://www.kernel.org/doc/html/v5.8/virt/kvm/devices/vcpu.html#attribute-kvm-arm-vcpu-pvtime-ipa
Thank you so much for your fast response! Totally missed the docs for 64 byte alignment, but that ended up fixing the bugs and I can successfully boot a microVM manually. If you think its a good time now, we can look to flesh out the implementation and "make it better". Other than that, we believe the next step is adding unit/integration tests to make sure our device is working properly, as well as implementing persistence across snapshots. Any leads on how to do both of these things would be really helpful. The lead we have been looking at for snapshotting is the
|
Well done! I'll take a look.
I think extending it is ok, I didn't find any other place where we could fit it.
Correct. Remember to use the resource allocator with exact policy to ensure the system memory gets reserved in our internal allocator.
I see the kernel prints a message "using stolen time PV" in the console when it boots, so we could probably check for that. A more in-depth test to actually check that it's working would be to start 2 cpu hogging workloads on a 2vcpu guest and pin both vcpus to the same pcpu. That should achieve 50% steal time, and increase the relative counter in /proc/stat. We could assert the value actually increased. For the snapshot, we could check that the same (non-zero) value is present after resume and that it continues to increase if we hog it again. |
Hi @Manciukic, I was just about to comment :), thanks for your response. We've started a bare bones implementation of snapshot persistence in the previous commit. So if you also have the time to take a look at it, that would be awesome. We are now trying to change the existing tests to include PVTime. Any input on our design choices we made would be appreciated again, specifically:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. I'd just move most of the pv_time specific logic into the new module to avoid cluttering the builder.rs even more.
We also need to bump the snapshot version and add a changelog entry, other than the integration tests.
I tried to run the branch through our CI and there are a couple of issues: https://buildkite.com/firecracker/firecracker-pr/builds/13103#
But the good news is functional and performance tests passed 🎉 To run these tests locally you can run:
|
Managed to fix the errors reported in the build as well as clippy errors, would be awesome if it could be re-run. We started looking at unit/integration tests, as per your suggestions, and were hoping to get a small lead as to going about it.
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5139 +/- ##
==========================================
- Coverage 83.06% 79.53% -3.53%
==========================================
Files 250 219 -31
Lines 26876 23133 -3743
==========================================
- Hits 22324 18399 -3925
- Misses 4552 4734 +182
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There are still a few style and compilation errors. You can run the tests from your development machine as:
snippet:
https://buildkite.com/firecracker/firecracker-pr/builds/13116#_ |
It's difficult to unit test code that mostly performs syscalls (ioctls on vcpu fd in this case), so the policy we adopted to not over-complicate the codebase with mocks is to test these features through end-to-end integration tests only. |
- Added PVTime struct that allocates and stores per-vCPU stolen_time regions - Implemented register_vcpu() to register the IPA for each vCPU with KVM - Invoked setup_pv_time in build_microvm_for_boot - Added PVTime detection in setup_pv_time - Started using debug/info logs while booting microvm for testing Limitations/WIP: - While manually booting a microvm for testing, error during set_device_attr in register_vcpu - Snapshotting not yet supported Signed-off-by: Dakshin Devanand <[email protected]>
- Each vCPU now gets 64 bytes for its steal_time struct region (16 bytes reguired for data) - This satisfies 64 byte alignment the IPA requires MicroVM can now successfully be booted with pvtime in the boot process. Signed-off-by: Dakshin Devanand <[email protected]>
- Changes to PVTime design (use base addr) - Secondary constructor for PVTime using only base addr - Implement Persist interface for PVTime, save base addr - Add pvtime to struct MicrovmState and struct Vmm - Restore PVTime device in build_microvm_from_snapshot Signed-off-by: Dakshin Devanand <[email protected]>
- Moved most PVTime logic into pvtime.rs - Fixed includes for only aarch64 - PVTime only uses base_addr now - Made clippy happy Signed-off-by: Dakshin Devanand <[email protected]>
- General and snapshotting integration tests - Unit tests for PVTime Signed-off-by: Dakshin Devanand <[email protected]>
Thanks for the clarification on that. We've added some very basic unit tests to PVTime and also 2 integration tests for basic checking and snapshotting. If there's any changes/next steps we need to take, please let us know. A re-run of the test suite would also be very helpful so we can nail down any bugs, as it was a bit buggy locally. |
Can you also add a basic test that checks that the guest has enabled steal PV by looking at the kernel logs?
Make sure that only runs on aarch64.
Done. x86 has the same issues as above: https://buildkite.com/firecracker/firecracker-pr/builds/13138#0196241a-77dd-4fe4-9023-e053f8004c2d |
- Fixed errors that occurred on x86 build - Added integration test for checking steal time Signed-off-by: Dakshin Devanand <[email protected]>
Opening this PR as a draft as we continue working on this issue to add steal time support for ARM. Please see below for the changes we have implemented so far.
Changes
Added PVTime struct in
src/vmm/src/arch/aarch64/pvtime/mod.rs
that allocates and stores per-vCPU stolen_time regionsnew
constructor to create the device (allocating system memory and storing regions per vcpu), andregister_vcpu
which attempts to usekvm_set_device_attr
to register the corresponding IPA for a vcpuCreated a function
setup_pv_time
insrc/vmm/src/builder.rs
which is invoked inbuild_microvm_for_boot
kvm_has_device_attr
Started using debug/info logs while booting microvm for testing
Current problems:
Future Progress
setup_pv_time
Reason
Attempting to resolve issue #4866
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
tools/devtool checkstyle
to verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md
.Runbook for Firecracker API changes.
integration tests.
TODO
.rust-vmm
.