Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft PR: Add Steal Time support in ARM #5139

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

DakshinD
Copy link
Contributor

@DakshinD DakshinD commented Apr 4, 2025

Opening this PR as a draft as we continue working on this issue to add steal time support for ARM. Please see below for the changes we have implemented so far.

Changes

  • Added PVTime struct in src/vmm/src/arch/aarch64/pvtime/mod.rs that allocates and stores per-vCPU stolen_time regions

    • It offers a new constructor to create the device (allocating system memory and storing regions per vcpu), and register_vcpu which attempts to use kvm_set_device_attr to register the corresponding IPA for a vcpu
  • Created a function setup_pv_time in src/vmm/src/builder.rs which is invoked in build_microvm_for_boot

    • Checks if PVTime is supported using kvm_has_device_attr
    • Creates a PVTime device
    • Iterates over vcpu fd's to register their IPAs for steal_time struct
  • Started using debug/info logs while booting microvm for testing

Current problems:

Future Progress

  • We hope to continue making progress on this PR by:
    • Getting feedback on our current design of the PVTime device and setup_pv_time
    • Support snapshotting
    • Any other things we need to address

Reason

Attempting to resolve issue #4866

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@DakshinD DakshinD marked this pull request as draft April 4, 2025 07:00
@Manciukic Manciukic self-assigned this Apr 4, 2025
@Manciukic Manciukic added the Status: WIP Indicates that an issue is currently being worked on or triaged label Apr 4, 2025
Copy link
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try making the proposed change to pass the right address and see if it works.
I haven't left any comment regarding style or design to focus on functionality first. Once we get something working we can see how to make it "look better".

Copy link
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused by the KVM API as well! Passing a user pointer to the IPA to kvm is ok, but it needs to be 64B aligned.

The hint was in the KVM documentation:

-EINVAL Base address not 64 byte aligned

https://www.kernel.org/doc/html/v5.8/virt/kvm/devices/vcpu.html#attribute-kvm-arm-vcpu-pvtime-ipa

@Manciukic Manciukic linked an issue Apr 4, 2025 that may be closed by this pull request
3 tasks
@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 4, 2025

Thank you so much for your fast response! Totally missed the docs for 64 byte alignment, but that ended up fixing the bugs and I can successfully boot a microVM manually. If you think its a good time now, we can look to flesh out the implementation and "make it better".

Other than that, we believe the next step is adding unit/integration tests to make sure our device is working properly, as well as implementing persistence across snapshots. Any leads on how to do both of these things would be really helpful.

The lead we have been looking at for snapshotting is the Persist interface.

  • We store the MicrovmState in save_snapshot, so we need to first store our PVTime device somewhere in the vmm
  • Then we implement Persist for PVTime, and store the PVTimeState we get from save() somewhere in the MicrovmState (could we extend MicrovmState to include this? or add this state to one of the other states already existing inside of it?)
  • on booting from snapshot, we need to setup the pvtime device again, using kvm_set_device_addr again because the vcpu fds change.
  • Finally, would need to write comprehensive tests to make sure this works

@Manciukic
Copy link
Contributor

Thank you so much for your fast response! Totally missed the docs for 64 byte alignment, but that ended up fixing the bugs and I can successfully boot a microVM manually. If you think its a good time now, we can look to flesh out the implementation and "make it better".

Well done! I'll take a look.

The lead we have been looking at for snapshotting is the Persist interface.

* We store the `MicrovmState` in `save_snapshot`, so we need to first store our PVTime device somewhere in the vmm

* Then we implement `Persist` for PVTime, and store the `PVTimeState` we get from `save()` somewhere in the `MicrovmState` (could we extend MicrovmState to include this? or add this state to one of the other states already existing inside of it?)

I think extending it is ok, I didn't find any other place where we could fit it.

* on booting from snapshot, we need to setup the pvtime device again, using `kvm_set_device_addr` again because the vcpu fds change.

Correct. Remember to use the resource allocator with exact policy to ensure the system memory gets reserved in our internal allocator.

* Finally, would need to write comprehensive tests to make sure this works

I see the kernel prints a message "using stolen time PV" in the console when it boots, so we could probably check for that.

A more in-depth test to actually check that it's working would be to start 2 cpu hogging workloads on a 2vcpu guest and pin both vcpus to the same pcpu. That should achieve 50% steal time, and increase the relative counter in /proc/stat. We could assert the value actually increased.

For the snapshot, we could check that the same (non-zero) value is present after resume and that it continues to increase if we hog it again.

@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 8, 2025

Hi @Manciukic, I was just about to comment :), thanks for your response. We've started a bare bones implementation of snapshot persistence in the previous commit. So if you also have the time to take a look at it, that would be awesome.

We are now trying to change the existing tests to include PVTime. Any input on our design choices we made would be appreciated again, specifically:

  • Using either ONLY the base_addr or the hashmap in PVTime struct (we believe only the base_addr is necessary if the order of the vCPUs does not change across snapshots)
  • Our restoration of the device inside build_microvm_from_snapshot
  • Choosing to make the device an Option<PVTime inside the struct Vmm, and Option<PVTimeState> inside struct MicrovmState

Copy link
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I'd just move most of the pv_time specific logic into the new module to avoid cluttering the builder.rs even more.

We also need to bump the snapshot version and add a changelog entry, other than the integration tests.

@Manciukic
Copy link
Contributor

Manciukic commented Apr 8, 2025

I tried to run the branch through our CI and there are a couple of issues: https://buildkite.com/firecracker/firecracker-pr/builds/13103#

  • x86 doesn't compile, for example:
E   error[E0433]: failed to resolve: could not find `aarch64` in `arch`
E     --> src/vmm/src/builder.rs:22:18
E      |
E   22 | use crate::arch::aarch64::pvtime::{PVTime, PVTimeConstructorArgs, PVTimeError};
E      |                  ^^^^^^^ could not find `aarch64` in `arch`
E      |
E   note: found an item that was configured out
  • arm has a few failures
integration_tests/build/test_clippy.py::test_rust_clippy[aarch64-unknown-linux-gnu] FAILED                                                               [ 12%]
integration_tests/build/test_clippy.py::test_rust_clippy[aarch64-unknown-linux-musl] FAILED                                                              [ 25%]
integration_tests/build/test_coverage.py::test_coverage FAILED                                                                                           [ 37%]
integration_tests/build/test_dependencies.py::test_unused_dependencies PASSED                                                                            [ 50%]
integration_tests/build/test_gdb.py::test_gdb_compiles PASSED                                                                                            [ 62%]
integration_tests/build/test_seccomp_no_redundant_rules.py::test_redundant_seccomp_rules PASSED                                                          [ 75%]
integration_tests/build/test_unittests.py::test_unittests FAILED                                                                                         [ 87%]
integration_tests/build/test_unittests.py::test_benchmarks_compile PASSED                                                                                [100%]
  • unit tests need to be updated with the new field
  • clippy is unhappy

But the good news is functional and performance tests passed 🎉

To run these tests locally you can run:

./tools/devtool test -y -- integration_tests/build

@DakshinD
Copy link
Contributor Author

DakshinD commented Apr 9, 2025

Managed to fix the errors reported in the build as well as clippy errors, would be awesome if it could be re-run.

We started looking at unit/integration tests, as per your suggestions, and were hoping to get a small lead as to going about it.

  • For the unit tests in src/vmm/src/builder.rs as well as test_microvm_state_snapshot, we couldn't find a way to get the Vec<Vcpus> in context to setup the PVTime device. Is it necessary to mimic these existing tests for the pvtime device?

Copy link

codecov bot commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 91.33858% with 11 lines in your changes missing coverage. Please review.

Project coverage is 79.53%. Comparing base (f6fa622) to head (cf0e384).
Report is 48 commits behind head on main.

Current head cf0e384 differs from pull request most recent head ef99ee9

Please upload reports for the commit ef99ee9 to get more accurate results.

Files with missing lines Patch % Lines
src/vmm/src/builder.rs 72.22% 10 Missing ⚠️
src/vmm/src/arch/aarch64/pvtime.rs 98.83% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5139      +/-   ##
==========================================
- Coverage   83.06%   79.53%   -3.53%     
==========================================
  Files         250      219      -31     
  Lines       26876    23133    -3743     
==========================================
- Hits        22324    18399    -3925     
- Misses       4552     4734     +182     
Flag Coverage Δ
5.10-c5n.metal ?
5.10-m5n.metal ?
5.10-m6a.metal ?
5.10-m6g.metal 79.53% <91.33%> (+0.10%) ⬆️
5.10-m6i.metal ?
5.10-m7g.metal 79.53% <91.33%> (+0.10%) ⬆️
6.1-c5n.metal ?
6.1-m5n.metal ?
6.1-m6a.metal ?
6.1-m6g.metal 79.52% <91.33%> (+0.10%) ⬆️
6.1-m6i.metal ?
6.1-m7g.metal 79.53% <91.33%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Manciukic
Copy link
Contributor

Manciukic commented Apr 10, 2025

There are still a few style and compilation errors. You can run the tests from your development machine as:

./tools/devtool -y test  -- integration_tests/build/
./tools/devtool checkstyle

snippet:

error[E0412]: cannot find type `PVTimeError` in this scope
  --> src/vmm/src/builder.rs:88:18
   |
88 |     CreatePVTime(PVTimeError),
   |                  ^^^^^^^^^^^ not found in this scope
error[E0412]: cannot find type `PVTimeError` in this scope
   --> src/vmm/src/builder.rs:424:27
    |
424 |     RestorePVTime(#[from] PVTimeError),
    |                           ^^^^^^^^^^^ not found in this scope
error[E0412]: cannot find type `PVTimeError` in this scope
   --> src/vmm/src/builder.rs:424:27
    |
424 |     RestorePVTime(#[from] PVTimeError),
    |                           ^^^^^^^^^^^ not found in this scope
    |
help: you might be missing a type parameter
    |
424 |     RestorePVTime(#[from<PVTimeError>] PVTimeError),
    |                         +++++++++++++
warning: unused import: `log::warn`
  --> src/vmm/src/builder.rs:14:5
   |
14 | use log::warn;
   |     ^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default
For more information about this error, try `rustc --explain E0412`.
warning: `vmm` (lib) generated 1 warning
error: could not compile `vmm` (lib) due to 3 previous errors; 1 warning emitted
Returned error code: 101

https://buildkite.com/firecracker/firecracker-pr/builds/13116#_

@Manciukic
Copy link
Contributor

Manciukic commented Apr 10, 2025

For the unit tests in src/vmm/src/builder.rs as well as test_microvm_state_snapshot, we couldn't find a way to get the Vec<Vcpus> in context to setup the PVTime device. Is it necessary to mimic these existing tests for the pvtime device?

It's difficult to unit test code that mostly performs syscalls (ioctls on vcpu fd in this case), so the policy we adopted to not over-complicate the codebase with mocks is to test these features through end-to-end integration tests only.
I think it would be enough to have unit tests inside PVTime to ensure the logic (excluding the ioctls) is correct.
The unit tests in builder.rs are just to verify the creation of the emulated device, and most of the logic there is untested.

- Added PVTime struct that allocates and stores
  per-vCPU stolen_time regions
- Implemented register_vcpu() to register the IPA for
  each vCPU with KVM
- Invoked setup_pv_time in build_microvm_for_boot
- Added PVTime detection in setup_pv_time
- Started using debug/info logs while booting microvm for testing

Limitations/WIP:
- While manually booting a microvm for testing,
  error during set_device_attr in register_vcpu
- Snapshotting not yet supported

Signed-off-by: Dakshin Devanand <[email protected]>
- Each vCPU now gets 64 bytes for its steal_time struct region
 (16 bytes reguired for data)
- This satisfies 64 byte alignment the IPA requires

MicroVM can now successfully be booted with pvtime in the boot process.

Signed-off-by: Dakshin Devanand <[email protected]>
- Changes to PVTime design (use base addr)
- Secondary constructor for PVTime using only base addr
- Implement Persist interface for PVTime, save base addr
- Add pvtime to struct MicrovmState and struct Vmm
- Restore PVTime device in build_microvm_from_snapshot

Signed-off-by: Dakshin Devanand <[email protected]>
- Moved most PVTime logic into pvtime.rs
- Fixed includes for only aarch64
- PVTime only uses base_addr now
- Made clippy happy

Signed-off-by: Dakshin Devanand <[email protected]>
- General and snapshotting integration tests
- Unit tests for PVTime

Signed-off-by: Dakshin Devanand <[email protected]>
@DakshinD
Copy link
Contributor Author

Thanks for the clarification on that. We've added some very basic unit tests to PVTime and also 2 integration tests for basic checking and snapshotting. If there's any changes/next steps we need to take, please let us know.

A re-run of the test suite would also be very helpful so we can nail down any bugs, as it was a bit buggy locally.

@Manciukic
Copy link
Contributor

Thanks for the clarification on that. We've added some very basic unit tests to PVTime and also 2 integration tests for basic checking and snapshotting. If there's any changes/next steps we need to take, please let us know.

Can you also add a basic test that checks that the guest has enabled steal PV by looking at the kernel logs?

uvm.ssh.run("dmesg | grep 'stolen time PV'")

Make sure that only runs on aarch64.

A re-run of the test suite would also be very helpful so we can nail down any bugs, as it was a bit buggy locally.

Done. x86 has the same issues as above: https://buildkite.com/firecracker/firecracker-pr/builds/13138#0196241a-77dd-4fe4-9023-e053f8004c2d

- Fixed errors that occurred on x86 build
- Added integration test for checking steal time

Signed-off-by: Dakshin Devanand <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: WIP Indicates that an issue is currently being worked on or triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Steal Time support in ARM
2 participants