Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize critical_section: use MSR instruction #576

Merged
merged 6 commits into from
Jan 27, 2025

Conversation

sw
Copy link
Contributor

@sw sw commented Jan 20, 2025

(Marking as draft because I'm a Rust beginner who's probably doing things all wrong, and because it changes the API).

To avoid branches and unnecessary code in the acquire and release functions, simply write back the original value to the PRIMASK register using the MSR instruction. This reduces the number of cycles that the processor runs with interrupts disabled, improving interrupt latency.

Unfortunately due to how the critical-section dependency is implemented (extern "Rust"), there is still a branch to and return from the acquire and release functions.

Disassembly with --release:

Before - thumbv6m-none-eabi
<_critical_section_1_0_acquire>:
               	movs	r0, #0x1
               	mrs	r1, primask
               	ands	r1, r0
               	rsbs	r0, r1, #0
               	adcs	r0, r1
               	cpsid i
               	bx	lr

<_critical_section_1_0_release>:
               	cmp	r0, #0x0
               	beq	0x80002d6 <_critical_section_1_0_release+0x6> @ imm = #0x0
               	cpsie i
               	bx	lr
Before - thumbv7m-none-eabi
<_critical_section_1_0_acquire>:
               	mrs	r0, primask
               	movs	r1, #0x1
               	cpsid i
               	bic.w	r0, r1, r0
               	bx	lr

<_critical_section_1_0_release>:
               	cmp	r0, #0x0
               	it	eq
               	bxeq	lr
               	cpsie i
               	bx	lr

With this PR, the generated code is the same on thumbv6m and thumbv7m:

<_critical_section_1_0_acquire>:
               	mrs	r0, primask
               	cpsid i
               	bx	lr

<_critical_section_1_0_release>:
               	msr	primask, r0
               	bx	lr

This changes the critical-section dependency to use u32 as the RawRestoreState type instead of bool.

The register::primask module now defines a struct(pub u32) instead of an enum, and additionally a write function for use with the critical section. This is a breaking change to the API if you were using the enum values instead of the is_(in)active functions.

Is there a better way to achieve this without changing the API?

To avoid branches and unnecessary code in the acquire and release
functions, simply write back the original value to the PRIMASK register
using the MSR instruction.

This changes the critical-section dependency to use `u32` as the
`RawRestoreState` type instead of `bool`.

The `register::primask` module now defines a `struct(pub u32)` instead of an
`enum`, and additionally a `write` function for use with the critical
section.
@sw
Copy link
Contributor Author

sw commented Jan 21, 2025

Since the release function is now a single instruction consisting of 4 bytes, it should be possible for the linker to replace the branch with the MSR instruction. The Arm linker can apparently do that, but I wasn't able to get LLVM/LLD to do it.

@adamgreig
Copy link
Member

Thanks for this PR, it sounds like a good idea!

I think it would be worth avoiding the breaking change to PRIMASK so we can release this in the 0.7 version family. We could do this by adding a new interface to the primask module that deals with a u32, or (I think better) by having the critical section implementation read/write primask directly.

Just a reference for future readers: the current implementation masks just the bottom bit of the PRIMASK register to get the current interrupt status, stores that, then uses cpsie to selectively re-enable interrupts without ever writing back to PRIMASK. The proposed new version reads the whole PRIMASK register and writes it all back to restore state. The other 31 bits of PRIMASK are reserved and SBZP in ARMv6/7/8-M. Writing back a value read from the field is valid, so there shouldn't be any issue with this approach.

Did you investigate https://doc.rust-lang.org/rustc/linker-plugin-lto.html ?

@sw
Copy link
Contributor Author

sw commented Jan 22, 2025

I think it would be worth avoiding the breaking change to PRIMASK so we can release this in the 0.7 version family. We could do this by adding a new interface to the primask module that deals with a u32, or (I think better) by having the critical section implementation read/write primask directly.

Good point. I will revert the primask module and use asm! in critical_section.rs to read and write the PRIMASK register.

Did you investigate https://doc.rust-lang.org/rustc/linker-plugin-lto.html ?

There is a feature listed here, but I can't find out what it does:

linker-plugin-lto = []

Looks like some code in xtask/src/main.rs related to that was removed?

Anyway, thanks for pointing me to the right compiler flag. Adding "-C", "linker-plugin-lto" to rustflags in cargo.toml will make both acquire and release inline. Is this something the users of this crate will need to do manually? Does that need documentation? Or is there really code missing related to the crate feature I saw? I'm assuming that LTO would be beneficial for most users but it's not something this crate should force.

@adamgreig
Copy link
Member

That feature is from the days before stable inline assembly, when the cortex-m crate shipped a pre-built binary blob per target that would be linked in to provide the assembly calls. We also shipped a pre-built xLTO version which could be inlined at link time if the feature was enabled. Since moving to inline asm for all users, we don't ship any pre-built blobs, so the feature doesn't do anything, but we leave it in place for backwards compatibility in 0.7 releases.

I'm glad to hear it helps inline the c-s impl though. Yes, I think users will have to do this manually, so we should document it where we talk about the critical-section-single-core feature. There's no way to force it from our crate as far as I know.

@sw sw marked this pull request as ready for review January 22, 2025 18:51
@sw sw requested a review from a team as a code owner January 22, 2025 18:51
adamgreig
adamgreig previously approved these changes Jan 22, 2025
Copy link
Member

@adamgreig adamgreig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

Do you want to add a note to the documentation for the critical-section-single-core feature here about enabling linker-plugin-lto for better performance?

/// Note that bits [31:1] are reserved and SBZP (Should-Be-Zero-or-Preserved)
#[cfg(cortex_m)]
#[inline]
pub fn write_raw(r: u32) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be unsafe because it can be used to enable interrupts, to match interrupt::enable(). If it was safe to call, user code could enable interrupts when other unsafe functions depend on them being disabled for memory safety, leading to UB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Do you require safety comments for the unsafe blocks? That didn't look like it is a policy in this crate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for the unsafe blocks inside functions, but we should add a # Safety note to the docs for that function. I suggest:

# Safety

This method is unsafe as other unsafe code may rely on interrupts remaining disabled, for example during a critical section, and being able to safely re-enable them would lead to undefined behaviour. Do not call this function in a context where interrupts are expected to remain disabled - for example, in the midst of a critical section or `interrupt::free()` call.

It's a bit vague and hard to comply with, but I think that's the nature of the problem. Happy to hear other suggestions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good, I've added it.

Copy link
Member

@adamgreig adamgreig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! i'll see about backporting this to cortex-m 0.7 too.

@adamgreig adamgreig added this pull request to the merge queue Jan 27, 2025
Merged via the queue into rust-embedded:master with commit de61182 Jan 27, 2025
11 checks passed
@sw sw deleted the cs-msr branch January 28, 2025 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants