indexOfSentinel SIMD usage before FP and MMU are enabled #21451

Closed
gaosui opened this issue Sep 19, 2024 · 2 comments
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

gaosui commented Sep 19, 2024

Zig Version

0.13.0

Steps to Reproduce and Observed Behavior

I noticed this issue when experimenting with arm64 bare metal programming, where indexOfSentinel emits SIMD operations before I have enabled FP and MMU on the target CPU.

  • Without FP enabled, attempting to use SIMD registers causes a CPU exception.
  • Without the MMU and virtual memory, arm64 treats DRAM as device memory and raises an exception on unaligned access, which can happen when loading data into a large SIMD register such as a 128-bit Qn. The SIMD implementation of indexOfSentinel is also page-aware and assumes virtual memory.

The specific code path affected by this issue:

// Some null terminated string with arbitrary alignment.
const string: [*:0]const u8 = @ptrFromInt(...);

// format() eventually relies on indexOfSentinel to print sentinel terminated strings
try std.fmt.format(someWriter, "{s}\n", .{string});

My exact build target:

.{
    .cpu_arch = .aarch64,
    .cpu_model = .{ .explicit = &std.Target.arm.cpu.cortex_a76 },
    .os_tag = .freestanding,
    .abi = .none,
}

Code executed on qemu with:

qemu-system-aarch64 -machine virt -cpu cortex-a76 -nographic ...

Expected Behavior

I certainly expected format() to print out the string instead of triggering a CPU exception.

This is hardly a bug given the design goal of indexOfSentinel, and the workaround is obvious: just enable FP and the MMU, eh?

Please suggest other solutions that I missed, such as compiler flags to control SIMD usage or, even better, runtime instrumentation that generates different code for indexOfSentinel before and after FP is enabled.

Off topic but I find zig tremendously expressive and pleasant to use. I actually switched from C to zig in search of a stronger type system, but got more out of it. Hope the finding in the bug is relevant, and thanks zig team for the awesome work!

gaosui added the bug label Sep 19, 2024

alexrp (Contributor) commented Sep 19, 2024

Try -mcpu cortex_a76-neon-fullfp16 or similar. That should prevent the compiler from emitting FP/SIMD code altogether, which is usually what you want for bare metal / kernel development. If you do want FP/SIMD code generation enabled, but just not before a certain point, then I think the responsibility falls on you to avoid FP/SIMD code until that point.
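
If FP/SIMD code generation has to stay enabled, one way to keep early string printing off the vectorized indexOfSentinel path might be to compute the length with a byte-wise loop and hand format() a slice, since a slice already carries its length. The following is only a minimal sketch: scalarLen is a hypothetical helper, not part of std, the optimizer is not guaranteed to keep the loop scalar, and other std code may still use FP/SIMD registers.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): byte-at-a-time length of a
/// 0-terminated string. Single-byte loads cannot be misaligned.
fn scalarLen(ptr: [*:0]const u8) usize {
    var i: usize = 0;
    while (ptr[i] != 0) : (i += 1) {}
    return i;
}

test "format a sentinel-terminated string via a slice" {
    const string: [*:0]const u8 = "hello";
    var buf: [16]u8 = undefined;
    var stream = std.io.fixedBufferStream(&buf);
    // A slice carries its own length, so "{s}" does not need to
    // search for the sentinel.
    try std.fmt.format(stream.writer(), "{s}\n", .{string[0..scalarLen(string)]});
    try std.testing.expectEqualStrings("hello\n", stream.getWritten());
}
```

The test uses a fixed buffer writer only so that it can run on a host with `zig test`; on bare metal the writer would be whatever someWriter is in the original report.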

daurnimator (Contributor) commented Sep 19, 2024

You should be able to selectively disable CPU features.

My exact build target:

.{
    .cpu_arch = .aarch64,
    .cpu_model = .{ .explicit = &std.Target.arm.cpu.cortex_a76 },
    .os_tag = .freestanding,
    .abi = .none,
}

Try:

.{
  .cpu_arch = .aarch64,
  .cpu_model = .{ .explicit = &std.Target.arm.cpu.cortex_a76 },
  .cpu_features_sub = std.Target.aarch64.featureSet(&.{ .neon }), // disable emitting neon instructions
  .os_tag = .freestanding,
  .abi = .none,
}

gaosui closed this as not planned Sep 19, 2024