Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: IR Reference example triggers debug assert on x86_64 #10118

Open
iwanders opened this issue Jan 27, 2025 · 5 comments
Open

Cranelift: IR Reference example triggers debug assert on x86_64 #10118

iwanders opened this issue Jan 27, 2025 · 5 comments
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator

Comments

@iwanders
Copy link
Contributor

.clif Test Case

The Cranelift IR average example code from the docs/ir.md. I've already added that to the filetests test directory in caeafe2.

Using clif-util;

../target/debug/clif-util  bugpoint  filetests/filetests/isa/x64/average.clif x86_64
After pass 0, remaining insts/blocks: 22/3 (will keep reducing)
After pass 1, remaining insts/blocks: 13/3 (will keep reducing)
After pass 2, remaining insts/blocks: 13/3 (stop reducing)
████████████████████████████████████████████████████████████ pass 2 phase merge blocks                   2/   2 Remove unused global values
Crash message: assertion `left == right` failed
  left: types::I32
 right: types::I64

function %average() -> f32 system_v {
    ss0 = explicit_slot 8

block1:
    v0 = iconst.i32 0
    v1 = iconst.i32 0
    v5 = iconst.i32 0
    v6 = iadd v0, v5  ; v0 = 0, v5 = 0
    v9 = f64const 0.0
    v100 = f32const +NaN
    brif v1, block2, block5  ; v1 = 0

block2:
    v7 = load.f32 v6
    v8 = fpromote.f64 v7
    v10 = fadd v8, v9  ; v9 = 0.0
    stack_store v10, ss0
    trap user1

block5:
    return v100  ; v100 = +NaN
}

5 blocks 22 insts -> 3 blocks 13 insts

Manually, I reduced it to the following:

function %average(i32, i32) -> f32 system_v {
    ss0 = explicit_slot 8        ; Stack slot for `sum`

block1(v0: i32, v1: i32):
    v20 = f64const 0x0.0         ; Create a f64 as 0.0, just to initialise ss0.
    stack_store v20, ss0         ; Store f64 into the ss0 stack slot, just to initialise ss0.
    v6 = iadd v0, v1             ; Adds the input together as integers, output is i32?
    v7 = load.f32 v6             ; Converts v7 to f32 from i32.
    v8 = fpromote.f64 v7         ; converts v7 to f64 into v8.
    v9 = stack_load.f64 ss0      ; Loads ss0 from the stack into v9, v9 is now f64 that's 0.0
    v10 = fadd.f64 v8, v9        ; Adds v8 and v9 together, both are f64's, so v10 is an f64.
    stack_store v10, ss0         ; Write the 8 bytes back to ss0.
    v100 = f32const +NaN         ; Just creating a dummy to return.
    return v100                  ; Return the dummy.
}

Instructions themselves seem fine to me, but today is the first day I'm looking at Cranelift at all, so I absolutely may be missing something obvious.

Steps to Reproduce

  • Cherrypick caeafe2
  • cd cranelift
  • cargo t (note without --release, test will pass with --release).

Note, the problem is with x86_64, this function compiles fine with aarch64.

Expected Results

I would expect the example function provided in the documentation to compile, it compiles on aarch64, which (to me at least) indicates the function is fine. I would expect this to also compile on x86_64.

Actual Results

A debug assert triggers on this line;

Backtrace
---- filetests stdout ----
thread 'worker #7' panicked at cranelift/codegen/src/isa/x64/lower.rs:233:9:
assertion `left == right` failed
  left: types::I32
 right: types::I64
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:373:5
   4: cranelift_codegen::isa::x64::lower::lower_to_amode
             at ./codegen/src/isa/x64/lower.rs:233:9
   5: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::sink_load
             at ./codegen/src/isa/x64/lower/isle.rs:313:20
   6: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::put_in_reg_mem
             at ./codegen/src/isa/x64/lower/isle.rs:148:23
   7: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::put_in_xmm_mem
             at ./codegen/src/isa/x64/lower/isle.rs:132:28
   8: cranelift_codegen::isa::x64::lower::isle::generated_code::constructor_lower
             at /home/ivor/Documents/Code/rust/wasmtime/wasmtime/target/debug/build/cranelift-codegen-8f3d42bba10fabf3/out/isle_x64.rs:21098:42
   9: cranelift_codegen::isa::x64::lower::isle::lower
             at ./codegen/src/isa/x64/lower/isle.rs:55:5
  10: cranelift_codegen::isa::x64::lower::<impl cranelift_codegen::machinst::lower::LowerBackend for cranelift_codegen::isa::x64::X64Backend>::lower
             at ./codegen/src/isa/x64/lower.rs:311:9
  11: cranelift_codegen::machinst::lower::Lower<I>::lower_clif_block
             at ./codegen/src/machinst/lower.rs:679:39
  12: cranelift_codegen::machinst::lower::Lower<I>::lower
             at ./codegen/src/machinst/lower.rs:1020:17
  13: cranelift_codegen::machinst::compile::compile
             at ./codegen/src/machinst/compile.rs:42:9
  14: cranelift_codegen::isa::x64::X64Backend::compile_vcode
             at ./codegen/src/isa/x64/mod.rs:62:9
  15: <cranelift_codegen::isa::x64::X64Backend as cranelift_codegen::isa::TargetIsa>::compile_function
             at ./codegen/src/isa/x64/mod.rs:74:40
  16: cranelift_codegen::context::Context::compile_stencil
             at ./codegen/src/context.rs:138:9
  17: cranelift_codegen::context::Context::compile
             at ./codegen/src/context.rs:204:23
  18: <cranelift_filetests::test_compile::TestCompile as cranelift_filetests::subtest::SubTest>::run
             at ./filetests/src/test_compile.rs:59:29
  19: cranelift_filetests::subtest::SubTest::run_target
             at ./filetests/src/subtest.rs:101:13
  20: cranelift_filetests::runone::run
             at ./filetests/src/runone.rs:97:9
  21: cranelift_filetests::concurrent::worker_thread::{{closure}}::{{closure}}
             at ./filetests/src/concurrent.rs:149:46
  22: std::panicking::try::do_call
             at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  23: __rust_try
  24: std::panicking::try
             at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  25: std::panic::catch_unwind
             at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  26: cranelift_filetests::concurrent::worker_thread::{{closure}}
             at ./filetests/src/concurrent.rs:149:30
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

The relevant section from isle_x64.rs;

&Opcode::Fpromote => {
    let v1 = C::first_result(ctx, arg0);
    if let Some(v2) = v1 {
        let v3 = C::value_type(ctx, v2);
        if v3 == F64 {
            let v1809 = constructor_xmm_zero(ctx, F64X2);
            let v1804 = &C::put_in_xmm_mem(ctx, v520);             // <- Line isle_x64.rs:21098:42
            let v1819 = constructor_x64_cvtss2sd(ctx, v1809, v1804);
            let v1820 = constructor_output_xmm(ctx, v1819);
            let v1821 = Some(v1820);
            // Rule at src/isa/x64/lower.isle line 2661.
            return v1821;
        }
    }
}

Which seems to indicate the problem is originating from Fpromote.

Versions and Environment

Cranelift version or commit: Current released; 0.116.1, also tested on current main; 5dfccc0.

Operating system: Ubuntu, stable rustc 1.84.0 (9fc6b4312 2025-01-07)

Architecture: x86_64

Extra Info

If I use a release build, target x86_64 and write the output to an object file using cranelift_object::ObjectModule. The provided symbol does work and correctly calculates averages.

Disassembly of object and usage of it to calculate average
$ objdump -d average.o

average.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <average>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 83 ec 10          	sub    $0x10,%rsp
   8:	66 0f 57 e4          	xorpd  %xmm4,%xmm4
   c:	48 8d 04 24          	lea    (%rsp),%rax
  10:	f2 0f 11 20          	movsd  %xmm4,(%rax)
  14:	85 f6                	test   %esi,%esi
  16:	0f 85 12 00 00 00    	jne    2e <average+0x2e>
  1c:	b9 00 00 c0 7f       	mov    $0x7fc00000,%ecx
  21:	66 0f 6e c1          	movd   %ecx,%xmm0
  25:	48 83 c4 10          	add    $0x10,%rsp
  29:	48 89 ec             	mov    %rbp,%rsp
  2c:	5d                   	pop    %rbp
  2d:	c3                   	retq   
  2e:	31 c9                	xor    %ecx,%ecx
  30:	44 6b c9 04          	imul   $0x4,%ecx,%r9d
  34:	66 0f 57 ed          	xorpd  %xmm5,%xmm5
  38:	f3 42 0f 5a 2c 0f    	cvtss2sd (%rdi,%r9,1),%xmm5
  3e:	4c 8d 0c 24          	lea    (%rsp),%r9
  42:	f2 41 0f 58 29       	addsd  (%r9),%xmm5
  47:	4c 8d 0c 24          	lea    (%rsp),%r9
  4b:	f2 41 0f 11 29       	movsd  %xmm5,(%r9)
  50:	83 c1 01             	add    $0x1,%ecx
  53:	39 f1                	cmp    %esi,%ecx
  55:	0f 82 d5 ff ff ff    	jb     30 <average+0x30>
  5b:	48 8d 3c 24          	lea    (%rsp),%rdi
  5f:	f2 0f 10 3f          	movsd  (%rdi),%xmm7
  63:	66 0f 57 e4          	xorpd  %xmm4,%xmm4
  67:	8b fe                	mov    %esi,%edi
  69:	f2 48 0f 2a e7       	cvtsi2sd %rdi,%xmm4
  6e:	f2 0f 5e fc          	divsd  %xmm4,%xmm7
  72:	0f 57 c0             	xorps  %xmm0,%xmm0
  75:	f2 0f 5a c7          	cvtsd2ss %xmm7,%xmm0
  79:	48 83 c4 10          	add    $0x10,%rsp
  7d:	48 89 ec             	mov    %rbp,%rsp
  80:	5d                   	pop    %rbp
  81:	c3                   	retq  
$ cat main.cpp 
#include <array>
#include <iostream>
extern "C" {
  float average(const float *array, size_t count);
}
int main(int, char**) {
  const auto values = std::array<float, 4>{1, 5555, 3, 4};
  const auto avg = average(values.data(), values.size());
  std::cout << "avg: " << avg << std::endl;
  return 0;
}
$ g++ main.cpp  average.o && ./a.out 
avg: 1390.75
@iwanders iwanders added bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator labels Jan 27, 2025
@cfallin
Copy link
Member

cfallin commented Jan 27, 2025

In this snippet

block1(v0: i32, v1: i32):
   v20 = f64const 0x0.0         ; Create a f64 as 0.0, just to initialise ss0.
   stack_store v20, ss0         ; Store f64 into the ss0 stack slot, just to initialise ss0.
   v6 = iadd v0, v1             ; Adds the input together as integers, output is i32?
   v7 = load.f32 v6             ; Converts v7 to f32 from i32.

you are loading from memory at address v6, but v6 is an I32-typed value (iadd is polymorphic so the type is implicit from v0 and v1 args, which are both i32). Note that x86-64 is a 64-bit architecture, so we require addresses to be 64 bits wide. That this worked on aarch64 is a little surprising but I guess the lowering rules are more lenient in that backend (they probably auto-extend).

(The comments aren't load-bearing for the bug but I note too that you wrote "Converts ... to f32 from i32" which is inconsistent with a load -- usually I would reach for an int-to-float opcode to convert in the usual sense.)

Our assertion failure message could absolutely be better here -- perhaps we could add "address with unexpected" or something like that.

I'll note as well that the example IR you took from ir.md is written without a target in mind, and address width is one of the few ways in which CLIF is not totally target-independent -- it must match the target ISA. I suspect that example was written long ago when riscv32 was the original target of proto-Cranelift.

Hopefully this is a simple fix (use i64 types for addresses) -- let us know if that addresses your problem!

@iwanders
Copy link
Contributor Author

@cfallin , thank you for the detailed answer.

Note that x86-64 is a 64-bit architecture, so we require addresses to be 64 bits wide.
Hopefully this is a simple fix (use i64 types for addresses) -- let us know if that addresses your problem!

Ah, I didn't think of that, but that does make sense!

I can indeed get the example from the documentation to work by changing v0, v1, v3 and v4 to i64. Would you like me to file a PR to change the documentation to use i64s for those variables? I expect most people who try to read through the docs and build something to use the cranelift_native crate and be on 64 bit architectures?

Our assertion failure message could absolutely be better here -- perhaps we could add "address with unexpected" or something like that.

Yeah, I think just something simple as changing that debug assert to be;

        debug_assert_eq!(ctx.output_ty(add, 0), types::I64, "address width of 64 expected, got 32");

would go a long way towards steering people in the right direction? I can file a pr with this change if you'd like me to. The discrepancy between debug and release builds is also a bit surprising to me, but that may be less of an easy fix as it probably requires adding an explicit check.

@cfallin
Copy link
Member

cfallin commented Jan 27, 2025

Yes, if you're willing, a PR to make both of those changes would be very much appreciated! Happy to review. Thanks!

@bjorn3
Copy link
Contributor

bjorn3 commented Jan 27, 2025

The discrepancy between debug and release builds is also a bit surprising to me, but that may be less of an easy fix as it probably requires adding an explicit check.

Does the clif ir verifier catch this mistake? You need to explicitly enable it as it is kinda slow.

@iwanders
Copy link
Contributor Author

Does the clif ir verifier catch this mistake? You need to explicitly enable it as it is kinda slow.

I tried adding the same average.clif file with an test verifier stanza at the top in e132b3b, this does not cause a unit test failure, so I don't think it does. I could not find a way to specify a test verifier_fail or something to indicate the verifier should report a failure in the test case, so I'm not sure how to really express this as a test.

My exact setup is available in 70af5ce on (branch issue-10118-reproduction), it adds a crate cranelift/compile_average_ir in the wasmtime repo for easy reproduction (function attempt_two in main.rs). This prints the flags as:

enable_alias_analysis = true
enable_verifier = true
enable_pcc = false

Which I take to mean the verifier is enabled, the documentation here also states it is enabled by default? Reproduction should be a matter of going to the crate and doing cargo r, that gives the assert, while cargo r --release shows the object file bytes.

Context; I'm considering creating a toy compiler, just to learn more about compilers. On Sunday I was exploring Cranelift to see how approachable it is and whether I'd like to use it as the frontend and share datastructures (effectively I think I would attempt making a TargetIsa). So that's why I ended up reading the IR documentation and tried to compile that function from the documentation. Overall though, the fact that I got working assembly within an hour or two of reading, experimenting, and in like 100 lines, makes it very approachable 👍

Yes, if you're willing, a PR to make both of those changes would be very much appreciated! Happy to review. Thanks!

Cool, I'll put something together.

iwanders added a commit to iwanders/wasmtime that referenced this issue Jan 28, 2025
…e#10118)

This adds additional information as to what is wrong when this assert is encountered.
iwanders added a commit to iwanders/wasmtime that referenced this issue Jan 28, 2025
…lliance#10118)

Most people using this example will likely be on 64 bit systems, so it makes
sense to target a system with 64 bit addressses with the example.
github-merge-queue bot pushed a commit that referenced this issue Jan 28, 2025
* Cranelift/x64 backend: improve pointer width assert. (#10118)

This adds additional information as to what is wrong when this assert is encountered.

* Cranelift/docs: Change pointer width in ir docs to 64 bits (#10118)

Most people using this example will likely be on 64 bit systems, so it makes
sense to target a system with 64 bit addressses with the example.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

No branches or pull requests

3 participants