Commit 6fc381f

Rollup merge of #136457 - calder:master, r=tgross35
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on an i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:

<table>
<tr>
<th>C++</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37f9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:

<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:

<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang/rust#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the meantime, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang/rust#21690
* rust-lang/libs-team#532
* rust-lang/rust#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
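To illustrate what reordering buys, here is a minimal sketch in stable Rust of the kind of reassociation the commit message describes. The function names `dot_sequential` and `dot_reassociated` and the lane count of 8 are assumptions chosen for illustration; they are not part of the change, and the code actually generated by the compiler may differ.

```rust
/// Left-to-right dot product: every addition depends on the previous one,
/// so without permission to reorder, the sum forms one serial chain.
fn dot_sequential(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        sum += x * y;
    }
    sum
}

/// Manually reassociated dot product (hypothetical illustration):
/// LANES independent partial sums that could map onto one SIMD register.
/// The additions happen in a different order than in `dot_sequential`,
/// so the result may differ in the last bits for general inputs.
fn dot_reassociated(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8; // assumption: 8 x f32 fits a 256-bit vector
    let n = a.len().min(b.len());
    let chunks = n / LANES;

    let mut acc = [0.0f32; LANES];
    for c in 0..chunks {
        for l in 0..LANES {
            let i = c * LANES + l;
            acc[l] += a[i] * b[i];
        }
    }

    // Combine the partial sums, then handle the leftover tail elements.
    let mut sum: f32 = acc.iter().sum();
    for i in chunks * LANES..n {
        sum += a[i] * b[i];
    }
    sum
}
```

Both functions agree exactly when every intermediate value is exactly representable (for example, small integers), but floating point addition is not associative in general: `(1e20f32 + -1e20) + 1.0` is `1.0`, while `1e20f32 + (-1e20 + 1.0)` is `0.0`. That is why the compiler needs explicit permission to perform this transformation.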
2 parents e5b1927 + fc0bdc4 commit 6fc381f

1 file changed: +4 -4 lines changed

Diff for: src/intrinsics/mod.rs

@@ -411,9 +411,9 @@ pub trait EvalContextExt<'tcx>: crate::MiriInterpCxExt<'tcx> {
     };
     let res = this.binary_op(op, &a, &b)?;
     // `binary_op` already called `generate_nan` if needed.
-    // Apply a relative error of 16ULP to simulate non-deterministic precision loss
+    // Apply a relative error of 4ULP to simulate non-deterministic precision loss
     // due to optimizations.
-    let res = apply_random_float_error_to_imm(this, res, 4 /* log2(16) */)?;
+    let res = apply_random_float_error_to_imm(this, res, 2 /* log2(4) */)?;
     this.write_immediate(*res, dest)?;
 }

@@ -464,9 +464,9 @@ pub trait EvalContextExt<'tcx>: crate::MiriInterpCxExt<'tcx> {
     if !float_finite(&res)? {
         throw_ub_format!("`{intrinsic_name}` intrinsic produced non-finite value as result");
     }
-    // Apply a relative error of 16ULP to simulate non-deterministic precision loss
+    // Apply a relative error of 4ULP to simulate non-deterministic precision loss
     // due to optimizations.
-    let res = apply_random_float_error_to_imm(this, res, 4 /* log2(16) */)?;
+    let res = apply_random_float_error_to_imm(this, res, 2 /* log2(4) */)?;
     this.write_immediate(*res, dest)?;
 }
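Both hunks pass the base-2 logarithm of the ULP bound as the last argument: `4 /* log2(16) */` before the change, `2 /* log2(4) */` after. The sketch below shows one way to read such a bound for `f32`. It is not Miri's implementation: `apply_relative_error` and its `unit` parameter (a stand-in for a random draw in `[-1, 1]`) are hypothetical, and it assumes that 1 ULP corresponds to a relative error of roughly `f32::EPSILON`.

```rust
/// Hypothetical helper, not Miri's code: scale `x` by `1 + e`, where
/// `|e| <= 2^log2_ulps * f32::EPSILON`.
///
/// `unit` stands in for a random draw in [-1, 1]; it is passed explicitly
/// here so the sketch stays deterministic and needs no external crates.
fn apply_relative_error(x: f32, log2_ulps: u32, unit: f32) -> f32 {
    // 2^log2_ulps ULPs, expressed as a relative error
    // (assumption: 1 ULP is about f32::EPSILON relative to the value).
    let max_rel = (1u32 << log2_ulps) as f32 * f32::EPSILON;
    x * (1.0 + unit * max_rel)
}
```

Under these assumptions, `apply_relative_error(1.0, 2, 1.0)` yields `1.0 + 4.0 * f32::EPSILON`, the largest upward deviation allowed by the new 4 ULP bound, whereas the old argument `4` would allow up to `1.0 + 16.0 * f32::EPSILON`.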
