split `ParamEnv::caller_bounds` into multiple fields #139576

davidtwco · 2025-04-09T10:55:55Z

This patch splits ParamEnv::caller_bounds into multiple fields, one for each ClauseKind. This is intended to be a performance optimisation, avoiding unnecessary iteration over large ParamEnvs when most kinds of clauses aren't relevant in any given context.

I make this change in three steps (reflected in the commits):

Changing the interface of ParamEnv
- Replace ParamEnv::caller_bounds with functions returning iterators over specific kinds of clause, and then updating all users. A ParamEnv::all_clauses function, identical in behaviour to ParamEnv::caller_bounds remains for the handful of cases where iteration over all clauses is desirable.
Interning ParamEnv
- In anticipation of changing the internals of ParamEnv, to avoid the type growing and to keep it Copy, ParamEnv is interned, becoming a pointer to ParamEnvInner.
Changing the internals of ParamEnv
- Replacing the caller_bounds field with a field for each type of clause and updating the creation of ParamEnv to partition a sequence of Clauses into the appropriate fields.

This is intended to help improve the performance of #137944, which results in many new predicates in the ParamEnv, and in particular, lots of new host effect predicates which can't be fast-path'd away. As part of that work, I've experimented with variations on this patch which had worse performance:

Uninterned ParamEnv
Box<[Clause<'tcx>]> fields rather than Clauses<'tcx>
Box<[SpecificPredicateType<'tcx>]>fields rather thanClauses<'tcx>`

r? types

Replace the `caller_bounds` function of `ParamEnv` with `ClauseKind`-specific accessors (i.e. `trait_clauses`, `region_outlives_clauses`) which will make it easier to later change the internals of `ParamEnv`.

Adding multiple fields to `ParamEnv` will grow the type, which will negatively impact performance, and it is widely assumed that `ParamEnv` is `Copy` - intern it prior to adding the additional fields.

Instead of filtering on `caller_bounds` at every use, instead filter them once when creating the `ParamEnv`. This will avoid unnecessary iteration over irrelevant bounds and improve performance.

rustbot · 2025-04-09T10:56:03Z

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Some changes occurred in compiler/rustc_passes/src/check_attr.rs

cc @jdonszelmann

Some changes occurred in compiler/rustc_sanitizers

cc @rcvalle

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Some changes occurred in compiler/rustc_monomorphize/src/partitioning/autodiff.rs

cc @ZuseZ4

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred in const_evaluatable.rs

cc @BoxyUwU

HIR ty lowering was modified

cc @fmease

fmease · 2025-04-09T11:36:35Z

@bors try @rust-timer queue

…r=<try> split `ParamEnv::caller_bounds` into multiple fields This patch splits `ParamEnv::caller_bounds` into multiple fields, one for each `ClauseKind`. This is intended to be a performance optimisation, avoiding unnecessary iteration over large `ParamEnv`s when most kinds of clauses aren't relevant in any given context. I make this change in three steps (reflected in the commits): 1. Changing the interface of `ParamEnv` - Replace `ParamEnv::caller_bounds` with functions returning iterators over specific kinds of clause, and then updating all users. A `ParamEnv::all_clauses` function, identical in behaviour to `ParamEnv::caller_bounds` remains for the handful of cases where iteration over all clauses is desirable. 2. Interning `ParamEnv` - In anticipation of changing the internals of `ParamEnv`, to avoid the type growing and to keep it `Copy`, `ParamEnv` is interned, becoming a pointer to `ParamEnvInner`. 3. Changing the internals of `ParamEnv` - Replacing the `caller_bounds` field with a field for each type of clause and updating the creation of `ParamEnv` to partition a sequence of `Clause`s into the appropriate fields. This is intended to help improve the performance of rust-lang#137944, which results in many new predicates in the `ParamEnv`, and in particular, lots of new host effect predicates which can't be fast-path'd away. As part of that work, I've experimented with variations on this patch which had worse performance: - Uninterned `ParamEnv` - `Box<[Clause<'tcx>]>` fields rather than `Clauses<'tcx>` - Box<[SpecificPredicateType<'tcx>]>` fields rather than `Clauses<'tcx>` r? types

bors · 2025-04-09T11:37:47Z

⌛ Trying commit 99aca75 with merge c8e16ec...

bjorn3 · 2025-04-09T11:47:03Z

compiler/rustc_middle/src/ty/mod.rs

+        tcx.mk_param_env(ParamEnvInner {
+            trait_clauses: ListWithCachedTypeInfo::empty(),
+            region_outlives_clauses: ListWithCachedTypeInfo::empty(),
+            type_outlives_clauses: ListWithCachedTypeInfo::empty(),
+            projection_clauses: ListWithCachedTypeInfo::empty(),
+            const_arg_has_type_clauses: ListWithCachedTypeInfo::empty(),
+            well_formed_clauses: ListWithCachedTypeInfo::empty(),
+            const_evaluatable_clauses: ListWithCachedTypeInfo::empty(),
+            host_effect_clauses: ListWithCachedTypeInfo::empty(),
+        })


Maybe cache this in tcx? Or would it be possible to put it in a static and insert this static in the param env interner such that you don't need to pass tcx around in many cases?

I've looked at doing this locally and it has negligible performance impact as far as I can measure, so I think this would only be worth doing if we wanted to avoid passing tcx in.

bors · 2025-04-09T13:44:37Z

☀️ Try build successful - checks-actions
Build commit: c8e16ec (c8e16eca87a239609ccf8c567e7df883b8a117de)

rust-timer · 2025-04-09T14:59:59Z

Finished benchmarking commit (c8e16ec): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.3%, 3.5%]	155
Regressions ❌ (secondary)	0.9%	[0.1%, 2.9%]	106
Improvements ✅ (primary)	-1.5%	[-1.9%, -1.1%]	2
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	1.0%	[-1.9%, 3.5%]	157

Max RSS (memory usage)

Results (primary 1.2%, secondary 1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.3%	[2.8%, 4.2%]	3
Regressions ❌ (secondary)	3.2%	[1.5%, 6.1%]	16
Improvements ✅ (primary)	-1.9%	[-2.3%, -1.5%]	2
Improvements ✅ (secondary)	-4.1%	[-6.2%, -2.9%]	4
All ❌✅ (primary)	1.2%	[-2.3%, 4.2%]	5

Cycles

Results (primary 1.1%, secondary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.2%	[0.8%, 2.7%]	35
Regressions ❌ (secondary)	2.0%	[1.4%, 2.5%]	17
Improvements ✅ (primary)	-2.8%	[-2.8%, -2.8%]	1
Improvements ✅ (secondary)	-5.6%	[-6.7%, -3.3%]	4
All ❌✅ (primary)	1.1%	[-2.8%, 2.7%]	36

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 778.182s -> 781.865s (0.47%)
Artifact size: 366.15 MiB -> 366.24 MiB (0.02%)

davidtwco · 2025-04-09T15:02:27Z

Interesting, a version of this patch helped with #137944 so I wonder why it's not great on its own (or if I've done something different when extracting this from that patch).

oli-obk · 2025-04-09T15:11:40Z

Local perf runs likely were run without incremental key hash verification, which is like 2/3rd of the regression here

davidtwco · 2025-04-11T08:46:18Z

Benchmarked this with #137944 and it didn't help there either, so closing.

davidtwco added 3 commits April 9, 2025 09:53

middle: ClauseKind-specific ParamEnv accessors

520ebcb

Replace the `caller_bounds` function of `ParamEnv` with `ClauseKind`-specific accessors (i.e. `trait_clauses`, `region_outlives_clauses`) which will make it easier to later change the internals of `ParamEnv`.

middle: intern ParamEnv

5a320ea

Adding multiple fields to `ParamEnv` will grow the type, which will negatively impact performance, and it is widely assumed that `ParamEnv` is `Copy` - intern it prior to adding the additional fields.

middle: split caller_bounds into multiple fields

99aca75

Instead of filtering on `caller_bounds` at every use, instead filter them once when creating the `ParamEnv`. This will avoid unnecessary iteration over irrelevant bounds and improve performance.

rustbot assigned oli-obk Apr 9, 2025

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 9, 2025

bjorn3 reviewed Apr 9, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 9, 2025

davidtwco added a commit to davidtwco/rust that referenced this pull request Apr 10, 2025

upstream: rust-lang#139576

15df69b

davidtwco closed this Apr 11, 2025

davidtwco mentioned this pull request Apr 11, 2025

Sized Hierarchy: Part I #137944

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

split `ParamEnv::caller_bounds` into multiple fields #139576

split `ParamEnv::caller_bounds` into multiple fields #139576

davidtwco commented Apr 9, 2025 •

edited

Loading

rustbot commented Apr 9, 2025

fmease commented Apr 9, 2025

This comment has been minimized.

bors commented Apr 9, 2025

bjorn3 Apr 9, 2025

davidtwco Apr 9, 2025

bors commented Apr 9, 2025

This comment has been minimized.

rust-timer commented Apr 9, 2025

davidtwco commented Apr 9, 2025

oli-obk commented Apr 9, 2025

davidtwco commented Apr 11, 2025

split ParamEnv::caller_bounds into multiple fields #139576

split ParamEnv::caller_bounds into multiple fields #139576

Conversation

davidtwco commented Apr 9, 2025 • edited Loading

rustbot commented Apr 9, 2025

fmease commented Apr 9, 2025

This comment has been minimized.

bors commented Apr 9, 2025

bjorn3 Apr 9, 2025

Choose a reason for hiding this comment

davidtwco Apr 9, 2025

Choose a reason for hiding this comment

bors commented Apr 9, 2025

This comment has been minimized.

rust-timer commented Apr 9, 2025

Overall result: ❌✅ regressions and improvements - please read the text below

Instruction count

Max RSS (memory usage)

Cycles

Binary size

davidtwco commented Apr 9, 2025

oli-obk commented Apr 9, 2025

davidtwco commented Apr 11, 2025

split `ParamEnv::caller_bounds` into multiple fields #139576

split `ParamEnv::caller_bounds` into multiple fields #139576

davidtwco commented Apr 9, 2025 •

edited

Loading