Skip to content

split ParamEnv::caller_bounds into multiple fields #139576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

davidtwco
Copy link
Member

@davidtwco davidtwco commented Apr 9, 2025

This patch splits ParamEnv::caller_bounds into multiple fields, one for each ClauseKind. This is intended to be a performance optimisation, avoiding unnecessary iteration over large ParamEnvs when most kinds of clauses aren't relevant in any given context.

I make this change in three steps (reflected in the commits):

  1. Changing the interface of ParamEnv
    • Replace ParamEnv::caller_bounds with functions returning iterators over specific kinds of clause, and then updating all users. A ParamEnv::all_clauses function, identical in behaviour to ParamEnv::caller_bounds remains for the handful of cases where iteration over all clauses is desirable.
  2. Interning ParamEnv
    • In anticipation of changing the internals of ParamEnv, to avoid the type growing and to keep it Copy, ParamEnv is interned, becoming a pointer to ParamEnvInner.
  3. Changing the internals of ParamEnv
    • Replacing the caller_bounds field with a field for each type of clause and updating the creation of ParamEnv to partition a sequence of Clauses into the appropriate fields.

This is intended to help improve the performance of #137944, which results in many new predicates in the ParamEnv, and in particular, lots of new host effect predicates which can't be fast-path'd away. As part of that work, I've experimented with variations on this patch which had worse performance:

  • Uninterned ParamEnv
  • Box<[Clause<'tcx>]> fields rather than Clauses<'tcx>
  • Box<[SpecificPredicateType<'tcx>]>fields rather thanClauses<'tcx>`

r? types

Replace the `caller_bounds` function of `ParamEnv` with
`ClauseKind`-specific accessors (i.e. `trait_clauses`,
`region_outlives_clauses`) which will make it easier to later change
the internals of `ParamEnv`.
Adding multiple fields to `ParamEnv` will grow the type, which will
negatively impact performance, and it is widely assumed that `ParamEnv`
is `Copy` - intern it prior to adding the additional fields.
Instead of filtering on `caller_bounds` at every use, instead filter
them once when creating the `ParamEnv`. This will avoid unnecessary
iteration over irrelevant bounds and improve performance.
@rustbot rustbot added A-attributes Area: Attributes (`#[…]`, `#![…]`) F-autodiff `#![feature(autodiff)]` PG-exploit-mitigations Project group: Exploit mitigations S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver) labels Apr 9, 2025
@rustbot
Copy link
Collaborator

rustbot commented Apr 9, 2025

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Some changes occurred in compiler/rustc_passes/src/check_attr.rs

cc @jdonszelmann

Some changes occurred in compiler/rustc_sanitizers

cc @rcvalle

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Some changes occurred in compiler/rustc_monomorphize/src/partitioning/autodiff.rs

cc @ZuseZ4

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred in const_evaluatable.rs

cc @BoxyUwU

HIR ty lowering was modified

cc @fmease

@fmease
Copy link
Member

fmease commented Apr 9, 2025

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 9, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 9, 2025
…r=<try>

split `ParamEnv::caller_bounds` into multiple fields

This patch splits `ParamEnv::caller_bounds` into multiple fields, one for each `ClauseKind`. This is intended to be a performance optimisation, avoiding unnecessary iteration over large `ParamEnv`s when most kinds of clauses aren't relevant in any given context.

I make this change in three steps (reflected in the commits):

1. Changing the interface of `ParamEnv`
    - Replace `ParamEnv::caller_bounds` with functions returning iterators over specific kinds of clause, and then updating all users. A `ParamEnv::all_clauses` function, identical in behaviour to `ParamEnv::caller_bounds` remains for the handful of cases where iteration over all clauses is desirable.
2. Interning `ParamEnv`
    - In anticipation of changing the internals of `ParamEnv`, to avoid the type growing and to keep it `Copy`, `ParamEnv` is interned, becoming a pointer to `ParamEnvInner`.
3. Changing the internals of `ParamEnv`
    - Replacing the `caller_bounds` field with a field for each type of clause and updating the creation of `ParamEnv` to partition a sequence of `Clause`s into the appropriate fields.

This is intended to help improve the performance of rust-lang#137944, which results in many new predicates in the `ParamEnv`, and in particular, lots of new host effect predicates which can't be fast-path'd away. As part of that work, I've experimented with variations on this patch which had worse performance:

- Uninterned `ParamEnv`
- `Box<[Clause<'tcx>]>` fields rather than `Clauses<'tcx>`
- Box<[SpecificPredicateType<'tcx>]>` fields rather than `Clauses<'tcx>`

r? types
@bors
Copy link
Collaborator

bors commented Apr 9, 2025

⌛ Trying commit 99aca75 with merge c8e16ec...

Comment on lines +1021 to +1030
tcx.mk_param_env(ParamEnvInner {
trait_clauses: ListWithCachedTypeInfo::empty(),
region_outlives_clauses: ListWithCachedTypeInfo::empty(),
type_outlives_clauses: ListWithCachedTypeInfo::empty(),
projection_clauses: ListWithCachedTypeInfo::empty(),
const_arg_has_type_clauses: ListWithCachedTypeInfo::empty(),
well_formed_clauses: ListWithCachedTypeInfo::empty(),
const_evaluatable_clauses: ListWithCachedTypeInfo::empty(),
host_effect_clauses: ListWithCachedTypeInfo::empty(),
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe cache this in tcx? Or would it be possible to put it in a static and insert this static in the param env interner such that you don't need to pass tcx around in many cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at doing this locally and it has negligible performance impact as far as I can measure, so I think this would only be worth doing if we wanted to avoid passing tcx in.

@bors
Copy link
Collaborator

bors commented Apr 9, 2025

☀️ Try build successful - checks-actions
Build commit: c8e16ec (c8e16eca87a239609ccf8c567e7df883b8a117de)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (c8e16ec): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
1.0% [0.3%, 3.5%] 155
Regressions ❌
(secondary)
0.9% [0.1%, 2.9%] 106
Improvements ✅
(primary)
-1.5% [-1.9%, -1.1%] 2
Improvements ✅
(secondary)
-0.3% [-0.3%, -0.3%] 1
All ❌✅ (primary) 1.0% [-1.9%, 3.5%] 157

Max RSS (memory usage)

Results (primary 1.2%, secondary 1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.3% [2.8%, 4.2%] 3
Regressions ❌
(secondary)
3.2% [1.5%, 6.1%] 16
Improvements ✅
(primary)
-1.9% [-2.3%, -1.5%] 2
Improvements ✅
(secondary)
-4.1% [-6.2%, -2.9%] 4
All ❌✅ (primary) 1.2% [-2.3%, 4.2%] 5

Cycles

Results (primary 1.1%, secondary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.2% [0.8%, 2.7%] 35
Regressions ❌
(secondary)
2.0% [1.4%, 2.5%] 17
Improvements ✅
(primary)
-2.8% [-2.8%, -2.8%] 1
Improvements ✅
(secondary)
-5.6% [-6.7%, -3.3%] 4
All ❌✅ (primary) 1.1% [-2.8%, 2.7%] 36

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 778.182s -> 781.865s (0.47%)
Artifact size: 366.15 MiB -> 366.24 MiB (0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 9, 2025
@davidtwco
Copy link
Member Author

Interesting, a version of this patch helped with #137944 so I wonder why it's not great on its own (or if I've done something different when extracting this from that patch).

@oli-obk
Copy link
Contributor

oli-obk commented Apr 9, 2025

Local perf runs likely were run without incremental key hash verification, which is like 2/3rd of the regression here

davidtwco added a commit to davidtwco/rust that referenced this pull request Apr 10, 2025
@davidtwco
Copy link
Member Author

Benchmarked this with #137944 and it didn't help there either, so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-attributes Area: Attributes (`#[…]`, `#![…]`) F-autodiff `#![feature(autodiff)]` perf-regression Performance regression. PG-exploit-mitigations Project group: Exploit mitigations S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants