Stop tracing when we start unrolling. #1603

ltratt · 2025-02-13T14:59:04Z

Though it may not be obvious that this is what it's doing, this commit stops us tracing undesirable things, notably when we start unrolling inner loops.

Consider this example program:

for x in range(...):
  if x > ...:
    for y in range(...):

The outer loop can get hot before the inner loop, so we start tracing the outer loop -- which is fine. But while we trace, we may go around the inner loop an unbounded number of times, endlessly unrolling it. This can lead to gigantic and "unstable" traces -- because we've unrolled a specific number of iterations, there's a very high chance that we'll see a different number of iterations next time and fail a guard on the first "iteration" of the gigantic loop. This then leads to us creating large numbers of essentially identical side-traces, hugely bloating the system. Using havlak as an example, this (combined with a couple of other things) leads to ~10x more traces being created than we would expect.
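To make the "unstable trace" problem concrete, here is a toy model of it. Everything in it (`Op`, `record_trace`, `first_failing_guard`) is hypothetical and invented for illustration, not taken from yk. It only shows that a trace which unrolled N inner iterations fails a guard whenever a later run does a different number of iterations.

```rust
/// One recorded operation in a toy linear trace (hypothetical, not yk's IR).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// The part of the outer loop body that precedes the inner loop.
    OuterBody,
    /// Guard on "does the inner loop continue?", holding the answer seen
    /// while tracing.
    GuardInnerContinues(bool),
    /// One unrolled copy of the inner loop body.
    InnerBody,
}

/// Record one outer iteration in which the inner loop ran `inner_iters`
/// times. Every inner iteration is unrolled into the trace.
fn record_trace(inner_iters: usize) -> Vec<Op> {
    let mut trace = vec![Op::OuterBody];
    for _ in 0..inner_iters {
        trace.push(Op::GuardInnerContinues(true));
        trace.push(Op::InnerBody);
    }
    // The inner loop's exit check: the loop did not continue here.
    trace.push(Op::GuardInnerContinues(false));
    trace
}

/// Replay `trace` for a run in which the inner loop wants to execute
/// `actual_iters` times. Returns the index of the first failing guard,
/// or `None` if every guard holds.
fn first_failing_guard(trace: &[Op], actual_iters: usize) -> Option<usize> {
    let mut done = 0; // inner iterations completed so far
    for (i, op) in trace.iter().enumerate() {
        match op {
            Op::GuardInnerContinues(expected) => {
                let continues = done < actual_iters;
                if continues != *expected {
                    return Some(i);
                }
            }
            Op::InnerBody => done += 1,
            Op::OuterBody => {}
        }
    }
    None
}
```

In this model the trace grows linearly with the number of inner iterations seen while tracing, and each distinct iteration count on a later run fails at a different guard, which is where the many near-identical side-traces would come from.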

To avoid this happening, we have to track what `Location`s we've seen while tracing. There are some potentially cunning things we could do here if/when performance becomes an issue, but for now we force every `Location` we see when tracing to become a `HotTracing`. That gives us a stable ID (the `HotLocation`'s `Mutex`) which we can use as a reliable proxy for "have we seen this `Location` before?"

vext01 · 2025-02-13T15:17:31Z

ykrt/src/compile/jitc_yk/codegen/x64/deopt.rs

+                                4 => unsafe { ptr::write::<u32>(temp as *mut u32, jitval as u32) },
+                                8 => unsafe { ptr::write::<u64>(temp as *mut u64, jitval) },
+                                16 => {
+                                    // FIXME: This case is clearly not safe in general: it just so


I know this isn't really what we are doing here in this PR, but a casual reader will think "if the largest object we can handle is 64-bits, then why is this arm reachable". Did we get to the bottom of why this happens?

We didn't -- but, regrettably, the different paths we now trace hit this case. It sucks, and we need to work out why we're sometimes being told to use a 16 byte register.

Suggestion: we let this go for now (as much as I dislike it) and I raise an issue as soon as the PR is merged to remind us to work out what's going on here.

I tend to agree. Perhaps raise an issue?

Yes, I suggest I do so if/when this merges, as I can then reference the commit and line number.

What's extra weird is that the jit value reports a size of 64-bit. I wonder how that mismatch happened.

I think it's something we should look at pretty pronto: it only crops up in a small number of traces, but it does indeed crop up.
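For readers unfamiliar with the pattern in the hunk above, the following is a rough sketch of a size-dispatched write. It is not the code in `deopt.rs`: `write_sized` is a hypothetical helper, it writes into a byte slice using `to_le_bytes` rather than through raw pointers, and it rejects unsupported sizes outright instead of carrying a FIXME arm for 16 bytes.

```rust
/// Write the low `size` bytes of `val` into the start of `slot`, in
/// little-endian order. Hypothetical helper for illustration only.
///
/// Only sizes that fit in a 64-bit value are accepted; anything else
/// (for example 16) is reported as an error rather than guessed at.
fn write_sized(slot: &mut [u8], val: u64, size: usize) -> Result<(), String> {
    match size {
        1 | 2 | 4 | 8 => {}
        _ => return Err(format!("unsupported size: {} bytes", size)),
    }
    if slot.len() < size {
        return Err(format!(
            "slot too small: need {} bytes, have {}",
            size,
            slot.len()
        ));
    }
    // In little-endian order the low `size` bytes of a u64 are exactly the
    // bytes of the value truncated to that width.
    slot[..size].copy_from_slice(&val.to_le_bytes()[..size]);
    Ok(())
}
```

Under this sketch a request for 16 bytes surfaces as an error at the point of the write, which is one way a mismatch like the one discussed here could be made visible.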

vext01 · 2025-02-13T15:22:26Z

Trying to understand the high-level goal here. So you keep track of which locations you've already seen while tracing, and if you see one a second time that isn't the location that started tracing, then we have unrolled an inner loop, and we abort? Is that the gist of this?

ltratt · 2025-02-13T15:22:44Z

Correct!
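The gist confirmed above can be sketched roughly as follows. This is a simplified model, not the code in `mt.rs`: `LocId`, `Action`, `Tracer` and `control_point` are hypothetical names, and a plain integer stands in for the stable ID that the PR derives from the `HotLocation`'s `Mutex`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a location's stable ID.
type LocId = usize;

/// What the tracer should do on reaching a control point (hypothetical).
#[derive(Debug, PartialEq)]
enum Action {
    /// First visit to this location during the trace: keep tracing.
    Continue,
    /// Back at the location that started tracing: the loop is closed.
    StopTracing,
    /// A different location was seen twice: an inner loop is being
    /// unrolled, so give up on this trace.
    AbortTracing,
}

struct Tracer {
    /// The location at which tracing started.
    start: LocId,
    /// Every other location seen so far during this trace.
    seen: HashSet<LocId>,
}

impl Tracer {
    fn new(start: LocId) -> Self {
        Tracer { start, seen: HashSet::new() }
    }

    /// Called each time execution reaches a control point while tracing.
    fn control_point(&mut self, loc: LocId) -> Action {
        if loc == self.start {
            return Action::StopTracing;
        }
        // `HashSet::insert` returns `false` if the value was already present.
        if !self.seen.insert(loc) {
            return Action::AbortTracing;
        }
        Action::Continue
    }
}
```

With locations `1` (outer loop) and `2` (inner loop), a trace started at `1` that reaches `2` twice before returning to `1` is aborted, whereas one that reaches `2` once and then `1` stops normally.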

ptersilie · 2025-02-13T15:26:20Z

Do we not outline functions with loops in them already? Does this mean we observe this only on the main interpreter loop?

ltratt · 2025-02-13T15:27:16Z

"Don't outline functions with loops" is an LLVM-level thing (i.e. "don't outline C functions with loops in"). This PR is working at the upper-language (e.g. Lua) level.

vext01 · 2025-02-13T15:28:57Z

ykrt/src/mt.rs

+                                // having failed to trace properly.
+                                return TransitionControlPoint::AbortTracing;
+                            }
+                            if !seen_hls.insert(hl_ptr) {


There are two places you insert into the "seen" set. What is the difference between them?

It's due to the two representations of Counting: one without a HotLocation, one with.

OK, I've forgotten how that part of the system works, so I'll have to trust you on this one!

ptersilie · 2025-02-13T15:29:09Z

Oh right. Got you.

vext01 · 2025-02-13T15:29:31Z

ykrt/src/mt.rs

+                            else {
+                                panic!()
+                            };
+                            if frameaddr != *tracing_frameaddr {


Is this bit necessary, or something unrelated we are sneaking in?

Fair point: it's snuck in, because it was missing before, and I happened to notice it while working in this code.

[i.e. it should always have been present]

Do you want to keep it?

It should always have been in, and it directly affects this code, so yes.

sorry, we raced. All ok.

vext01 · 2025-02-13T15:34:32Z

This is at the edge of my understanding of mt.rs, but seems ok to me.

Perhaps add some more info to the commit message+PR description about why we are tracking what locations we've seen, as that wasn't immediately obvious to me (it allows us to detect back edges that aren't for the loop we are tracing).

ltratt · 2025-02-13T15:35:03Z

Maybe I'm too close to this, but the commit message feels to me like it explains this pretty thoroughly.

vext01 · 2025-02-13T15:35:44Z

I leave it up to you.

ltratt · 2025-02-13T15:36:43Z

I think it's good to go myself. It could always be improved, I suppose, but I feel it accurately describes the problem and the fix. And this is the first of (probably) a number of fixes we'll be making in this regard, so now we've all got some context as to the general problem, I think future PRs will be easier to grok.

ltratt assigned ptersilie and vext01 Feb 13, 2025

ltratt force-pushed the spot_inner_loops branch from e6651ca to 757e350 Compare February 13, 2025 15:04

vext01 reviewed Feb 13, 2025

vext01 added this pull request to the merge queue Feb 13, 2025

Merged via the queue into ykjit:master with commit 690567e Feb 13, 2025
2 checks passed

ltratt deleted the spot_inner_loops branch February 14, 2025 17:02
