-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop tracing when we start unrolling. #1603
Conversation
Though it may not be obvious that this is what it's doing, this commit stops us tracing undesirable things, notably when we start unrolling inner loops. Consider this example program: ```python for x in range(...): if x > ...: for y in range(...): ``` The outer loop can get hot before the inner loop, so we start tracing the outer loop -- which is fine. But when we trace, we may go around the inner loop an unbounded number of times, endlessly unrolling the inner loop. This can lead to gigantic, and "unstable" traces -- because we've unrolled a specific number of iterations, there's a very high chance that we'll see a different number of iterations next time and encounter a guard on the first "iteration" of the gigantic-loop. This then leads to us creating large numbers of essentially identical side-traces, hugely bloating the system. Using havlak as an example this (combined with a couple of other things) leads to ~10x more traces being created than we would expect. To avoid this happening, we have to track what `Location`s we've seen while tracing. There are some potentially cunning things we could do here if/when performance becomes an issue, but for now we force every `Location` we see when tracing to become a `HotTracing`. That gives us a stable ID (the `HotLocation`'s `Mutex`) which we can use as a reliable proxy for "have we seen this `Location` before?"
e6651ca
to
757e350
Compare
4 => unsafe { ptr::write::<u32>(temp as *mut u32, jitval as u32) }, | ||
8 => unsafe { ptr::write::<u64>(temp as *mut u64, jitval) }, | ||
16 => { | ||
// FIXME: This case is clearly not safe in general: it just so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this isn't really what we are doing here in this PR, but a casual reader will think "if the largest object we can handle is 64-bits, then why is this arm reachable". Did we get to the bottom of why this happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't -- but, regrettably, the different paths we now trace hit this case. It sucks, and we need work out why we're being told to sometimes use a 16 byte register.
Suggestion: we let this go for now (as much as I dislike it) and I raise an issue as soon as the PR is merged to remind us to work out what's going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree. Perhaps raise an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I suggest I do so if/when this merges, as I can then reference the commit and line number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's extra weird is that the jit value reports a size of 64-bit. I wonder how that mismatch happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's something we should look at pretty pronto: it only crops up in a small number of traces, but it does indeed crop up.
Trying to understand the high-level goal here. So you keep track of which locations you've already seen while tracing and if you see a one a second time, but which isn't the location that started tracing, we have unrolled an inner loop, and we abort? Is that the gist of this? |
Correct! |
Do we not outline functions with loops in it already? Does this mean we observe this only on the main interpreter loop? |
"Don't outline functions with loops" is an LLVM-level thing (i.e. "don't outline C functions with loops in"). This PR is working at the upper-language (e.g. Lua) level. |
// having failed to trace properly. | ||
return TransitionControlPoint::AbortTracing; | ||
} | ||
if !seen_hls.insert(hl_ptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two places you insert into the "seen" set. What is the difference between them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's due to the two representations of Counting
: one without a HotLocation
, one with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've forgotten how that part of the system works, so I'll have to trust you on this one!
Oh right. Got you. |
else { | ||
panic!() | ||
}; | ||
if frameaddr != *tracing_frameaddr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this bit necessary, or something unrelated we are sneaking in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point: it's snuck in, because it was missing before, and I happened to notice it while working in this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[i.e. it should always have been present]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should always have been in, and it directly affects this code, so yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, we raced. All ok.
This is at the edge of my understanding of mt.rs, but seems ok to me. Perhaps add some more info to the commit message+PR description about why we are tracking what locations we've seen, as that wasn't immediately obvious to me (it allows us to detect back edges that aren't for the loop we are tracing). |
Maybe I'm too close to this, but the commit message feels to me like it explains this pretty thoroughly. |
I leave it up to you. |
I think it's good to go myself. It could always be improved, I suppose, but I feel it accurately describes the problem and the fix. And this is the first of (probably) a number of fixes we'll be making in this regard, so now we've all got some context as to the general problem, I think future PRs will be easier to grok. |
Though it may not be obvious that this is what it's doing, this commit stops us tracing undesirable things, notably when we start unrolling inner loops.
Consider this example program:
The outer loop can get hot before the inner loop, so we start tracing the outer loop -- which is fine. But when we trace, we may go around the inner loop an unbounded number of times, endlessly unrolling the inner loop. This can lead to gigantic, and "unstable" traces -- because we've unrolled a specific number of iterations, there's a very high chance that we'll see a different number of iterations next time and encounter a guard on the first "iteration" of the gigantic-loop. This then leads to us creating large numbers of essentially identical side-traces, hugely bloating the system. Using havlak as an example this (combined with a couple of other things) leads to ~10x more traces being created than we would expect.
To avoid this happening, we have to track what
Location
s we've seen while tracing. There are some potentially cunning things we could do here if/when performance becomes an issue, but for now we force everyLocation
we see when tracing to become aHotTracing
. That gives us a stable ID (theHotLocation
'sMutex
) which we can use as a reliable proxy for "have we seen thisLocation
before?"