Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

winch(x64): Tail call #9732

Open
2 tasks
saulecabrera opened this issue Dec 4, 2024 · 3 comments
Open
2 tasks

winch(x64): Tail call #9732

saulecabrera opened this issue Dec 4, 2024 · 3 comments
Assignees
Labels
winch Winch issues or pull requests

Comments

@saulecabrera
Copy link
Member

saulecabrera commented Dec 4, 2024

The Wasm Tail Call proposal is considered Tier 1 according to Wasmtime's Tiers of support.

Winch currently doesn't support this proposal.

  • return_call
  • return_call_indirect
@saulecabrera saulecabrera converted this from a draft issue Dec 4, 2024
@saulecabrera saulecabrera moved this to Todo in Winch Dec 4, 2024
@saulecabrera saulecabrera added the winch Winch issues or pull requests label Dec 4, 2024
@saulecabrera saulecabrera changed the title winch: Tail call support winch(x64): Tail call support Jan 5, 2025
@saulecabrera saulecabrera changed the title winch(x64): Tail call support winch(x64): Tail call Jan 6, 2025
@saulecabrera saulecabrera self-assigned this Jan 17, 2025
@saulecabrera saulecabrera moved this from Todo to In Progress in Winch Jan 17, 2025
@saulecabrera saulecabrera moved this from In Progress to Todo in Winch Jan 17, 2025
@MarinPostma
Copy link
Contributor

@saulecabrera you can assign me to that 👍

@MarinPostma
Copy link
Contributor

@saulecabrera

Did you have a sketch of a plan on how to implement that already?

I did some homework this weekend, trying to find a path toward tail calls. I think we could do something like described in this post about v8 baseline compiler, or as initially implemented in cranelift, basically:

  • set up the frame as if we were about to perform a normal call
  • memcpy the frame down to override the caller frame
  • jump to the callee entry-point

However this is over simplistic. There are many things getting in our way. Here's my understanding of the situation, thus far:

  1. The callee and the caller may not require the same amount of stack space for their arguments. In the Winch calling convention, it is the caller that cleans up the stack space for the arguments. This is an issue for tail calls, because the caller doesn't know ahead of time how much stack space will need to be cleaned up when the callee returns. cranelift TailCall convention solves that by requiring that the callee cleans-up it's stack space. I suppose there are alternatives to let the caller know dynamically the amount of stack space to pop on cleanup, but in any case, it seems to me that we need a calling convention that supports that.
  2. The ret_area is inherited from the caller, rather and allocated in the caller stack. This is easy to do: we can get it from the caller's initial control frame. (see there). This works, because the callee must have the same return as the caller.
  3. We need to emit a jump to the callee, rather than a call, because we must not push a return address, since the state is already set up during the return_call preparation. Winch relies of cranelift to emit relocs for the callee address. The story around converting a function ref to a MachLabel that can be used for a jump is not entirely clear to me yet. Alternatively, we could pop the FP back into it's register and let call push it back (this is what v8 baseline does).
  4. Cranelift has Inst::CallReturn* macro-instruction, but it's not clear to me if we can use it, since it emits code specific to cranelift Tail calling convention.

I am glancing over a lot of things here, obviously. My principal concern is with 1) right now; I don't see a way forward without addressing it.

@saulecabrera
Copy link
Member Author

Some preliminary context:

  • Winch has a single, internal calling convention, which is mostly used between Wasm-to-Wasm calls.
  • Non Wasm-to-Wasm calls (e.g., calls to host-defined functions), go through
  • trampolines, which are currently emitted
    through Cranelift. These trampolines convert between different calling conventions.
  • Winch supports other calling conventions (e.g., SysV) used function calls
    which don't go through trampolines (calls to built-in function like
    f32.abs)

--

Regarding the implementation of tail-calls:

I don't have a one-size-fits-all answer unfortunately, but we did discuss some
ideas in the past, unfortunately they are not officially documented anywhere; in
the next couple of sentences I'll try to do a summary, taking into account your
points above:

I suppose there are alternatives to let the caller know dynamically the amount
of stack space to pop on cleanup, but in any case, it seems to me that we need
a calling convention that supports that.

Indeed, there have been attempts in other production ready Wasm
compilers
to avoid having
the callee clean up the stack space used for arguments. Given that (i) for tail
calls it's assumed that we're using the current frame to resolve calls and (ii)
even though that we don't know the exact space needed for arguments, there are
some expectations around the stack pointer position before/after the call, which
could be helpful to derive the correct stack pointer position in the presence of
tail calls and stack arguments. I was hoping to explore this idea, through
a prototype. The main benefit that I see here is minimizing ABI changes.

The alternative, as you note, is to have the callee clean up any stack space
allocated for stack arguments. Even though, this seems like the obvious
approach, I would not be comfortable introducing this change without a proper
exploration of the other alternatives; a change like this represents
a fundamental shift in the following fronts:

  • ABI
  • Frame handling
  • Multi-value

In either case, I don't think we need to introduce a different calling
convention, my expectation is that modifying Winch's default calling convention
should suffice.

The ret_area is inherited from the caller, rather and allocated in the caller
stack. This is easy to do: we can get it from the caller's initial control
frame. (see there). This works, because the callee must have the same return
as the caller.

In principle nothing fundamental should change for multiple return values in
the presence of tail calls. Note that if the callee cleans the space for stack
arguments, some updates will be needed to handle the stack return area.

We need to emit a jump to the callee, rather than a call, because we must not
push a return address, since the state is already set up during the
return_call preparation. Winch relies of cranelift to emit relocs for the
callee address. The story around converting a function ref to a MachLabel that
can be used for a jump is not entirely clear to me yet. Alternatively, we
could pop the FP back into it's register and let call push it back (this is
what v8 baseline does).

There are many assumptions in this statement, however, if I'm understanding this
statement correctly, I think that relocations and FP/return address handling are
two, largely orthogonal issues that we need to think about regardless.

--

In general, I agree that resolving how to handle stack arguments is a
pre-requisite for supporting tail-calls. My expectation is to take an informed
decision based on prototyping, given that there are many trade-offs to consider,
not only around the complexity of the potential ABI changes, but also the
impact of any new changes on:

  • Compilation performance
  • Runtime performance
  • Other non-x86-64 backends

After spending some time thinking a bit more about tail calls, and in
particular, with the last point above (non-x86-64 backends), I'd also recommend
waiting until we can fully run spec tests for aarch64. As I've been working on
trying to get them all passing I've found myself fixing some ABI-related bugs 1
to ensure that our generated code meets all the aarc64 ABI requirements.
I suspect that if we introduce any new compiler-wide ABI changes, before the
aarch64 backend is fully complete, there's a risk of conflicts down the road,
more concreely around the stack pointer handling and relocations side of things.

@saulecabrera saulecabrera moved this from Todo to In Progress in Winch Feb 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
winch Winch issues or pull requests
Projects
Status: In Progress
Development

No branches or pull requests

2 participants