Explore Wasmtime as an alternative WebAssembly runtime #458

ia0 · 2024-05-07T12:17:15Z

Now that Wasmtime has no-std support, it becomes a possible alternative for the platform WASM runtime. This task should track the feasibility of using Wasmtime, since many roadblocks are expected (page size, memory and binary footprint, supported target architectures, releasing control flow, etc).

In particular, we should try to use Pulley.

ia0 · 2025-02-18T12:26:58Z

There were recent developments on bytecodealliance/wasmtime#7311. I tried to use Pulley on Nordic with the wasm-bench crate (see #753). It seems the generated Pulley bytecode is 34 times larger than the Wasm bytecode (it's an ELF file). Besides, it seems Wasmtime needs to copy it to RAM, which is another issue.

tschneidereit · 2025-02-19T20:52:11Z

Thank you for starting a conversation about this in the BA's Zulip, @ia0! <3 As @alexcrichton said over there, we'll gladly help out with making Wasmtime a viable option however we can.

Alex already filed two issues[1, 2], which should address the issues with Pulley bytecode size, and having to have the bytecode in RAM.

Besides that, Alex also mentioned being able to reduce the size of the runtime itself, by removing the dependency on Serde. We know that there are other ways to shrink the binary size, but perhaps the biggest one might come from disabling a feature: SIMD support incurs a substantial size increase, because of how many opcodes need to be handled. Disabling that should shrink the interpreter meaningfully.

ia0 · 2025-02-20T18:21:57Z

Thank you and Alex for the quick answers and follow-up!

Let me describe how WebAssembly is used in Wasefire and answer Alex's questions:

> I realize that this may be a bit of a stretch, but if you're able to describe what your embedding does (or even better have a fork/project that can be built)

Wasefire provides 2 APIs (the board API and the applet API) and a "scheduler" sitting between both.
The board API is a hardware abstraction in Rust (board implementations and the scheduler are written in Rust). This API simplifies support for new embedded devices by just having to implement this API.
The applet API is a system abstraction in WebAssembly (applets are Wasm modules and the scheduler is a Wasm runtime). This API provides applet portability across different embedded devices, think Embedded WASI except it's custom for now (API using Component Model or Embedded WASI #63).
The scheduler is meant to run multiple applets concurrently, although currently only one applet can be installed (and executed) at a time. However, applets can be installed or updated dynamically (through a custom USB protocol).
After considering Wasmtime, Wasmer, Wasmi, and Wasm3, I decided to write my own in-place interpreter¹. It differentiates itself from the rest with small binary size, small memory footprint, slow interpretation, returns control flow for host functions, and default function linking.
There's also the option to link one native applet to the scheduler (bypassing WebAssembly). This is the other extreme in the design space (applet performance, applet sandboxing). Ideally Wasmtime would provide yet another point in the design space (applet performance/sandboxing, scheduler flash and RAM footprint, and limited applet binary portability).
I'm doing Wasm runtime experiments in crates/wasm-bench. This benchmark uses the minimal CoreMark from Wasm3, and it's really just to get orders of magnitude (or even answer feasibility questions).
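To picture the "default function linking" property mentioned above, here is a minimal sketch in plain Rust. It is not the interpreter's actual code: `HostFn`, `Linker`, `NOT_IMPLEMENTED`, and the `led_count` import are hypothetical names chosen for illustration. The idea is only that resolving an import never fails, and an import unknown to the platform falls back to a stub that reports an error at runtime.

```rust
use std::collections::HashMap;

/// Hypothetical common shape shared by all host functions:
/// arguments in, then either a value or an error code.
type HostFn = fn(&[u32]) -> Result<u32, u32>;

/// Hypothetical error code meaning "not implemented on this platform".
const NOT_IMPLEMENTED: u32 = 1;

/// Stub used for every import the platform does not know about.
fn not_implemented(_args: &[u32]) -> Result<u32, u32> {
    Err(NOT_IMPLEMENTED)
}

/// Hypothetical host function that the platform does provide.
fn led_count(_args: &[u32]) -> Result<u32, u32> {
    Ok(4)
}

struct Linker {
    known: HashMap<&'static str, HostFn>,
}

impl Linker {
    fn new() -> Self {
        let mut known: HashMap<&'static str, HostFn> = HashMap::new();
        known.insert("led_count", led_count);
        Linker { known }
    }

    /// Linking never fails: unknown imports resolve to the stub.
    fn resolve(&self, import: &str) -> HostFn {
        self.known
            .get(import)
            .copied()
            .unwrap_or(not_implemented as HostFn)
    }
}

fn main() {
    let linker = Linker::new();
    // Known import: calls the real host function.
    println!("{:?}", linker.resolve("led_count")(&[]));
    // Import from a newer applet API: still links, fails only when called.
    println!("{:?}", linker.resolve("future_api")(&[]));
}
```

With this shape, a newer applet can still be installed on an older platform, and it only sees an error if it actually calls the missing function.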

> We know that there are other ways to shrink the binary size, but perhaps the biggest one might come from disabling a feature

I always use default-features = false and enable only what I use, so I'm already expecting to use the minimum set of Wasmtime features.

> and/or describe what the wasm is doing (or even better share a sample wasm) that'd be awesome.

Applets use less than the WebAssembly MVP. The current interpreter doesn't even support SIMD. It also has optional floats (disabled by default). If you want to check some actual wasm modules, you can run cargo xtask applet rust NAME where NAME is the name of the applet and is just a crate at path examples/rust/NAME/Cargo.toml. This will produce target/wasefire/applet.wasm (and applet.wasm.orig in the same directory before wasm-strip and wasm-opt). The biggest example so far is opensk. Note that you can also use cargo xtask --release applet rust opensk (to remove debug printing support) or cargo xtask --release applet rust opensk --opt-level=z to optimize for size.

I'm currently on vacation (with the kid, thus little time), but as soon as I'm back I'll try to see if I can add Wasmtime support behind a cargo feature. The main difficulty will be the fact that the scheduler currently assumes the runtime returns control flow on host function calls. I guess I'll be able to use the async API of Wasmtime for this purpose (without an async runtime, just calling poll myself). Another difficulty will be the fact that the current interpreter provides a way to always link imported functions, but that's only to support linking new applets on old platforms, as long as the imported function is allowed to return an error at runtime (there's a common format for all functions). That's probably not going to be a blocker.
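As a rough illustration of "calling poll myself", here is a sketch that uses only the standard library and no Wasmtime types. `HostCall` and `run_applet` are hypothetical stand-ins for a host function call and an applet entry point. The sketch only shows that a future can be driven by hand with a no-op waker, so each `Poll::Pending` hands control back to the caller; whether Wasmtime's async API fits the scheduler this way is still to be confirmed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing: the scheduler itself decides when to poll again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for a host call: pending on the first poll
/// (control goes back to the scheduler), ready on the second.
struct HostCall {
    polled: bool,
    result: u32,
}

impl Future for HostCall {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            Poll::Ready(self.result)
        } else {
            self.polled = true;
            Poll::Pending
        }
    }
}

/// Hypothetical applet entry point that performs one host call.
async fn run_applet() -> u32 {
    let value = HostCall { polled: false, result: 41 }.await;
    value + 1
}

/// Drives the applet by hand and returns its result together with the
/// number of times control came back to the caller before completion.
fn drive() -> (u32, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut applet = Box::pin(run_applet());
    let mut yields = 0;
    loop {
        match applet.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return (value, yields),
            // A real scheduler would service the host call here
            // before polling again.
            Poll::Pending => yields += 1,
        }
    }
}

fn main() {
    let (value, yields) = drive();
    println!("applet returned {value} after {yields} yield(s)");
}
```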

I'll post updates on this issue.

¹ I later discovered Ben Titzer's paper "A fast in-place interpreter for WebAssembly", whose ideas are currently being implemented in the dev/fast-interp branch.

alexcrichton · 2025-02-24T21:37:20Z

A bit delayed, but thank you for writing that up! It'll take some time to fully digest this but I hope to poke at this in the future.

In the meantime, bytecodealliance/wasmtime#10285 triggered another thought/recommendation: you'll want to be sure to set Config::generate_address_map to false if you aren't already. That should ~halve the size of the *.cwasm, at the cost of losing the ability to get wasm bytecode offsets in backtraces, which I suspect is acceptable for your use case. (If it's not, there are some assorted ideas on bytecodealliance/wasmtime#3547 for making this section smaller.)

Also, just to confirm (I suspect you're already doing this): if you strip the binary before compiling it (e.g. remove the name section), it'll make the *.cwasm a bit smaller by removing that from the original binary. (Or we could also plumb a Config option to retain that in the *.cwasm if you'd prefer not to strip.)

ia0 added the needs:design and for:usability labels · May 7, 2024

zhouwfang mentioned this issue May 8, 2024

Interpreter performance and footprint #46

Open
