
Move wasm benchmarks to JS benchmark style scoring #18

Merged
merged 4 commits into main on Dec 21, 2024

Conversation

kmiller68
Contributor

@kmiller68 kmiller68 commented Nov 27, 2024

As discussed with the other browser vendors, the plan for JS3 is to score wasm the same way we score JS (with First Iteration, Average, and Worst N). Each iteration (other than the first) runs in about 60ms on my M4 Max MBP.

Move existing benchmarks to WasmLegacyBenchmark and add a new WasmEMCCBenchmark that supports running iterations. It's currently an AsyncBenchmark subclass since the compile is async.

As a proof of concept this patch converts tsf-wasm to use the new scoring. For a C++ library that doesn't already have its own wasm build script, I roughly expect the process to be: compile with -s MODULARIZE=1 -s EXPORT_NAME=setupModule, then add a benchmark.js with something like:

class Benchmark {
    async runIteration() {
        // The compile/instantiation is async, so set up the module lazily
        // on the first iteration.
        if (!Module._runIteration)
            await setupModule(Module);

        Module._runIteration(150);
    }
}

I sampled both the old test (recompiled) and the new JS-style version, and the samples seemed quite similar. So it seems like this benchmark change doesn't radically alter the characteristics of at least this test.

Lastly, get emcc to output the .symbols file so we know which wasm function corresponds to which index. We could emit a name section into the binary, but my assumption is that most production code doesn't ship with one, so it might change the benchmark in an unrealistic way.

@kmiller68 kmiller68 marked this pull request as draft November 27, 2024 18:29
@kmiller68 kmiller68 force-pushed the wasm-benchmarks-should-be-scored-like-js-benchmarks branch 2 times, most recently from 30ea5a3 to 66e93c8 Compare November 27, 2024 18:42
@kmiller68 kmiller68 force-pushed the wasm-benchmarks-should-be-scored-like-js-benchmarks branch 3 times, most recently from adbb24d to d1534f0 Compare December 18, 2024 15:54
@kmiller68 kmiller68 marked this pull request as ready for review December 18, 2024 16:00
@kmiller68
Contributor Author

See d1534f0 for the breakdowns of time spent in each function. Weirdly, even though the test runs significantly longer, JSC still spends the same proportion of time in our BBQ (template JIT) tier. Maybe that's a bug on our end, though?

@kmiller68
Contributor Author

@camillobruni @danleh Do either of you have thoughts/comments/concerns? I would CC the Mozilla folks too but I don't have their GH handles...

@danleh
Contributor

danleh commented Dec 19, 2024

Sorry for the late reply. I'll take a look later today.

@@ -36,11 +36,17 @@ globalThis.dumpJSONResults ??= false;
globalThis.customTestList ??= [];

let shouldReport = false;
let startDelay;
Contributor

That was simply pulled out into a variable and made configurable, right? And this is essentially independent of the Wasm change, correct? Just out of interest: do you know the original reasoning behind the 4000ms delay? Something like empirically observed loading time?

Contributor Author

Yeah, this should be independent. The delay is there because our internal A/B testing harness restarts the browser between full benchmark runs to reduce variance and caching effects. So we want to wait a few seconds for the browser to calm down before starting the test, at least IIRC.

Contributor

All in on more configurable test params!

I'll probably copy the params.mjs from speedometer as well and then we have a more central place to store all the test parameters.

@@ -330,7 +336,7 @@ class Driver {
} else
globalObject = runString("");

globalObject.console = { log: globalObject.print, warn: (e) => { print("Warn: " + e); /*$vm.abort();*/ } }
globalObject.console = { log: globalObject.print, warn: (e) => { print("Warn: " + e); /*$vm.abort();*/ }, error: (e) => { print("Error: " + e); /*$vm.abort();*/ } }
Contributor

Optional nit: while we're at it, we could also add console.debug here; sqlite3-wasm uses that as well (and reformat to separate lines?).
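A minimal sketch of what the suggested reformatting could look like, with console.debug added. (Illustration only: the record-keeping print stub below stands in for the shell's real print function, which the harness gets from its environment.)

```javascript
// Illustration only: stub a shell-style print() that records lines,
// so the patched console's behavior is easy to inspect.
const lines = [];
const print = (msg) => lines.push(msg);

const globalObject = {};
globalObject.console = {
    log: (e) => print(String(e)),
    warn: (e) => print("Warn: " + e),
    error: (e) => print("Error: " + e),
    debug: (e) => print("Debug: " + e), // sqlite3-wasm logs via console.debug
};

globalObject.console.debug("opening database");
```

With each method on its own line, adding further methods later (or re-enabling the commented-out $vm.abort() calls) stays a one-line change.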

TIMEIT(writeTest(filename, 100, count, TSF_ZIP_NONE));
TIMEIT(readTest(filename, count));
TIMEIT(readMallocTest(filename, count));
TIMEIT(readConvertTest(filename, count));


// TODO: tsf_zlib_supported is false so we should just remove this?
Contributor

SGTM, also makes it more explicit/apparent what is part of the benchmark and what not.

wasm/TSF/tsf.js Outdated
Contributor

@danleh danleh Dec 20, 2024

WDYT about putting all compile outputs/generated files in a wasm/TSF/build/ directory, so that it's clearer what is generated by Emscripten (and that it's not manually modified, as in the old version of TSF-wasm)?

Contributor

And just to clarify: you did rebuild with a more up-to-date Emscripten version, right? So some changes in, say, the execution profile could simply come from toolchain evolution?

Contributor Author

Yeah, the function numbers are completely different. Now we have the .symbols file, so at least you can manually map between emcc versions.

emcc \
-o tsf.html -o tsf.js -O2 -s WASM=1 -s TOTAL_MEMORY=52428800 -g1 \
-o tsf.js -O2 -s MODULARIZE=1 -s EXPORT_NAME=setupModule -s WASM=1 -s TOTAL_MEMORY=52428800 -g1 --emit-symbol-map -s EXPORTED_FUNCTIONS=_runIteration \
Contributor

Much nicer with MODULARIZE, cool! :)

if (!Module._runIteration)
await setupModule(Module);

Module._runIteration(150);
Contributor

@danleh danleh Dec 20, 2024

So just to clarify the workload size before vs. now: It used to be the default of count = 1000 because main was invoked without arguments, and now we are at defaultIterationCount = 120 times 150 per runIteration, right?

It seems the line item runs quite a bit longer now compared to before (~12s vs. ~1.7s, or roughly 130B dynamic instructions vs. 15B on an x64 workstation). I see more functions using the top tier now, which is fine by me. In the old version, quite some time was spent in JS read/write functions and not in Wasm at all, which was certainly not ideal.

We just might have to lower the workload size, in case the overall benchmark becomes too long to run.
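For reference, the nominal workload change discussed above works out as follows (variable names are illustrative, not the harness's actual parameters, and this assumes work scales linearly with the count argument):

```javascript
// Old: one main() invocation with the default count of 1000.
const oldWork = 1 * 1000;
// New: defaultIterationCount = 120 iterations, each calling _runIteration(150).
const newWork = 120 * 150;

console.log(newWork / oldWork); // 18x the nominal work
```

The ~18x nominal increase is larger than the observed ~7x wall-time and ~9x dynamic-instruction increase, which is consistent with later iterations running mostly in optimized tiers rather than the interpreter or baseline.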

Contributor

Excerpt of a bottom-up profile in d8, before vs. now: [two profile screenshots omitted]
I'm happy to land this, we should just be aware that this is now moving more into the direction of measuring peak performance. (Was this a conscious decision?)

Contributor Author

Yeah, I should probably reduce the iteration count. I did that in my followup patches for other benchmarks, I'll do it here too. That said, it seems like your samples include the harness code? That's probably not ideal.

The function numbers also changed for me when I recompiled the test, I ended up just simulating the old benchmark by running one iteration with count = 1000 in the samples I put in my commit message.

Contributor

Correct, I just took a profile of the whole d8 execution with Linux perf (via the helper explained in https://v8.dev/docs/linux-perf#profiling-d8-with-linux-perf-d8.py, which takes care of injecting the samples from JIT-ed code). If startup or the benchmark harness takes a significant portion of the overall profile, I would say the workload is too short/light (as it probably was before).

@@ -1,6 +1,9 @@
#!/bin/sh

set -euo pipefail

Contributor

WDYT about adding a log file with the last build timestamp and Emscripten version, for reproducibility? Something like

BUILD_LOG="$(realpath build.log)"
echo -e "Built on $(date --rfc-3339=seconds)\n" | tee "$BUILD_LOG"

echo "Toolchain versions" | tee -a "$BUILD_LOG"
emcc --version | head -n1 | tee -a "$BUILD_LOG"

(shameless copy from sqlite3/build.sh)

Contributor Author

Yeah, I was thinking we should just have one script for this rather than duplicating the logic. That said, I can do that in follow-up #22. I'll copy your logic for now.

Contributor Author

It seems like date --rfc-3339=seconds isn't supported by macOS's out-of-the-box date. I'm gonna go with date -u '+%Y-%m-%dT%H:%M:%SZ'

Contributor

Ah, good to know and keep in mind for an eventual (shared?) build script.

@@ -986,6 +1007,94 @@ class AsyncBenchmark extends DefaultBenchmark {
}
};

// Meant for wasm benchmarks that are compiled directly with an emcc build script. It might not work for benchmarks built as
// part of a larger project's build system, or for a wasm benchmark compiled from a language whose toolchain doesn't use emcc.
class WasmEMCCBenchmark extends AsyncBenchmark {
Contributor

LGTM for now. Long term it would be nice to get rid of some of the code duplication; e.g., the prerunCode below is mostly (but not fully) equal to the one in WasmLegacyBenchmark, right?

Contributor Author

Yeah, I think the goal is to eventually just delete WasmLegacyBenchmark once we have all the benchmarks we care about in the JS style system. If we want to save them for posterity we can reevaluate.

@danleh
Contributor

danleh commented Dec 20, 2024

@camillobruni @danleh Do either of you have thoughts/comments/concerns? I would CC the Mozilla folks too but I don't have their GH handles...

Overall looks great to me, thanks! Left some small comments :)

In terms of Mozilla folks, I only know of (some of them mentioned their GitHub handles in the WebKit JetStream discussion Slack channel)

Move existing benchmarks to WasmLegacyBenchmark and add a new WasmBenchmark that supports
running iterations. It's currently an async benchmark subclass since the compile is async.

As a proof of concept this patch converts tsf-wasm to use the new scoring. As part of this
process tsf needed to be recompiled so I added a step to the build.sh script which wraps
emcc's output .js file with some glue code. This should make it easier to rebuild in the
future too since we're not directly editing the module loading file. I went with this
route rather than hooking into emcc's API since I wanted to use the js glue as close to
"out of the box" as possible.

Lastly, get emcc to output the .symbols object so we know which wasm function is which index.
We could emit a name section into the binary but my assumption is that most production code
does not ship with that so it could change the benchmark in an unrealistic way.

TODO:
* Profile perf differences
* Generalize the wrapping process
* Remove postIterationHook code for a different PR.
This seems to work pretty nicely and should be future proof.
…he generated code.

Remove some stale logging.

Remove main from TSF-wasm since it's no longer used.

Make the count value an argument rather than a constant in C code. This appears to make the
performance characteristics similar to what it was before this refactor.

Old sampling (after recompile so wasm function numbers are meaningful):

Sampling rate: 1000.000000 microseconds. Total samples: 364
Top functions as <numSamples  'functionName#hash:sourceID'>
   199    '<?>.wasm-function[100]#<nil>:4294967295'
    34    '<?>.wasm-function[238]#<nil>:4294967295'
    19    '<?>.wasm-function[208]#<nil>:4294967295'
    16    '_platform_memmove#<nil>:4294967295'
    12    '<?>.wasm-function[251]#<nil>:4294967295'
    11    '<?>.wasm-function[232]#<nil>:4294967295'
     9    '<?>.wasm-function[209]#<nil>:4294967295'
     8    '<?>.wasm-function[237]#<nil>:4294967295'
     7    '<?>.wasm-function[201]#<nil>:4294967295'
     5    '<?>.wasm-function[250]#<nil>:4294967295'
     4    '<?>.wasm-function[213]#<nil>:4294967295'
     3    '<?>.wasm-function[216]#<nil>:4294967295'

Sampling rate: 1000.000000 microseconds. Total samples: 364

Tier breakdown:
-----------------------------------
LLInt:                     0  (0.000000%)
Baseline:                  1  (0.274725%)
DFG:                       0  (0.000000%)
FTL:                       0  (0.000000%)
js builtin:                0  (0.000000%)
IPInt:                     0  (0.000000%)
WasmLLInt:                 3  (0.824176%)
BBQ:                     211  (57.967033%)
OMG:                     116  (31.868132%)
Wasm:                      0  (0.000000%)
Host:                      0  (0.000000%)
RegExp:                    0  (0.000000%)
C/C++:                    30  (8.241758%)
Unknown Frame:             3  (0.824176%)

Hottest bytecodes as <numSamples   'functionName#hash:JITType:bytecodeIndex'>
    33    '<?>.wasm-function[100]:BBQMode:0x2ed'
    33    '<?>.wasm-function[100]:BBQMode:0x2f0'
    16    '_platform_memmove#<nil>:None:<nil>'
     8    '<?>.wasm-function[100]:BBQMode:0xa733'
     7    '<?>.wasm-function[100]:BBQMode:0x70a2'
     6    '<?>.wasm-function[100]:BBQMode:0x6f8a'
     6    '<?>.wasm-function[100]:BBQMode:0x6f94'
     5    '<?>.wasm-function[208]:OMGMode:nil'
     5    '<?>.wasm-function[100]:BBQMode:0x5b'
     5    '<?>.wasm-function[100]:BBQMode:0x6f9b'
     4    '<?>.wasm-function[100]:BBQMode:0x5df3'
    ...

New sampling with this PR:

Sampling rate: 1000.000000 microseconds. Total samples: 4908
Top functions as <numSamples  'functionName#hash:sourceID'>
  2763    '<?>.wasm-function[100]#<nil>:4294967295'
   464    '<?>.wasm-function[238]#<nil>:4294967295'
   436    '<?>.wasm-function[208]#<nil>:4294967295'
   138    '<?>.wasm-function[251]#<nil>:4294967295'
   128    '<?>.wasm-function[237]#<nil>:4294967295'
   117    '<?>.wasm-function[232]#<nil>:4294967295'
   108    '<?>.wasm-function[250]#<nil>:4294967295'
   107    '<?>.wasm-function[209]#<nil>:4294967295'
   104    '<?>.wasm-function[201]#<nil>:4294967295'
    77    '<?>.wasm-function[216]#<nil>:4294967295'
    54    '<?>.wasm-function[226]#<nil>:4294967295'
    43    '<?>.wasm-function[162]#<nil>:4294967295'

Sampling rate: 1000.000000 microseconds. Total samples: 4908

Tier breakdown:
-----------------------------------
LLInt:                     0  (0.000000%)
Baseline:                  2  (0.040750%)
DFG:                       2  (0.040750%)
FTL:                       5  (0.101874%)
js builtin:                1  (0.020375%)
IPInt:                     0  (0.000000%)
WasmLLInt:                 3  (0.061125%)
BBQ:                    2771  (56.458843%)
OMG:                    1988  (40.505297%)
Wasm:                      0  (0.000000%)
Host:                      0  (0.000000%)
RegExp:                    0  (0.000000%)
C/C++:                    95  (1.935615%)
Unknown Frame:            41  (0.835371%)
Unknown Executable:        1  (0.020375%)

Hottest bytecodes as <numSamples   'functionName#hash:JITType:bytecodeIndex'>
   526    '<?>.wasm-function[100]:BBQMode:0x2ed'
   451    '<?>.wasm-function[100]:BBQMode:0x2f0'
   125    '<?>.wasm-function[208]:OMGMode:0x0'
   105    '<?>.wasm-function[100]:BBQMode:0x6f8a'
    91    '<?>.wasm-function[100]:BBQMode:0xa724'
    84    '<?>.wasm-function[100]:BBQMode:0x5b'
    81    '<?>.wasm-function[100]:BBQMode:0xa733'
    79    '<?>.wasm-function[100]:BBQMode:0x6f94'
    77    '<?>.wasm-function[100]:BBQMode:nil'
    63    '<?>.wasm-function[208]:OMGMode:0x16d'
    ...

It looks like both versions have similar sample rates for the top function.
There are more samples in the top-tier compiler, but that's presumably OK and expected since the test runs longer.
Build products are now built to a build subdirectory.
Log the build info into build.log
Fix `date` command to work on MacOS out of the box.
Reduce iteration count to 15/2.
Remove ZLib check since it's always false in JetStream anyway.
@kmiller68 kmiller68 force-pushed the wasm-benchmarks-should-be-scored-like-js-benchmarks branch from d1534f0 to e89743d Compare December 20, 2024 14:45
@kmiller68
Contributor Author

I'll wait until EoD to merge in case any Mozilla folks have comments.

@kmiller68 kmiller68 merged commit 4e888f5 into main Dec 21, 2024
log: globalObject.print,
warn: (e) => { print("Warn: " + e); },
error: (e) => { print("Error: " + e); },
debug: (e) => { print("Debug: " + e); },
Contributor

We do have a full console object available in our V8 shell (the same goes for the browser).
I can create a follow-up CL to factor out a more defensive patching approach that reuses existing methods where possible?
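A rough sketch of what such a defensive patch might look like (the helper name and shape here are hypothetical, not the actual follow-up): keep any console methods the host already provides and only shim the missing ones on top of a shell-style print function.

```javascript
// Hypothetical helper: reuse existing console methods where available,
// and fall back to print()-based shims otherwise.
function patchConsole(globalObject, print) {
    const existing = globalObject.console ?? {};
    const shim = (prefix) => (e) => print(prefix + e);
    globalObject.console = {
        log: existing.log ?? ((e) => print(String(e))),
        warn: existing.warn ?? shim("Warn: "),
        error: existing.error ?? shim("Error: "),
        debug: existing.debug ?? shim("Debug: "),
    };
    return globalObject.console;
}
```

In a browser or the V8 shell the native methods would be kept as-is; in a bare shell that only exposes print(), everything falls back to the shims.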
