Move wasm benchmarks to JS benchmark style scoring #18

Merged 4 commits on Dec 21, 2024
166 changes: 142 additions & 24 deletions JetStreamDriver.js
@@ -1,7 +1,7 @@
"use strict";

/*
* Copyright (C) 2018-2022 Apple Inc. All rights reserved.
* Copyright (C) 2018-2024 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -36,11 +36,17 @@ globalThis.dumpJSONResults ??= false;
globalThis.customTestList ??= [];

let shouldReport = false;
let startDelay;
Contributor

That was simply pulled out into a variable and made configurable, right? And this is essentially independent of the Wasm change, correct? Just out of interest: do you know the original reasoning behind the 4000 ms delay? Something like an empirically observed loading time?

Contributor Author


Yeah, this should be independent. The delay is because our internal A/B testing harness restarts the browser between full benchmark runs to reduce variance or caching. So we want to wait a few seconds for the browser to calm down before starting the test, at least, IIRC.
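A sketch of how the new logic resolves the delay (it mirrors the diff below; the helper name and its use of `Number` are hypothetical, since the patch assigns the raw query-string value):

```javascript
// Hypothetical helper summarizing the new startDelay resolution:
// ?report=true keeps the legacy 4000 ms default for the A/B harness,
// and an explicit ?startDelay=N overrides it.
function resolveStartDelay(search) {
    const params = new URLSearchParams(search);
    let startDelay;
    if (params.has("report") && params.get("report").toLowerCase() === "true")
        startDelay = 4000;                      // legacy default for harness runs
    if (params.has("startDelay"))
        startDelay = Number(params.get("startDelay"));
    return startDelay;
}

console.log(resolveStartDelay("?report=true"));     // 4000
console.log(resolveStartDelay("?startDelay=1000")); // 1000
console.log(resolveStartDelay(""));                 // undefined (start immediately)
```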

Contributor


All in on more configurable test params!

I'll probably copy the params.mjs from Speedometer as well, and then we'll have a more central place to store all the test parameters.

if (typeof(URLSearchParams) !== "undefined") {
const urlParameters = new URLSearchParams(window.location.search);
shouldReport = urlParameters.has('report') && urlParameters.get('report').toLowerCase() == 'true';
if (shouldReport)
startDelay = 4000;
if (urlParameters.has('startDelay'))
startDelay = urlParameters.get('startDelay');
if (urlParameters.has('test'))
customTestList = urlParameters.getAll("test");

}

// Used for the promise representing the current benchmark run.
@@ -158,7 +164,7 @@ function uiFriendlyDuration(time)
const minutes = time.getMinutes();
const seconds = time.getSeconds();
const milliSeconds = time.getMilliseconds();
const result = "" + minutes + ":";
let result = "" + minutes + ":";

result = result + (seconds < 10 ? "0" : "") + seconds + ".";
result = result + (milliSeconds < 10 ? "00" : (milliSeconds < 100 ? "0" : "")) + milliSeconds;
@@ -330,7 +336,13 @@ class Driver {
} else
globalObject = runString("");

globalObject.console = { log: globalObject.print, warn: (e) => { print("Warn: " + e); /*$vm.abort();*/ } }
globalObject.console = {
log: globalObject.print,
warn: (e) => { print("Warn: " + e); },
error: (e) => { print("Error: " + e); },
debug: (e) => { print("Debug: " + e); },
Contributor


We do have a full console object available in our V8 shell (same goes for the browser).
I can create a follow-up CL to factor out a more defensive patching approach that reuses existing methods where possible?
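A minimal sketch of that defensive idea (an assumption, not the follow-up CL itself; `print` is taken to be the shell's output primitive, as elsewhere in this driver): only synthesize the console methods the host is actually missing, instead of replacing the whole object.

```javascript
// Hypothetical defensive patching: keep any console methods the host
// (browser or shell) already provides, and only fill in the gaps.
globalThis.console ??= {};
for (const level of ["log", "warn", "error", "debug"]) {
    if (typeof console[level] !== "function")
        console[level] = (...args) => print(`${level}: ${args.join(" ")}`);
}
```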

};

globalObject.self = globalObject;
globalObject.top = {
currentResolve,
@@ -414,8 +426,8 @@ class Driver {
await this.prefetchResourcesForBrowser();
await this.fetchResources();
this.prepareToRun();
if (isInBrowser && shouldReport) {
setTimeout(() => this.start(), 4000);
if (isInBrowser && startDelay !== undefined) {
setTimeout(() => this.start(), startDelay);
}
}

@@ -543,10 +555,11 @@ class Benchmark {
if (__benchmark.prepareForNextIteration)
__benchmark.prepareForNextIteration();

${this.preiterationCode}
${this.preIterationCode}
let start = performance.now();
__benchmark.runIteration();
let end = performance.now();
${this.postIterationCode}

results.push(Math.max(1, end - start));
}
@@ -565,11 +578,24 @@

get prerunCode() { return null; }

get preiterationCode() {
get preIterationCode() {
let code = "";
if (this.plan.deterministicRandom)
return `Math.random.__resetSeed();`;
code += `Math.random.__resetSeed();`;

return "";
if (globalThis.customPreIterationCode)
code += customPreIterationCode;

return code;
}

get postIterationCode() {
let code = "";

if (globalThis.customPostIterationCode)
code += customPostIterationCode;

return code;
}

async run() {
@@ -972,10 +998,11 @@ class AsyncBenchmark extends DefaultBenchmark {
let __benchmark = new Benchmark();
let results = [];
for (let i = 0; i < ${this.iterations}; i++) {
${this.preiterationCode}
${this.preIterationCode}
let start = performance.now();
await __benchmark.runIteration();
let end = performance.now();
${this.postIterationCode}
results.push(Math.max(1, end - start));
}
if (__benchmark.validate)
@@ -986,6 +1013,94 @@
}
};

// Meant for wasm benchmarks that are compiled directly with an emcc build script. It might not work for benchmarks built as
// part of a larger project's build system, or for a wasm benchmark compiled from a language that doesn't use emcc.
class WasmEMCCBenchmark extends AsyncBenchmark {
Contributor

LGTM for now. Long term it would be nice to get rid of some of the code duplication; e.g., the prerunCode below is mostly (but not fully) equal to the one in WasmLegacyBenchmark, right?

Contributor Author

Yeah, I think the goal is to eventually just delete WasmLegacyBenchmark once we have all the benchmarks we care about in the JS-style system. If we want to save them for posterity we can reevaluate.

get prerunCode() {
let str = `
let verbose = false;

let globalObject = this;

abort = quit = function() {
if (verbose)
console.log('Intercepted quit/abort');
};

oldPrint = globalObject.print;
globalObject.print = globalObject.printErr = (...args) => {
if (verbose)
console.log('Intercepted print: ', ...args);
};

let Module = {
preRun: [],
postRun: [],
print: print,
printErr: printErr,
setStatus: function(text) {
},
totalDependencies: 0,
monitorRunDependencies: function(left) {
this.totalDependencies = Math.max(this.totalDependencies, left);
Module.setStatus(left ? 'Preparing... (' + (this.totalDependencies-left) + '/' + this.totalDependencies + ')' : 'All downloads complete.');
},
};
globalObject.Module = Module;
`;
return str;
}

get runnerCode() {
let str = `function loadBlob(key, path, andThen) {`;

if (isInBrowser) {
str += `
var xhr = new XMLHttpRequest();
xhr.open('GET', path, true);
xhr.responseType = 'arraybuffer';
xhr.onload = function() {
Module[key] = new Int8Array(xhr.response);
andThen();
};
xhr.send(null);
`;
} else {
str += `
Module[key] = new Int8Array(read(path, "binary"));

Module.setStatus = null;
Module.monitorRunDependencies = null;

Promise.resolve(42).then(() => {
try {
andThen();
} catch(e) {
console.log("error running wasm:", e);
console.log(e.stack);
throw e;
}
})
`;
}

str += "}";

let keys = Object.keys(this.plan.preload);
for (let i = 0; i < keys.length; ++i) {
str += `loadBlob("${keys[i]}", "${this.plan.preload[keys[i]]}", async () => {\n`;
}

str += super.runnerCode;
for (let i = 0; i < keys.length; ++i) {
str += `})`;
}
str += `;`;

return str;
}
};

class WSLBenchmark extends Benchmark {
constructor(...args) {
super(...args);
@@ -1062,7 +1177,7 @@ class WSLBenchmark {
}
};

class WasmBenchmark extends Benchmark {
class WasmLegacyBenchmark extends Benchmark {
constructor(...args) {
super(...args);

@@ -1776,18 +1891,21 @@ const testPlans = [
preload: {
wasmBinary: "./wasm/HashSet.wasm"
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
testGroup: WasmGroup
},
{
name: "tsf-wasm",
files: [
"./wasm/tsf.js"
"./wasm/TSF/build/tsf.js",
"./wasm/TSF/benchmark.js",
],
preload: {
wasmBinary: "./wasm/tsf.wasm"
wasmBinary: "./wasm/TSF/build/tsf.wasm"
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmEMCCBenchmark,
iterations: 15,
worstCaseCount: 2,
testGroup: WasmGroup
},
{
@@ -1798,7 +1916,7 @@ const testPlans = [
preload: {
wasmBinary: "./wasm/quicksort.wasm"
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
testGroup: WasmGroup
},
{
@@ -1809,7 +1927,7 @@
preload: {
wasmBinary: "./wasm/gcc-loops.wasm"
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
testGroup: WasmGroup
},
{
@@ -1820,7 +1938,7 @@
preload: {
wasmBinary: "./wasm/richards.wasm"
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
testGroup: WasmGroup
},
{
@@ -1833,7 +1951,7 @@
preload: {
wasmBinary: "./sqlite3/build/jswasm/speedtest1.wasm"
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
testGroup: WasmGroup
},
{
@@ -1852,7 +1970,7 @@
preload: {
tfjsBackendWasmBlob: "./wasm/tfjs-backend-wasm.wasm",
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
async: true,
deterministicRandom: true,
testGroup: WasmGroup
@@ -1873,7 +1991,7 @@
preload: {
tfjsBackendWasmSimdBlob: "./wasm/tfjs-backend-wasm-simd.wasm",
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
async: true,
deterministicRandom: true,
testGroup: WasmGroup
@@ -1888,7 +2006,7 @@
preload: {
argon2WasmBlob: "./wasm/argon2.wasm",
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
testGroup: WasmGroup
},
{
@@ -1901,7 +2019,7 @@
preload: {
argon2WasmSimdBlob: "./wasm/argon2-simd.wasm",
},
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
testGroup: WasmGroup
},
// WorkerTests
@@ -1975,7 +2093,7 @@ const testPlans = [
romBinary: "./8bitbench/assets/program.bin"
},
async: true,
benchmarkClass: WasmBenchmark,
benchmarkClass: WasmLegacyBenchmark,
testGroup: WasmGroup
}
];
3 changes: 2 additions & 1 deletion sqlite3/build.sh
@@ -8,8 +8,9 @@ rm sqlite-src-*.zip
rm -rf sqlite-src-*/
rm -rf build/

touch build.log
BUILD_LOG="$(realpath build.log)"
echo -e "Built on $(date --rfc-3339=seconds)\n" | tee "$BUILD_LOG"
echo -e "Built on $(date -u '+%Y-%m-%dT%H:%M:%SZ')\n" | tee "$BUILD_LOG"

echo "Toolchain versions" | tee -a "$BUILD_LOG"
emcc --version | head -n1 | tee -a "$BUILD_LOG"
33 changes: 33 additions & 0 deletions wasm/TSF/benchmark.js
@@ -0,0 +1,33 @@
/*
* Copyright (C) 2024 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/

class Benchmark {
async runIteration() {
if (!Module._runIteration)
await setupModule(Module);

Module._runIteration(150);
Contributor (@danleh), Dec 20, 2024


So just to clarify the workload size before vs. now: It used to be the default of count = 1000 because main was invoked without arguments, and now we are at defaultIterationCount = 120 times 150 per runIteration, right?

It seems the line item is running quite a bit longer now compared to before (~12s vs. ~1.7s, or roughly 130B dynamic instructions vs. 15B on an x64 workstation). I see more functions using the top tier now, which is fine by me. In the old version, quite some time was spent in JS read/write functions and not in Wasm at all, which was certainly not ideal.

We just might have to lower the workload size, in case the overall benchmark becomes too long to run.
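For reference, the workload arithmetic under the assumptions stated in this thread (defaultIterationCount = 120, count = 150 per runIteration):

```javascript
// Before: one main() invocation with the default inner count of 1000.
// After (per this thread): 120 runIteration() calls, count = 150 each.
const before = 1 * 1000;
const after = 120 * 150;     // 18000
console.log(after / before); // 18x more inner iterations overall
```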

Contributor


Excerpt of a bottom-up profile in d8, before vs. now: [profile screenshots]
I'm happy to land this; we should just be aware that this is now moving more in the direction of measuring peak performance. (Was this a conscious decision?)

Contributor Author


Yeah, I should probably reduce the iteration count. I did that in my followup patches for other benchmarks, I'll do it here too. That said, it seems like your samples include the harness code? That's probably not ideal.

The function numbers also changed for me when I recompiled the test, so I ended up just simulating the old benchmark by running one iteration with count = 1000 in the samples I put in my commit message.

Contributor


Correct, I just took a profile of the whole d8 execution with Linux perf (via the helper explained in https://v8.dev/docs/linux-perf#profiling-d8-with-linux-perf-d8.py, which takes care of injecting the samples from JIT-ed code). If startup or the benchmark harness takes a significant portion of the overall profile, I would say the workload is too short/light (as it probably was before).

}
};
6 changes: 6 additions & 0 deletions wasm/TSF/build.log
@@ -0,0 +1,6 @@
Built on 2024-12-20T14:36:37Z

Toolchain versions
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.73-git
Building...
Building done