Commit bacb676

Add initial version of benchmark experiment runner (#1266)
In order to investigate performance in Mountpoint, we want to be able to vary different parameters. In fact, it can be very useful to vary these parameters together to see how performance (such as sequential read throughput) changes as two parameters vary at once.

This change introduces a new benchmark running script which uses the Python framework Hydra to enumerate combinations of parameters and then execute some function with each combination. The script manages the lifecycle of the `mount-s3` file system and collects data into an output folder.

The change currently does not reuse the FIO definitions used by our regression benchmarks. In the mid-term, these should be reconciled.

This pull request (PR) supersedes a previous PR: #986.

### Does this change impact existing behavior?

No, this adds a new benchmark runner and benchmark definitions. It does not impact the Mountpoint file system.

### Does this change need a changelog entry? Does it require a version change?

No, there is no impact to the Mountpoint file system or crates.

---

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).

Signed-off-by: Daniel Carl Jones <[email protected]>
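Conceptually, the sweep Hydra performs in multirun mode is a Cartesian product over the configured parameter values, with one experiment run per combination. A minimal sketch of that idea in plain Python (the grid values here are illustrative, not the PR's actual defaults):

```python
from itertools import product

# Hypothetical parameter grid, mirroring the kind of sweep Hydra's
# multirun mode performs over values listed in conf/config.yaml.
param_grid = {
    "fuse_threads": [16, 32, 64, 128],
    "application_workers": [1, 4, 8, 16],
    "direct_io": [False, True],
}


def enumerate_experiments(grid):
    """Yield one dict of parameter values per combination."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


experiments = list(enumerate_experiments(param_grid))
print(len(experiments))  # 4 * 4 * 2 = 32 combinations
```

Hydra adds configuration files, output-directory management, and command-line overrides on top of this basic enumeration.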
1 parent d2a50bb commit bacb676

10 files changed: +475 −0 lines

benchmark/.gitignore

```
/multirun/
/outputs/
```

benchmark/.python-version

```
3.11
```

benchmark/README.md

# Benchmark experiment runner

This project allows you to perform Mountpoint benchmarks with different variables,
such that a number of experiments can be run with ease and the logs
and results collected in a directory for each experiment run.

The Python script `benchmark.py` handles the setup and teardown for each experiment.
The experiment configuration space is managed using [Hydra](https://hydra.cc/).
Configurations in `conf/` describe which values to use when running experiments over a set of parameters,
such as the maximum count of Mountpoint FUSE workers,
the number of application workers reading from unique file handles, etc.

The benchmark script currently supports FIO jobs.
The list is defined in `conf/config.yaml` under the `fio_benchmarks` config entry.
The FIO jobs define what workload they run,
and also use environment variables in the job definition to allow this script to vary parameters.

## Before you start

You should have set up the environment where you want to run the benchmarking experiments.
For instance, this might be an EC2 instance. You also need an S3 bucket to run the workload against.

You should clone this repository to the environment. This tool will build Mountpoint for you.

This project uses [uv](https://github.com/astral-sh/uv) to manage Python environments and dependencies.
Think of `uv` as a close analog of Rust's _cargo_ but for Python.
It will automatically configure a Python virtual environment for you and install the project dependencies.

Assuming `uv` is installed, getting started is (almost) as easy as
running the `benchmark.py` script from this directory:

```sh
uv run benchmark.py --
```

It should tell you that you forgot some arguments for the Python script itself.

## Running the experiment

A few variables are required, such as the S3 bucket used for testing.
You must set these in order to use the benchmark script.

Additionally, you should configure the AWS credentials for Mountpoint.
You might use AWS profiles or set some credentials in the environment.

To run the experiment, you can execute a command like this:

```sh
uv run benchmark.py -- s3_bucket=amzn-s3-demo-bucket
```

This will run the default experiment, including many different configuration combinations.
Output is written to `multirun/` within directories for the date, time, and experiment number run.
The output directory includes a few different files from an individual experiment run,
including the individual benchmark output `benchmark.log`, FIO output, and Mountpoint logs.
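Since each experiment writes its FIO output into its own `multirun/` subdirectory, a small helper can gather all of the reports for later analysis. This is a sketch, not part of the PR; it assumes the `fio.<job>.json` naming used by the benchmark script:

```python
import json
from pathlib import Path


def collect_results(multirun_root):
    """Gather FIO JSON reports from a Hydra multirun output tree.

    Assumes each experiment directory contains a `fio.<job>.json` file,
    as written by benchmark.py. Returns a list of dicts pairing each
    report's path with its parsed JSON contents.
    """
    results = []
    for fio_json in sorted(Path(multirun_root).glob("**/fio.*.json")):
        with open(fio_json) as f:
            data = json.load(f)
        results.append({"path": str(fio_json), "fio": data})
    return results
```

For example, `collect_results("multirun/2025-01-01/12-00-00")` would return one entry per experiment run under that sweep.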

benchmark/benchmark.py

```python
from contextlib import contextmanager
from datetime import datetime, timezone
import json
import logging
import os
from os import path
import subprocess
from subprocess import Popen
import tempfile
from typing import Any

import hydra
from omegaconf import DictConfig

logging.basicConfig(
    level=os.environ.get('LOGLEVEL', 'INFO').upper()
)

log = logging.getLogger(__name__)

MOUNT_DIRECTORY = "s3"
MP_LOGS_DIRECTORY = "mp_logs/"


@contextmanager
def _mounted_bucket(
    cfg: DictConfig,
):
    """
    Mounts the S3 bucket, providing metadata about the successful mount.

    Context manager allows use of `with` clause, automatically unmounting the bucket.
    """
    mount_dir = tempfile.mkdtemp(suffix=".mountpoint-s3")
    mount_metadata = _mount_mp(cfg, mount_dir)
    try:
        yield mount_metadata
    finally:
        try:
            subprocess.check_output(["umount", mount_dir])
            log.debug(f"{mount_dir} unmounted")
            os.rmdir(mount_dir)
        except Exception:
            log.error(f"Error cleaning up Mountpoint at {mount_dir}:", exc_info=True)


class MountError(Exception):
    pass


def _mount_mp(
    cfg: DictConfig,
    mount_dir: str,
) -> dict[str, Any]:
    """
    Mount an S3 bucket using Mountpoint,
    using the configuration to apply Mountpoint arguments.

    Returns metadata about the mount, including the Mountpoint version string.
    Raises MountError if mounting fails.
    """

    if cfg['mountpoint_binary'] is None:
        mountpoint_args = [
            "cargo",
            "run",
            "--quiet",
            "--release",
            "--",
        ]
    else:
        mountpoint_args = [cfg['mountpoint_binary']]

    os.makedirs(MP_LOGS_DIRECTORY, exist_ok=True)

    bucket = cfg['s3_bucket']

    mountpoint_version_output = subprocess \
        .check_output([
            *mountpoint_args,
            "--version"
        ]) \
        .decode("utf-8")
    log.info("Mountpoint version: %s", mountpoint_version_output.strip())

    subprocess_args = [
        *mountpoint_args,
        bucket,
        mount_dir,
        "--log-metrics",
        "--allow-overwrite",
        "--allow-delete",
        f"--log-directory={MP_LOGS_DIRECTORY}",
    ]
    subprocess_env = {
        "PATH": os.environ["PATH"],
    }

    if cfg['s3_prefix'] is not None:
        subprocess_args.append(f"--prefix={cfg['s3_prefix']}")

    if cfg['mountpoint_debug']:
        subprocess_args.append("--debug")
    if cfg['mountpoint_debug_crt']:
        subprocess_args.append("--debug-crt")

    if cfg["read_part_size"]:
        subprocess_args.append(f"--read-part-size={cfg['read_part_size']}")
    if cfg["write_part_size"]:
        subprocess_args.append(f"--write-part-size={cfg['write_part_size']}")

    if cfg['metadata_ttl'] is not None:
        subprocess_args.append(f"--metadata-ttl={cfg['metadata_ttl']}")

    if cfg['upload_checksums'] is not None:
        subprocess_args.append(f"--upload-checksums={cfg['upload_checksums']}")

    if cfg['fuse_threads'] is not None:
        subprocess_args.append(f"--max-threads={cfg['fuse_threads']}")

    log.info(f"Mounting S3 bucket {bucket} with args: %s; env: %s", subprocess_args, subprocess_env)
    try:
        output = subprocess.check_output(subprocess_args, env=subprocess_env)
    except subprocess.CalledProcessError as e:
        log.error(f"Error during mounting: {e}")
        raise MountError() from e

    log.info("Mountpoint output: %s", output.decode("utf-8").strip())

    return {
        "mount_dir": mount_dir,
        "mount_s3_command": " ".join(subprocess_args),
        "mount_s3_env": subprocess_env,
        "mp_version": mountpoint_version_output.strip(),
    }


def _run_fio(cfg: DictConfig, mount_dir: str) -> None:
    """
    Run the FIO workload against the file system.
    """
    FIO_BINARY = "fio"
    fio_job_name = cfg["fio_benchmark"]
    fio_output_filepath = f"fio.{fio_job_name}.json"

    # TODO: Avoid duplicating/diverging the FIO jobs between `benchmark/fio/` and `mountpoint-s3/scripts/fio/`
    fio_job_filepath = hydra.utils.to_absolute_path(f"fio/{fio_job_name}.fio")
    subprocess_args = [
        FIO_BINARY,
        "--eta=never",
        "--output-format=json",
        f"--output={fio_output_filepath}",
        f"--directory={mount_dir}",
        fio_job_filepath,
    ]
    subprocess_env = {
        "PATH": os.environ["PATH"],
        "APP_WORKERS": str(cfg['application_workers']),
        "SIZE_GIB": "100",
        "DIRECT": "1" if cfg['direct_io'] else "0",
        "UNIQUE_DIR": datetime.now(tz=timezone.utc).isoformat(),
        # TODO: Confirm assumption that `libaio` should make direct IO go faster.
        # TODO: Review if we should use sync or psync. We use `sync` in other benchmarks.
        "IO_ENGINE": "libaio" if cfg['direct_io'] else "psync",
    }
    log.info("Running FIO with args: %s; env: %s", subprocess_args, subprocess_env)

    # Use Popen instead of check_output, as we had some issues when trying to attach perf
    with Popen(subprocess_args, env=subprocess_env) as process:
        exit_code = process.wait()
        if exit_code != 0:
            log.error(f"FIO process failed with exit code {exit_code}")
            raise subprocess.CalledProcessError(exit_code, subprocess_args)
        else:
            log.info("FIO process completed successfully")


def _collect_logs() -> None:
    """
    Collect the Mountpoint log if it exists and move it to the output directory.
    The Mountpoint log filename will be normalized, removing the date, etc.
    The old log directory is removed.

    Fails if more than one log file is found.
    """
    logs_directory = path.join(os.getcwd(), MP_LOGS_DIRECTORY)
    dir_entries = os.listdir(logs_directory)

    if not dir_entries:
        log.debug(f"No Mountpoint log files in directory {logs_directory}")
        return

    assert len(dir_entries) <= 1, f"Expected no more than one log file in {logs_directory}"

    old_log_dir = path.join(logs_directory, dir_entries[0])
    new_log_path = "mountpoint-s3.log"
    log.debug(f"Renaming {old_log_dir} to {new_log_path}")
    os.rename(old_log_dir, new_log_path)
    os.rmdir(logs_directory)


def _write_metadata(metadata: dict[str, Any]) -> None:
    with open("metadata.json", "w") as f:
        json.dump(metadata, f, default=str)


def _postprocessing(metadata: dict[str, Any]) -> None:
    _collect_logs()
    _write_metadata(metadata)


@hydra.main(version_base=None, config_path="conf", config_name="config")
def run_experiment(cfg: DictConfig) -> None:
    """
    At a high level, we want to mount the S3 bucket using Mountpoint,
    run a synthetic workload against Mountpoint while capturing metrics and logs,
    then end the load and unmount the bucket.

    We should collect all of the logs and metrics and dump them in the output directory.
    """
    log.debug("Experiment starting")
    metadata = {
        "start_time": datetime.now(tz=timezone.utc),
        "success": False,
    }

    with _mounted_bucket(cfg) as mount_metadata:
        metadata.update(mount_metadata)
        mount_dir = mount_metadata["mount_dir"]
        try:
            # TODO: Add resource monitoring during FIO job
            _run_fio(cfg, mount_dir)
            metadata["success"] = True
        except Exception as e:
            log.error(f"Error running experiment: {e}")

    metadata["end_time"] = datetime.now(tz=timezone.utc)

    _postprocessing(metadata)
    log.info("Experiment ended")


if __name__ == "__main__":
    run_experiment()
```
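The script saves the raw FIO JSON report but does not parse it. For analysis, a small helper could extract aggregate read throughput from a report. This is a sketch, not part of the PR; the field names (`jobs`, `read`, `bw_bytes`) follow FIO's JSON output format, and summing across jobs is an assumption about how results should be aggregated:

```python
def total_read_throughput_mib(fio_report: dict) -> float:
    """Sum per-job read bandwidth from a parsed FIO JSON report
    (FIO reports `bw_bytes` in bytes/s) and convert to MiB/s."""
    total_bytes_per_sec = sum(
        job["read"]["bw_bytes"] for job in fio_report["jobs"]
    )
    return total_bytes_per_sec / (1024 * 1024)
```

For example, a report with two jobs at 1 MiB/s and 2 MiB/s yields 3.0 MiB/s in total.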

benchmark/conf/config.yaml

```yaml
# This file (and others in `conf/`) specify some static parameters,
# as well as some which will be 'swept' over for experiments.

defaults:
  - _self_

s3_bucket: ???
s3_prefix: !!null

read_part_size: !!null
write_part_size: 16777216 # to allow for uploads of 100GiB

metadata_ttl: "indefinite"

fio_benchmarks: sequential_read #, random_read, sequential_write

# Path to Mountpoint binary. Recommended to use an absolute path.
mountpoint_binary: !!null
mountpoint_debug: false
mountpoint_debug_crt: false

# For overriding upload checksums configured for Mountpoint. Passed as `--upload-checksums` argument.
upload_checksums: !!null

iterations: 1

hydra:
  help:
    app_name: "Mountpoint sequential read experiment runner"
  mode: MULTIRUN
  job:
    chdir: true
  sweeper:
    params:
      # Maximum number of FUSE threads for Mountpoint. Passed as `--max-threads` argument.
      '+fuse_threads': 16, 32, 64, 128
      # Number of processes that will be interacting with the file.
      '+application_workers': 1, 4, 8, 16, 32, 64, 128, 256
      # Configure if application should use Direct IO, skipping the Linux page cache
      '+direct_io': false, true
      # Don't touch the params below, they are based on settings above.
      '+fio_benchmark': "${fio_benchmarks}"
      '+iteration': "range(${iterations})"
```

benchmark/fio/global_incl.fio

```ini
bs=256k ; no. of bytes
runtime=30s
time_based
group_reporting
; Use multiple threads instead of processes to parallelize IO
thread
size=${SIZE_GIB}Gi
; Do not make this non-zero, otherwise prefetcher can buffer before ramp ends
ramp_time=0s
ioengine=${IO_ENGINE}
numjobs=${APP_WORKERS}
openfiles=${APP_WORKERS}
direct=${DIRECT}
```

benchmark/fio/sequential_read.fio

```ini
[global]
include global_incl.fio

[sequential_read]
filename_format=j$jobnum_${SIZE_GIB}GiB.bin
size=${SIZE_GIB}Gi
rw=read
fallocate=none
```

benchmark/fio/sequential_write.fio

```ini
[global]
include global_incl.fio

[sequential_write]
; We use a unique directory to ensure it's a new file. There's no way to specify O_TRUNCATE.
filename_format=${UNIQUE_DIR}/$jobname/$jobnum.bin
rw=write
fallocate=none
create_on_open=1
fsync_on_close=1
unlink=1
```
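The `fio_benchmarks` entry in `conf/config.yaml` hints at a `random_read` job that is not included in this change. If added, it could plausibly mirror `sequential_read.fio` with a random read pattern; this is a hypothetical sketch, not part of the PR:

```ini
; Hypothetical benchmark/fio/random_read.fio, modeled on sequential_read.fio
[global]
include global_incl.fio

[random_read]
filename_format=j$jobnum_${SIZE_GIB}GiB.bin
size=${SIZE_GIB}Gi
; rw=randread performs reads at random offsets instead of sequentially
rw=randread
fallocate=none
```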

benchmark/pyproject.toml

```toml
[project]
name = "benchmark"
version = "0.1.0"
description = "Benchmark runner for measuring Mountpoint performance while varying configuration"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "hydra-core>=1.3.2",
]
```
