# Benchmark tools

## edgeless_benchmark

`edgeless_benchmark` is a tool that helps function developers and designers of orchestration algorithms by automating the performance evaluation of a population of workflows under controlled conditions.

The tool supports different arrival models and workflow types.

Arrival models (option `--arrival-model`):

| Arrival model | Description |
| --- | --- |
| `poisson` | Inter-arrival times between consecutive workflows and their lifetimes are exponentially distributed. |
| `incremental` | One new workflow arrives every inter-arrival time, with constant lifetime. |
| `incr-and-keep` | Add workflows, with constant lifetimes, incrementally until the warm-up period finishes, then keep them until the end of the experiment. |
| `single` | Add a single workflow that lasts for the entire experiment. |
| `trace` | Read the arrival and end times of workflows from a file specified with `--workload-trace`, one workflow per line in the format `arrival,end_time` (see the example after this table). |
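
For example, a trace file in this format might look as follows (purely illustrative values; the times are assumed here to be expressed in seconds from the start of the experiment):

```text
0.0,10.0
2.5,12.5
5.0,30.0
```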

Workflow types (option `--wf-type`):

| Workflow type | Description | Application metrics | Template |
| --- | --- | --- | --- |
| `none` | No workflow is created. This option is meant only for testing/troubleshooting. | None | N |
| `single` | A single function. | -- | N |
| `matrix-mul-chain` | A chain of functions, each performing the multiplication of two matrices of 32-bit floating point random numbers at each invocation. | workflow,function | Y |
| `vector-mul-chain` | A chain of functions, each multiplying an internal random matrix of 32-bit floating point numbers by the input vector received from the caller. | workflow,function | Y |
| `map-reduce` | A workflow consisting of a random number of stages, where each stage is composed of a random number of processing blocks. Before moving to the next stage, the output from all the processing blocks of the previous stage must be received. | workflow | Y |
| `json-spec` | The workflow specified in the given JSON file. The string `@WF_ID` in the file is substituted with a sequential identifier of the workflow (see the fragment after this table). | -- | N |
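
To illustrate the substitution only (this fragment is hypothetical and does not reflect the actual workflow schema), a file containing:

```json
{ "name": "bench-wf-@WF_ID" }
```

would yield `bench-wf-0` for the first workflow, `bench-wf-1` for the second, and so on.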

For all the workflow types with Y in the Template column, a template can be generated by specifying `--wf-type "NAME;template"`. For example, running:

```shell
target/debug/edgeless_benchmark --wf-type "map-reduce;template" > map_reduce.json
```

generates a template in `map_reduce.json`, which can then be loaded with:

```shell
target/debug/edgeless_benchmark --wf-type "map-reduce;map_reduce.json"
```

The duration of the experiment, the seed used to generate pseudo-random numbers (which enables repeatable experiments), and the duration of the warm-up period are all configurable via command-line options.
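
The full list of options, including the exact flag names, can be printed with the standard help flag:

```shell
target/debug/edgeless_benchmark --help
```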

### Metric collection

The collection of performance metrics is done via Redis: an instance must be reachable by `edgeless_benchmark` at the URL specified with the command-line argument `--redis-url` (e.g., `--redis-url redis://127.0.0.1:6379`).

If this argument is not specified, then `edgeless_benchmark` creates the workload as configured but does not save any performance metrics.

When used to collect performance metrics, one EDGELESS node with the metrics-collector resource provider is also needed; it can be created by following the instructions in the step-by-step example below.
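
As a quick sanity check that the Redis instance is reachable (and, during a run, that metrics are actually being written), the standard Redis CLI can be used:

```shell
redis-cli -u redis://127.0.0.1:6379 PING
redis-cli -u redis://127.0.0.1:6379 KEYS '*'
```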

### Dataset creation

Both `edgeless_benchmark` and the ε-ORC support saving run-time events during the execution, so that a dataset can be created from a run of the benchmark.

For `edgeless_benchmark`, this option is enabled by specifying a non-empty value for the option `--dataset-path`, which defines the path where the dataset files are saved. The dataset files are encoded in comma-separated values (CSV) format, with the first row of each file containing the column names. Each entry is prepended with the additional fields specified with `--additional-fields`, which correspond to the additional header given with `--additional-header`. The output files are overwritten unless the `--append` option is provided.

For the ε-ORC, a Redis proxy must be enabled, and the optional section `[proxy.dataset_settings]` must be added; its fields have the same meaning as the corresponding `edgeless_benchmark` command-line options above (see the step-by-step example below).

The dataset files produced are the following:

| Filename | Format | Produced by |
| --- | --- | --- |
| health_status.csv | timestamp,node_id,node_health_status | ε-ORC |
| capabilities.csv | timestamp,node_id,node_capabilities | ε-ORC |
| mapping_to_instance_id.csv | timestamp,logical_id,node_id1,physical_id1,... | ε-ORC |
| performance_samples.csv | metric,identifier,value,timestamp | ε-ORC |
| application_metrics.csv | entity,identifier,value,timestamp | edgeless_benchmark |

Notes:

- The timestamp format is always `A.B`, where `A` is the Unix epoch in seconds and `B` is the fractional part in nanoseconds (see the parsing sketch after these notes).
- All the identifiers (`node_id`, `logical_id`, and `physical_id`) are UUIDs.
- The field `entity` in the application metrics can be `f` (function) or `w` (workflow).
- The difference between application metrics and performance samples is explained in the orchestration documentation.
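
As an illustration, the following is a minimal sketch of how `application_metrics.csv` can be parsed in Python (assuming no additional fields were configured, so the columns are exactly `entity,identifier,value,timestamp`):

```python
import csv

# Read the application metrics produced by edgeless_benchmark.
with open("dataset/myexp-application_metrics.csv") as f:
    for row in csv.DictReader(f):
        if row["entity"] == "w":  # keep workflow-level entries only
            # The timestamp is "A.B": Unix epoch seconds with a fractional
            # part in nanoseconds; float() suffices for a sketch, though it
            # loses nanosecond precision.
            ts = float(row["timestamp"])
            print(row["identifier"], row["value"], ts)
```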

### Step-by-step example

We assume that the repository has been downloaded and compiled in debug mode (see building instructions) and that a local instance of Redis is running (see online instructions).

First, build the `vector_mul.wasm` bytecode:

```shell
target/debug/edgeless_cli function build functions/vector_mul/function.json
```

Then, create the configuration files:

```shell
target/debug/edgeless_inabox -t
```

Modify the `[proxy]` section of `orchestrator.toml` as follows:

```toml
[proxy]
proxy_type = "Redis"
redis_url = "redis://127.0.0.1:6379"

[proxy.dataset_settings]
dataset_path = "dataset/myexp-"
append = true
additional_fields = "a,b"
additional_header = "h_a,h_b"
```

Then create the directory where the dataset files will be saved:

```shell
mkdir dataset
```

In one shell, start the EDGELESS in-a-box:

```shell
target/debug/edgeless_inabox
```

Modify the configuration of `node.toml` so that it includes the following:

```toml
[resources.metrics_collector_provider]
collector_type = "Redis"
redis_url = "redis://localhost:6379"
provider = "metrics-collector-1"
```

Then create the JSON file specifying the characteristics of the vector-mul-chain workflow:

```shell
cat << EOF > vector_mul_chain.json
{
  "min_chain_length": 5,
  "max_chain_length": 5,
  "min_input_size": 1000,
  "max_input_size": 2000,
  "function_wasm_path": "functions/vector_mul/vector_mul.wasm"
}
EOF
```

In another shell, run the following benchmark, which lasts 30 seconds:

```shell
target/debug/edgeless_benchmark \
    --redis-url redis://127.0.0.1:6379 \
    -w "vector-mul-chain;vector_mul_chain.json" \
    --dataset-path "dataset/myexp-" \
    --additional-fields "a,b" \
    --additional-header "h_a,h_b" \
    --append
```

The dataset directory now contains the files listed in the table above, each starting with the prefix `myexp-`.

An example of a post-processing script is included:

```shell
% DATASET=dataset/myexp-application_metrics.csv python documentation/examples-app-metrics.py
the average latency of wf6 was 33.23 ms
the average latency of wf4 was 69.67 ms
the average latency of wf0 was 19.58 ms
the average latency of wf1 was 50.98 ms
the average latency of wf2 was 54.84 ms
the average latency of wf5 was 72.22 ms
```
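
For reference, a minimal sketch along the lines of such a script (the exact computation of `examples-app-metrics.py` may differ; here it is assumed that the `value` column of workflow entries holds per-invocation latencies in milliseconds, with the additional columns `h_a,h_b` coming first as configured in this example):

```python
import collections
import csv

# Average the per-workflow latencies from the benchmark dataset.
samples = collections.defaultdict(list)
with open("dataset/myexp-application_metrics.csv") as f:
    for row in csv.DictReader(f):
        if row["entity"] == "w":  # workflow-level entries only
            samples[row["identifier"]].append(float(row["value"]))

for wf, values in samples.items():
    print(f"the average latency of {wf} was {sum(values) / len(values):.2f} ms")
```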