Add saving of logs to disk & combining into CSV #13

Open: michaeltremeer wants to merge 8 commits into main

Conversation

michaeltremeer (Contributor)

The code adds the following:

  • Adds a json-save-dir arg to the load parser, so that JSON logs can be saved to disk

    • A FileHandler is added to the base logger, and the output folder & file are created automatically (see the sketch after this list).
    • Pipes the logs out, including most of the run args, so that each log file contains all the relevant configuration when runs are compared later.
  • Adds a combine_logs subcommand to the parser

    • Extracts run information and final metrics (captured prior to termination/max requests/duration) from the raw log files and combines them into a single output CSV file, ready for analysis.
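
For orientation, the logging piece boils down to roughly the following (a minimal sketch of the idea, not the PR's actual code; the helper name, file naming scheme and format string are illustrative assumptions):

import argparse
import json
import logging
import os
import time

logger = logging.getLogger()  # base (root) logger used by the benchmark

def attach_file_handler(log_save_dir: str, run_args: argparse.Namespace) -> None:
    # create the output folder and a timestamped log file, then attach a FileHandler
    os.makedirs(log_save_dir, exist_ok=True)
    log_path = os.path.join(log_save_dir, f"run_{int(time.time())}.log")
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    # record the run configuration at the top of the file so each log is self-describing
    logger.info("run args: %s", json.dumps(vars(run_args), default=str))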

@technicianted (Contributor) left a comment:

Thanks for your contribution!

else:
print(f"{timestamp} rpm: {rpm:<5} requests: {self.requests_count:<5} failures: {self.failed_count:<4} throttled: {self.throttled_count:<4} tpm: {tokens_per_minute:<6} ttft_avg: {ttft_avg:<6} ttft_95th: {ttft_95th:<6} tbt_avg: {tbt_avg:<6} tbt_95th: {tbt_95th:<6} e2e_avg: {e2e_latency_avg:<6} e2e_95th: {e2e_latency_95th:<6} util_avg: {util_avg:<6} util_95th: {util_95th:<6}", flush=True)
logger.info(f"rpm: {rpm:<5} requests: {self.requests_count:<5} failures: {self.failed_count:<4} throttled: {self.throttled_count:<4} tpm: {tokens_per_minute:<6} ttft_avg: {ttft_avg:<6} ttft_95th: {ttft_95th:<6} tbt_avg: {tbt_avg:<6} tbt_95th: {tbt_95th:<6} e2e_avg: {e2e_latency_avg:<6} e2e_95th: {e2e_latency_95th:<6} util_avg: {util_avg:<6} util_95th: {util_95th:<6}")
@technicianted (Contributor):

The idea behind using print rather than logger is to separate stdout from stderr, so that you can use shell redirection to get only the stats or the jsonl. You will need to add conditionals here to make sure that this behavior doesn't break.
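
A minimal sketch of the kind of conditional being asked for here (the log_save_dir name mirrors the diff; everything else is an illustrative assumption, not the tool's actual code):

import logging
import sys

logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler(sys.stderr))  # diagnostics stay on stderr

def report_stats(stats_line: str, log_save_dir=None) -> None:
    # stats always go to stdout, so `... > stats.log` still captures only the stats
    print(stats_line, flush=True)
    # duplicate the stats into the logger only when file logging was requested,
    # so users who rely on stream redirection see no change in behavior
    if log_save_dir is not None:
        logger.info(stats_line)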

@@ -0,0 +1,97 @@
import argparse
@technicianted (Contributor):

I think this feature is too specific for this tool. In addition, by outputting stats in jsonl, you can use simple command line tools such as jq to aggregate.

Perhaps you should take this feature out into a separate PR for better discussion.
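
As a rough illustration of that jq-based route (the file name and field names below are hypothetical, not the tool's actual jsonl schema), the final stats record of a run could be reduced to a few columns with a one-liner:

$ jq -s 'last | {rpm, tpm, e2e_avg}' my/output/dir/stats.jsonl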

args = parser.parse_args()

if args.func is load and args.log_save_dir is not None:
@technicianted (Contributor):

I'm wondering how this is different from using shell redirection of stderr? I could do something like this:

$ python -m benchmark.bench load ... 2> my/output/dir/output.log

Because this captures only stderr, I will get only the logger output in output.log.

@@ -33,7 +33,7 @@ $ docker run azure-openai-benchmarking load --help
Consider the following guidelines when creating your benchmark tests

1. **Ensure call characteristics match your production expectations**. The number of calls per minute and total tokens you are able to process varies depending on the prompt size, generation size and call rate.
1. **Run your test long enough to reach a stable state**. Throttling is based on the total compute you have deployed and are utilizing. The utilization includes active calls. As a result you will see a higher call rate when ramping up on an unloaded deployment because there are no existing active calls being processed. Once your deployment is fully loaded with a utilization near 100%, throttling will increase as calls can only be processed as earlier ones are completed. To ensure an accurate measure, set the duration long enough for the throughput to stabilize, especially when running at or close to 100% utilization.
1. **Run your test long enough to reach a stable state**. Throttling is based on the total compute you have deployed and are utilizing. The utilization includes active calls. As a result you will see a higher call rate when ramping up on an unloaded deployment because there are no existing active calls being processed. Once your deployment is fully loaded with a utilization near 100%, throttling will increase as calls can only be processed as earlier ones are completed. To ensure an accurate measure, set the duration long enough for the throughput to stabilize, especially when running at or close to 100% utilization. Also note that once the test ends (either by termination, or reaching the maximum duration or number of requests), any pending requests will continue to drain, which can result in lower throughput values as the load on the endpoint gradually decreases to 0.
@technicianted (Contributor):

Great clarification!

@michaeltremeer (Contributor, Author) commented Dec 21, 2023

Good points in the comments, and agreed that the combine_logs functionality is a stretch beyond the core purpose of the repo. The only note I would add (inspired by my own use): with shell redirection you do miss any warnings or errors that occurred during the run, as well as the run parameters that make it easier to tell what the config was for each run (useful when you're analysing a number of permutations of shape, model and concurrency). You also miss the time the run finished or was terminated, which means the post-test period where requests are still draining can impact the numbers.

If you think it's valid and worth a change, I can resubmit just the first set of changes for saving a single run's logs, otherwise I can just submit the updated README for clarity.

@technicianted (Contributor) commented Dec 21, 2023

> you do miss any warnings or errors that occurred during the run

You can redirect both stdout and stderr to the same file if needed:

$ python -m benchmark.bench load ... > my/output/dir/full_output.log 2>&1

or separate files:

$ python -m benchmark.bench load ... > my/output/dir/stats_output.log 2> my/output/dir/logs.log

Does that cover the use-case?

> as well as any run parameters which make it easier to tell what the config was for each run

Yes I agree this is super useful. Let's add that in a separate PR.

> I can resubmit just the first set of changes for saving a single run's logs, otherwise I can just submit the updated README for clarity.

Yes, let's break them down. Generally, many smaller PRs are always better :)

Thanks!
