Add saving of logs to disk & combining into CSV #13
base: main
Conversation
Thanks for your contribution!
else:
    print(f"{timestamp} rpm: {rpm:<5} requests: {self.requests_count:<5} failures: {self.failed_count:<4} throttled: {self.throttled_count:<4} tpm: {tokens_per_minute:<6} ttft_avg: {ttft_avg:<6} ttft_95th: {ttft_95th:<6} tbt_avg: {tbt_avg:<6} tbt_95th: {tbt_95th:<6} e2e_avg: {e2e_latency_avg:<6} e2e_95th: {e2e_latency_95th:<6} util_avg: {util_avg:<6} util_95th: {util_95th:<6}", flush=True)
    logger.info(f"rpm: {rpm:<5} requests: {self.requests_count:<5} failures: {self.failed_count:<4} throttled: {self.throttled_count:<4} tpm: {tokens_per_minute:<6} ttft_avg: {ttft_avg:<6} ttft_95th: {ttft_95th:<6} tbt_avg: {tbt_avg:<6} tbt_95th: {tbt_95th:<6} e2e_avg: {e2e_latency_avg:<6} e2e_95th: {e2e_latency_95th:<6} util_avg: {util_avg:<6} util_95th: {util_95th:<6}")
The idea behind using print and not logger is to redirect to stdout vs stderr, so that you can use shell redirection to get only the stats or the jsonl output. You will need to add conditionals here to make sure that this behavior doesn't break.
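For example (placeholder arguments, mirroring the invocation shown later in this thread), keeping stats on stdout lets a user do:
$ python -m benchmark.bench load ... > stats.log 2> /dev/null   # keep only the stdout stats/jsonl; discard logger output on stderr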
@@ -0,0 +1,97 @@
import argparse
I think this feature is too specific for this tool. In addition, by outputting stats in jsonl, you can use simple command-line tools such as jq to aggregate. Perhaps you should take this feature out into a separate PR for better discussion.
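For instance, assuming the per-period stats have been captured one JSON object per line in stats.jsonl and expose a numeric field such as e2e_avg (the file name and field name here are illustrative, not taken from the tool's output), a mean could be computed with:
$ jq -s 'map(.e2e_avg) | add / length' stats.jsonl   # slurp all lines into an array and average the assumed e2e_avg field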
args = parser.parse_args()

if args.func is load and args.log_save_dir is not None:
I'm wondering how this is different from using shell redirection of stderr? I could do something like this:
$ python -m benchmark.bench load ... 2> my/output/dir/output.log
Because this redirects only stderr, I will get only the logger output in output.log.
@@ -33,7 +33,7 @@ $ docker run azure-openai-benchmarking load --help
Consider the following guidelines when creating your benchmark tests

1. **Ensure call characteristics match your production expectations**. The number of calls per minute and total tokens you are able to process varies depending on the prompt size, generation size and call rate.
1. **Run your test long enough to reach a stable state**. Throttling is based on the total compute you have deployed and are utilizing. The utilization includes active calls. As a result you will see a higher call rate when ramping up on an unloaded deployment because there are no existing active calls being processed. Once your deployment is fully loaded with a utilization near 100%, throttling will increase as calls can only be processed as earlier ones are completed. To ensure an accurate measure, set the duration long enough for the throughput to stabilize, especially when running at or close to 100% utilization. Also note that once the test ends (either by termination, or reaching the maximum duration or number of requests), any pending requests will continue to drain, which can result in lower throughput values as the load on the endpoint gradually decreases to 0.
Great clarification!
Good points in the comments. If you think it's valid and worth a change, I can resubmit just the first set of changes for saving a single run's logs; otherwise I can just submit the updated README for clarity.
or separate files:
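$ python -m benchmark.bench load ... > stats.log 2> debug.log   # illustrative file names: stdout stats in one file, stderr logger output in the other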
Does that cover the use-case?
Yes I agree this is super useful. Let's add that in a separate PR.
Yes, let's break them down. Generally, many smaller PRs are always better :) Thanks!
The code adds the following:
- Adds a json-save-dir arg to the load parser, so that JSON logs can be saved to disk
- Adds a combine_logs subcommand to the parser
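A rough sketch of how this might be invoked (everything beyond the json-save-dir arg and the combine_logs subcommand name, including argument spelling, order, and where the CSV goes, is a guess rather than taken from the diff):
$ python -m benchmark.bench load --json-save-dir logs/ ...        # hypothetical usage: save each run's JSON logs under logs/
$ python -m benchmark.bench combine_logs logs/ > combined.csv     # hypothetical usage: combine the saved logs into a single CSV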