Record failed benchmark runs in the database #6294

Open
huydhn opened this issue Feb 14, 2025 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@huydhn
Contributor

huydhn commented Feb 14, 2025

At the moment, when a benchmark job fails, nothing is uploaded to the database. This makes it difficult to differentiate between a failed benchmark run and no benchmark run at all, because neither case leaves a record.

See the convo on #6277 (comment)

cc @yangw-dev

@huydhn
Contributor Author

huydhn commented Feb 19, 2025

To fix this, we will need to keep that in a dedicated conclusion field. The metric can probably be set to empty (or 0). This approach will work with #6277 (comment) without any change on HUD
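A minimal sketch of what such a record could look like, assuming a zeroed-out metric plus a dedicated conclusion field (field names here are illustrative, not the actual database schema):

```python
# Hypothetical sketch: a failed run still produces a record, with the metric
# zeroed and a dedicated "conclusion" field. The absence of any record then
# unambiguously means "no run", while conclusion="failure" means "failed run".

def make_failure_record(model: str, backend: str, metric_name: str) -> dict:
    """Build a placeholder benchmark record for a failed run (illustrative schema)."""
    return {
        "model": model,
        "backend": backend,
        "metric": {"name": metric_name, "benchmark_values": [0]},
        "conclusion": "failure",  # vs. "success" for a normal run
    }

record = make_failure_record("ic3", "coreml_fp16", "avg_inference_latency(ms)")
print(record["conclusion"])  # -> failure
```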

@yangw-dev
Contributor

yangw-dev commented Feb 26, 2025

Short term: upload the benchmark with an `extra_info` payload that carries a conclusion field.

@yangw-dev yangw-dev self-assigned this Mar 5, 2025
@yangw-dev yangw-dev moved this from Cold Storage to In Progress in PyTorch OSS Dev Infra Mar 6, 2025
@yangw-dev yangw-dev moved this to In Progress in ExecuTorch Benchmark Mar 6, 2025
yangw-dev added a commit that referenced this issue Mar 12, 2025
…ct (#6371)

#6294

Details:
- print `os` in the test spec printout section
- store os, job_arn (mobile job) and job_conclusion in each artifact's
metadata. Note that this is the mobile job conclusion, not the GitHub job
conclusion.
- wrap the post-test logic into ReportProcessor:
    - pass the AWS client as a parameter for testability
    - add a unit test for ReportProcessor
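The design choice above (passing the AWS client as a parameter) can be sketched as follows; this is an illustration of the dependency-injection pattern described in the commit, not the actual implementation, and the class/method names are assumptions:

```python
# Illustrative sketch: ReportProcessor takes the AWS client as a constructor
# parameter, so unit tests can inject a fake instead of a real boto3 client.

class ReportProcessor:
    def __init__(self, s3_client):
        self.s3_client = s3_client  # injected; a stub in unit tests

    def upload(self, bucket: str, key: str, body: str) -> None:
        # boto3 S3 clients expose put_object(Bucket=..., Key=..., Body=...)
        self.s3_client.put_object(Bucket=bucket, Key=key, Body=body)

class FakeS3:
    """Minimal stand-in that records calls, for test-driven development."""
    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)

fake = FakeS3()
ReportProcessor(fake).upload("gha-artifacts", "run/1.json", "{}")
print(fake.calls[0]["Key"])  # -> run/1.json
```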
yangw-dev added a commit that referenced this issue Mar 13, 2025
# Description
Issue: #6294
Prepare the mobile_job yml to generate a benchmark record when a job fails.

## Background
When a GitHub benchmark job fails (or some of the mobile jobs fail), we
need to generate a benchmark record indicating that the model has failures.

For instance, take a benchmark job named `benchmark-on-device (ic3,
coreml_fp16, apple_iphone_15,
arn:aws:devicefarm:us-west-2:308535385114... / mobile-job (ios)`:
- when the whole job fails, we want to indicate that the model ic3 with
backend coreml_fp16 on iOS failed for all metrics;
- when one of the devices in the job fails (e.g. iPhone 15 with OS 17.1),
we want to indicate that the model ic3 with backend coreml_fp16 on
iPhone 15 with OS 17.1 failed, but the others succeeded.

Key point: always generate the artifact JSON with the GitHub job name.

## Change Details
- [yaml] add logic to generate artifact.json if any previous step fails
and the expected artifact.json is missing; this makes sure we always have
the artifact JSON with the GitHub job name
- [script] add a flag `--new-json-output-format` to toggle the mobile
job to generate artifact.json in the new format
- see an example of the new JSON result ([s3
link](https://gha-artifacts.s3.us-east-1.amazonaws.com/device_farm/13821036006/1/artifacts/ios-artifacts-38666170088.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=ASIAUPVRELQNEU5O2WYP%2F20250312%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250312T212644Z&X-Amz-Expires=300&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEH4aCXVzLWVhc3QtMSJHMEUCIQC7%2BkVAOsGTimttLszL6u3N4HeFdSzwmPzlOYQBh%2BU%2BzwIgNjk%2FM73TZ9YfN6W92yjuRBUevYQ1BWWf0M7rmky4IT0q0AMIx%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARAFGgwzMDg1MzUzODUxMTQiDCWs46GorlC4PkgCmCqkA7TQ41pTu7Pw2vUyPArSC95%2FUUHvRy5DCUEGOUwKmscwv%2B0D9jRdGfQ05E4dtVKliXhNnBRu2oH2u9WIPGKgR3fFjrVRvy2bzQhMYVjAqfUnG%2BhVO2hOKC6U33bMMNJ4SziagDSsAwHBRXl2YLsd9x4ToLubWcHFd4RtE5ZTFQFBHoB05KmzRJ5O00P6m%2BmzBvNh0T%2F2nj2l5c66VmBOe5xeyqEEHXsw3jD98NGrff7nQrONMDpRLjS74Hz%2Fz%2BGJL9RNwNQ2yJYSUdmkrTk4wi7ToNGrzpJm4Lh7wOprHQVwqpVnYaZjw7bJrTk4of4%2FE0%2FBsI1L3GqCxCt6kig02JKYBOy2nFNeRMR09xCSVQCvZE39zKZxrbilH%2FwBzHCS8KvqP14hhGbo%2F%2F08DWVBTZIgrQii0lNaPkB6c%2F0%2BCghTCQv1hUqhIY3avR3TquZzdZNeavNVU6is%2ByJtFpVZzCCH1AzeCRMcnJAlHdGyv9guD5q5wMpRICAihdmFnFy1LQZNAjSisMr0Z4zFfRKJzGdKSpdyL9D5O063WU0VVtmfI0U4fzCz38e%2BBjqEApAZr2cVZ87wIvVZOhcPBDmz%2F9mBgH5LSIK0bfkuZz6vhkUpJbmHbID6YjraMitF1ht1%2FgQtCQkHaejdA9y99K0KEwcT5JVEFaiJNhm5o7KvZJ1jlDqNAklD8brH63PQ705eszJeILnBAmKdOxTrqb83EEmg5Z2eSIjf7Cl04Si21S%2FZomsjHG1zlcHT4jZ9%2FzXPHNHFVmuMwqOVSTzMXx2BKHrOrtwW%2BbpQ8x8rOC5E9P85c86MSDefTk%2BC9Hoee16B45ywR%2BbH7I9fK%2FZ27v%2BCE0gHQglXCHTFVSp7mk18KQw67BJqq5nJDAQ%2BtEdezGj2O5iiG2Amto3XgUbeSRvTi7iF&X-Amz-Signature=49b1065e9246c807c434b8fd2dc510c014fb12a3ceb2605034da70ee2a64ca68&X-Amz-SignedHeaders=host&response-content-disposition=inline))
- [script] add git_job_name, run_report and job_reports to
artifacts.json:
  - git_job_name: used to build a benchmark record when a GitHub job
fails (a workaround to grab the model info from the job name)
  - job_reports & run_report: we currently don't have extra info about
mobile job conclusions; this can be used to upload to time_series or a
notification system for failure details.
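The "grab the model info from the job name" workaround can be sketched roughly like this, parsing a job name of the shape quoted above; the exact parsing in the script may differ:

```python
import re

# Hypothetical sketch of recovering model, backend, and device from the
# GitHub job name when no benchmark artifact was produced.

JOB_NAME = ("benchmark-on-device (ic3, coreml_fp16, apple_iphone_15, "
            "arn:aws:devicefarm:us-west-2:308535385114...) / mobile-job (ios)")

def parse_job_name(name: str) -> dict:
    # Match the first three comma-separated fields inside the parentheses.
    m = re.search(r"\(([^,]+), ([^,]+), ([^,]+),", name)
    if m is None:
        raise ValueError(f"unrecognized job name: {name}")
    model, backend, device = (s.strip() for s in m.groups())
    return {"model": model, "backend": backend, "device": device}

print(parse_job_name(JOB_NAME))
# -> {'model': 'ic3', 'backend': 'coreml_fp16', 'device': 'apple_iphone_15'}
```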



## PRs that simulate failure cases for the generation logic
Mimic a step failing before the benchmark test (no JSON generated): #6397
Mimic the benchmark test failing but producing an artifact: #6398
ExecuTorch Sync Test: pytorch/executorch#9204


## Details
When the flag is on, artifact.json is converted from
```
[
   ....
]
```
to

```
{
    "git_job_name": str,
    "artifacts": [],
    "run_report": {},
    "job_reports": [....]
}
```
This flag is temporary; it exists only until the logic is in sync between the repos.
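The conversion above amounts to wrapping the old top-level artifact list in a keyed object. A minimal sketch, using the field names from the description (the function name and signature are assumptions):

```python
# Illustrative sketch of the conversion guarded by --new-json-output-format:
# the old format was a bare list of artifacts; the new format wraps it in an
# object alongside the GitHub job name and the mobile-job reports.

def to_new_format(artifacts: list, git_job_name: str,
                  run_report: dict, job_reports: list) -> dict:
    return {
        "git_job_name": git_job_name,
        "artifacts": artifacts,
        "run_report": run_report,
        "job_reports": job_reports,
    }

new = to_new_format([], "benchmark-on-device (ic3, ...)", {}, [])
print(sorted(new))
# -> ['artifacts', 'git_job_name', 'job_reports', 'run_report']
```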
Camyll pushed a commit that referenced this issue Mar 13, 2025
Camyll pushed a commit that referenced this issue Mar 13, 2025
yangw-dev added a commit to pytorch/executorch that referenced this issue Mar 13, 2025
Issue: pytorch/test-infra#6294
Remove the benchmark v2 schema logic, but keep storing v3 under a v3
folder, since we might have a higher schema version in the future.

Next step: introduce failure handling for the benchmark record.
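The versioned-folder idea can be sketched as a simple path convention; the path shape below is purely illustrative:

```python
# Hypothetical sketch: records are stored under a folder named for their
# schema version, so a future v4 can coexist with v3 without migration.

def record_path(schema_version: str, run_id: int, name: str) -> str:
    return f"{schema_version}/{run_id}/{name}.json"

print(record_path("v3", 13821036006, "ios-artifacts"))
# -> v3/13821036006/ios-artifacts.json
```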
@yangw-dev
Contributor

Merged the backend side of the script; now working on the UI.

yangw-dev added a commit that referenced this issue Apr 3, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in PyTorch OSS Dev Infra Apr 4, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in ExecuTorch Benchmark Apr 4, 2025
@yangw-dev yangw-dev reopened this Apr 4, 2025
Labels
enhancement New feature or request
Projects
Status: Done

3 participants