Record failed benchmark runs in the database #6294

Open
huydhn opened this issue Feb 14, 2025 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@huydhn
Contributor

huydhn commented Feb 14, 2025

At the moment, when a benchmark job fails, nothing is uploaded to the database. This makes it difficult to differentiate between a failed benchmark run and no benchmark run at all, because neither case leaves a record.

See the convo on #6277 (comment)

cc @yangw-dev

@huydhn
Contributor Author

huydhn commented Feb 19, 2025

To fix this, we will need to keep that in a dedicated conclusion field. The metric can probably be set to empty (or 0). This approach will work with #6277 (comment) without any change on HUD
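A minimal sketch of what such a record could look like, assuming a zeroed-out metric plus a dedicated conclusion field (field names here are illustrative, not the actual database schema):

```python
# Hypothetical sketch: a failed run still produces a record, with the metric
# zeroed and a dedicated "conclusion" field. The absence of any record then
# unambiguously means "no run", while conclusion="failure" means "failed run".

def make_failure_record(model: str, backend: str, metric_name: str) -> dict:
    """Build a placeholder benchmark record for a failed run (illustrative schema)."""
    return {
        "model": model,
        "backend": backend,
        "metric": {"name": metric_name, "benchmark_values": [0]},
        "conclusion": "failure",  # vs. "success" for a normal run
    }

record = make_failure_record("ic3", "coreml_fp16", "avg_inference_latency(ms)")
print(record["conclusion"])  # -> failure
```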

@yangw-dev
Contributor

yangw-dev commented Feb 26, 2025

Short term: upload the benchmark with an `extra_info` payload that carries a conclusion field.

@yangw-dev yangw-dev self-assigned this Mar 5, 2025
@yangw-dev yangw-dev moved this from Cold Storage to In Progress in PyTorch OSS Dev Infra Mar 6, 2025
@yangw-dev yangw-dev moved this to In Progress in ExecuTorch Benchmark Mar 6, 2025
yangw-dev added a commit that referenced this issue Mar 12, 2025
…ct (#6371)

#6294

Details:
- print `os` in the test spec printout section
- store os, job_arn (mobile job) and job_conclusion in each artifact's
metadata. Note that this is the mobile job conclusion, not the GitHub job
conclusion.
- wrap the post-test logic into ReportProcessor:
    - pass the AWS client as a parameter for testability
    - add a unit test for ReportProcessor
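The design choice above (passing the AWS client as a parameter) can be sketched as follows; this is an illustration of the dependency-injection pattern described in the commit, not the actual implementation, and the class/method names are assumptions:

```python
# Illustrative sketch: ReportProcessor takes the AWS client as a constructor
# parameter, so unit tests can inject a fake instead of a real boto3 client.

class ReportProcessor:
    def __init__(self, s3_client):
        self.s3_client = s3_client  # injected; a stub in unit tests

    def upload(self, bucket: str, key: str, body: str) -> None:
        # boto3 S3 clients expose put_object(Bucket=..., Key=..., Body=...)
        self.s3_client.put_object(Bucket=bucket, Key=key, Body=body)

class FakeS3:
    """Minimal stand-in that records calls, for test-driven development."""
    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)

fake = FakeS3()
ReportProcessor(fake).upload("gha-artifacts", "run/1.json", "{}")
print(fake.calls[0]["Key"])  # -> run/1.json
```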
yangw-dev added a commit that referenced this issue Mar 13, 2025
# Description
Issue: #6294
Prepare the mobile_job yml to generate a benchmark record when a job fails.

## Background
When a GitHub benchmark job fails (or some of the mobile jobs fail), we
need to generate a benchmark record indicating that the model has failures.

For instance, take a benchmark job named `benchmark-on-device (ic3,
coreml_fp16, apple_iphone_15,
arn:aws:devicefarm:us-west-2:308535385114... / mobile-job (ios)`:
- when the whole job fails, we want to indicate that the model ic3 with
backend coreml_fp16 on iOS failed for all metrics;
- when one of the devices in the job fails (e.g. iPhone 15 with OS 17.1),
we want to indicate that the model ic3 with backend coreml_fp16 on
iPhone 15 with OS 17.1 failed, but the others succeeded.

Key point: always generate the artifact JSON with the GitHub job name.

## Change Details
- [yaml] add logic to generate artifact.json if any previous step fails
and the expected artifact.json is missing; this makes sure we always have
the artifact JSON with the GitHub job name
- [script] add a flag `--new-json-output-format` to toggle the mobile
job to generate artifact.json in the new format
- see an example of the new JSON result ([s3
link](https://gha-artifacts.s3.us-east-1.amazonaws.com/device_farm/13821036006/1/artifacts/ios-artifacts-38666170088.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=ASIAUPVRELQNEU5O2WYP%2F20250312%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250312T212644Z&X-Amz-Expires=300&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEH4aCXVzLWVhc3QtMSJHMEUCIQC7%2BkVAOsGTimttLszL6u3N4HeFdSzwmPzlOYQBh%2BU%2BzwIgNjk%2FM73TZ9YfN6W92yjuRBUevYQ1BWWf0M7rmky4IT0q0AMIx%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARAFGgwzMDg1MzUzODUxMTQiDCWs46GorlC4PkgCmCqkA7TQ41pTu7Pw2vUyPArSC95%2FUUHvRy5DCUEGOUwKmscwv%2B0D9jRdGfQ05E4dtVKliXhNnBRu2oH2u9WIPGKgR3fFjrVRvy2bzQhMYVjAqfUnG%2BhVO2hOKC6U33bMMNJ4SziagDSsAwHBRXl2YLsd9x4ToLubWcHFd4RtE5ZTFQFBHoB05KmzRJ5O00P6m%2BmzBvNh0T%2F2nj2l5c66VmBOe5xeyqEEHXsw3jD98NGrff7nQrONMDpRLjS74Hz%2Fz%2BGJL9RNwNQ2yJYSUdmkrTk4wi7ToNGrzpJm4Lh7wOprHQVwqpVnYaZjw7bJrTk4of4%2FE0%2FBsI1L3GqCxCt6kig02JKYBOy2nFNeRMR09xCSVQCvZE39zKZxrbilH%2FwBzHCS8KvqP14hhGbo%2F%2F08DWVBTZIgrQii0lNaPkB6c%2F0%2BCghTCQv1hUqhIY3avR3TquZzdZNeavNVU6is%2ByJtFpVZzCCH1AzeCRMcnJAlHdGyv9guD5q5wMpRICAihdmFnFy1LQZNAjSisMr0Z4zFfRKJzGdKSpdyL9D5O063WU0VVtmfI0U4fzCz38e%2BBjqEApAZr2cVZ87wIvVZOhcPBDmz%2F9mBgH5LSIK0bfkuZz6vhkUpJbmHbID6YjraMitF1ht1%2FgQtCQkHaejdA9y99K0KEwcT5JVEFaiJNhm5o7KvZJ1jlDqNAklD8brH63PQ705eszJeILnBAmKdOxTrqb83EEmg5Z2eSIjf7Cl04Si21S%2FZomsjHG1zlcHT4jZ9%2FzXPHNHFVmuMwqOVSTzMXx2BKHrOrtwW%2BbpQ8x8rOC5E9P85c86MSDefTk%2BC9Hoee16B45ywR%2BbH7I9fK%2FZ27v%2BCE0gHQglXCHTFVSp7mk18KQw67BJqq5nJDAQ%2BtEdezGj2O5iiG2Amto3XgUbeSRvTi7iF&X-Amz-Signature=49b1065e9246c807c434b8fd2dc510c014fb12a3ceb2605034da70ee2a64ca68&X-Amz-SignedHeaders=host&response-content-disposition=inline))
- [script] add git_job_name, run_report and job_reports to
artifacts.json:
  - git_job_name: used to build a benchmark record when a GitHub job
fails (a workaround to grab the model info from the job name)
  - job_reports & run_report: we currently don't have extra info about
mobile job conclusions; this can be used to upload to time_series or a
notification system for failure details.
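The "grab the model info from the job name" workaround can be sketched roughly like this, parsing a job name of the shape quoted above; the exact parsing in the script may differ:

```python
import re

# Hypothetical sketch of recovering model, backend, and device from the
# GitHub job name when no benchmark artifact was produced.

JOB_NAME = ("benchmark-on-device (ic3, coreml_fp16, apple_iphone_15, "
            "arn:aws:devicefarm:us-west-2:308535385114...) / mobile-job (ios)")

def parse_job_name(name: str) -> dict:
    # Match the first three comma-separated fields inside the parentheses.
    m = re.search(r"\(([^,]+), ([^,]+), ([^,]+),", name)
    if m is None:
        raise ValueError(f"unrecognized job name: {name}")
    model, backend, device = (s.strip() for s in m.groups())
    return {"model": model, "backend": backend, "device": device}

print(parse_job_name(JOB_NAME))
# -> {'model': 'ic3', 'backend': 'coreml_fp16', 'device': 'apple_iphone_15'}
```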



## PRs that simulate failure cases for the generation logic
Mimic a step failing before the benchmark test (no JSON generated): #6397
Mimic the benchmark test failing but producing an artifact: #6398
ExecuTorch Sync Test: pytorch/executorch#9204


## Details
When the flag is on, artifact.json is converted from
```
[
   ....
]
```
to

```
{
    "git_job_name": str,
    "artifacts": [],
    "run_report": {},
    "job_reports": [....]
}
```
This flag is temporary; it exists only until the logic is in sync between the repos.
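The conversion above amounts to wrapping the old top-level artifact list in a keyed object. A minimal sketch, using the field names from the description (the function name and signature are assumptions):

```python
# Illustrative sketch of the conversion guarded by --new-json-output-format:
# the old format was a bare list of artifacts; the new format wraps it in an
# object alongside the GitHub job name and the mobile-job reports.

def to_new_format(artifacts: list, git_job_name: str,
                  run_report: dict, job_reports: list) -> dict:
    return {
        "git_job_name": git_job_name,
        "artifacts": artifacts,
        "run_report": run_report,
        "job_reports": job_reports,
    }

new = to_new_format([], "benchmark-on-device (ic3, ...)", {}, [])
print(sorted(new))
# -> ['artifacts', 'git_job_name', 'job_reports', 'run_report']
```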
Camyll pushed a commit that referenced this issue Mar 13, 2025
Camyll pushed a commit that referenced this issue Mar 13, 2025
yangw-dev added a commit to pytorch/executorch that referenced this issue Mar 13, 2025
Issue: pytorch/test-infra#6294
Remove the benchmark v2 schema logic, but keep storing v3 under a v3
folder, since we might have a higher schema version in the future.

Next step: introduce failure handling for the benchmark record.
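The versioned-folder idea can be sketched as a simple path convention; the path shape below is purely illustrative:

```python
# Hypothetical sketch: records are stored under a folder named for their
# schema version, so a future v4 can coexist with v3 without migration.

def record_path(schema_version: str, run_id: int, name: str) -> str:
    return f"{schema_version}/{run_id}/{name}.json"

print(record_path("v3", 13821036006, "ios-artifacts"))
# -> v3/13821036006/ios-artifacts.json
```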
@yangw-dev
Contributor

Merged the backend side of the script; now working on the UI.

yangw-dev added a commit that referenced this issue Apr 3, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in PyTorch OSS Dev Infra Apr 4, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in ExecuTorch Benchmark Apr 4, 2025
@yangw-dev yangw-dev reopened this Apr 4, 2025
Labels
enhancement New feature or request
Projects
Status: Done

3 participants