Commit 136177d

[Benchmark] Prepare for execuTorch failure handling (#6391)
# Description

Issue: #6294

Prepare the mobile_job yml to generate a benchmark record when a job fails.

## Background

When a git benchmark job fails (or some of its mobile jobs fail), we need to generate a benchmark record indicating that the model has failures.

For instance, take a benchmark job named `benchmark-on-device (ic3, coreml_fp16, apple_iphone_15, arn:aws:devicefarm:us-west-2:308535385114... / mobile-job (ios)`:

- When the whole job fails, we want to indicate that the model ic3 with backend coreml_fp16 on iOS failed for all metrics.
- When one of the devices in the job fails (iPhone 15 with OS 17.1), we want to indicate that the model ic3 with backend coreml_fp16 failed for iPhone 15 with OS 17.1, while the others succeeded.

Key: always generate the artifact json with the git job name.

## Change Details

- [yaml] Add logic to generate artifact.json if any previous step fails and the expected artifact.json is missing. This makes sure we always have the artifact json with the git job name.
- [script] Add a flag `--new-json-output-format` to toggle the mobile job to generate artifact.json in the new format.
  - See an example of the new json result ([s3 link](https://gha-artifacts.s3.us-east-1.amazonaws.com/device_farm/13821036006/1/artifacts/ios-artifacts-38666170088.json)).
- [script] Add git_job_name, run_report and job_reports to artifacts.json.
  - git_job_name: used to build the benchmark record if a git job failed (a tricky way to grab model info).
  - job_reports & run_report: we currently don't have extra info about mobile job conclusions; these can be uploaded to time_series or a notification system for failure details.

## PRs that simulate failure cases for the generating logic

- Mimic a step failing before the benchmark test (no json generated): #6397
- Mimic the benchmark test step failing, but with an artifact: #6398
- ExecuTorch Sync Test: pytorch/executorch#9204

## Details

When the flag is on, artifact.json is converted from

```
[ .... ]
```

to

```
{
  "git_job_name": str,
  "artifacts": [ ],
  "run_report": {},
  "job_reports": [....]
}
```

This flag is temporary, kept until the logic is in sync between repos.
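After this change a consumer can meet three shapes of artifact.json: the legacy list, the full new object, and a stub holding only `git_job_name` when the job failed early. The sketch below shows one way a downstream reader might normalize them. `normalize_artifact_json` is a hypothetical helper, not part of the commit.

```python
import json
from typing import Any, Dict


def normalize_artifact_json(raw: str) -> Dict[str, Any]:
    """Return a dict with the new-format keys regardless of the input shape."""
    data = json.loads(raw)
    if isinstance(data, list):
        # Legacy format: a bare list of artifacts with no job metadata.
        return {
            "git_job_name": None,
            "artifacts": data,
            "run_report": {},
            "job_reports": [],
        }
    # New format, possibly the failure stub that only carries git_job_name.
    return {
        "git_job_name": data.get("git_job_name"),
        "artifacts": data.get("artifacts", []),
        "run_report": data.get("run_report", {}),
        "job_reports": data.get("job_reports", []),
    }
```

With this shape, a stub such as `{"git_job_name": "..."}` yields empty `artifacts` and `job_reports`, which a consumer could treat as "the job failed before producing results".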
1 parent 18aeaba commit 136177d

3 files changed

+120
-38
lines changed

Diff for: .github/workflows/mobile_job.yml

+31-15
@@ -34,6 +34,11 @@ on:
         description: The device pool associated with the project
         default: 'arn:aws:devicefarm:us-west-2::devicepool:082d10e5-d7d7-48a5-ba5c-b33d66efa1f5'
         type: string
+      new-output-format-flag:
+        description: experiment flag to enable the new artifact json format
+        required: false
+        default: false
+        type: boolean

       # Pulling test-infra itself for device farm runner script
       test-infra-repository:
@@ -310,7 +315,9 @@ jobs:
           RUN_ID: ${{ github.run_id }}
           RUN_ATTEMPT: ${{ github.run_attempt }}
           JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
+          GIT_JOB_NAME: ${{ steps.get-job-id.outputs.job-name }}
           WORKING_DIRECTORY: test-infra/tools/device-farm-runner
+          NEW_OUTPUT_FORMAT_FLAG: ${{ inputs.new-output-format-flag }}
         uses: nick-fields/[email protected]
         with:
           shell: bash
@@ -331,20 +338,11 @@ jobs:
               --name-prefix "${JOB_NAME}-${DEVICE_TYPE}" \
               --workflow-id "${RUN_ID}" \
               --workflow-attempt "${RUN_ATTEMPT}" \
-              --output "ios-artifacts-${JOB_ID}.json"
+              --output "ios-artifacts-${JOB_ID}.json" \
+              --git-job-name "${GIT_JOB_NAME}" \
+              --new-json-output-format "${NEW_OUTPUT_FORMAT_FLAG}"
             popd

-      - name: Upload iOS artifacts to S3
-        uses: seemethere/upload-artifact-s3@v5
-        if: always()
-        with:
-          retention-days: 14
-          s3-bucket: gha-artifacts
-          s3-prefix: |
-            device_farm/${{ github.run_id }}/${{ github.run_attempt }}/artifacts
-          path: |
-            test-infra/tools/device-farm-runner/ios-artifacts-${{ steps.get-job-id.outputs.job-id }}.json
-
       - name: Run Android tests on devices
         id: android-test
         if: ${{ inputs.device-type == 'android' }}
@@ -361,7 +359,9 @@ jobs:
           RUN_ID: ${{ github.run_id }}
           RUN_ATTEMPT: ${{ github.run_attempt }}
           JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
+          GIT_JOB_NAME: ${{ steps.get-job-id.outputs.job-name }}
           WORKING_DIRECTORY: test-infra/tools/device-farm-runner
+          NEW_OUTPUT_FORMAT_FLAG: ${{ inputs.new-output-format-flag }}
         uses: nick-fields/[email protected]
         with:
           shell: bash
@@ -382,10 +382,26 @@ jobs:
               --name-prefix "${JOB_NAME}-${DEVICE_TYPE}" \
               --workflow-id "${RUN_ID}" \
               --workflow-attempt "${RUN_ATTEMPT}" \
-              --output "android-artifacts-${JOB_ID}.json"
+              --output "android-artifacts-${JOB_ID}.json" \
+              --git-job-name "${GIT_JOB_NAME}" \
+              --new-json-output-format "${NEW_OUTPUT_FORMAT_FLAG}"
             popd

-      - name: Upload Android artifacts to S3
+      - name: Check artifacts if any job fails
+        if: failure()
+        working-directory: test-infra/tools/device-farm-runner
+        shell: bash
+        env:
+          DEVICE_TYPE: ${{ inputs.device-type }}
+          BENCHMARK_OUTPUT: ${{ inputs.device-type }}-artifacts-${{ steps.get-job-id.outputs.job-id }}.json
+          GIT_JOB_NAME: ${{ steps.get-job-id.outputs.job-name }}
+        run: |
+          if [[ ! -f "$BENCHMARK_OUTPUT" ]]; then
+            echo "missing artifact json file for ${DEVICE_TYPE} with name ${BENCHMARK_OUTPUT}, generating ... "
+            echo "{\"git_job_name\": \"$GIT_JOB_NAME\"}" >> "$BENCHMARK_OUTPUT"
+          fi
+
+      - name: Upload artifacts to S3
         uses: seemethere/upload-artifact-s3@v5
         if: always()
         with:
@@ -394,4 +410,4 @@ jobs:
           s3-prefix: |
             device_farm/${{ github.run_id }}/${{ github.run_attempt }}/artifacts
           path: |
-            test-infra/tools/device-farm-runner/android-artifacts-${{ steps.get-job-id.outputs.job-id }}.json
+            test-infra/tools/device-farm-runner/${{ inputs.device-type }}-artifacts-${{ steps.get-job-id.outputs.job-id }}.json
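The "Check artifacts if any job fails" step writes a stub file with `echo` only when the expected artifact json is missing. As a rough illustration of the same logic, here is a minimal Python sketch. `ensure_artifact_json` is a hypothetical helper, not part of the workflow. It uses `json.dump`, so a job name containing quotes would still produce valid JSON, which the plain `echo` form does not guarantee.

```python
import json
import os
import tempfile


def ensure_artifact_json(path: str, git_job_name: str) -> bool:
    """Write a stub artifact file if none exists; return True if a stub was written."""
    if os.path.isfile(path):
        # A previous step already produced the artifact; leave it untouched.
        return False
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"git_job_name": git_job_name}, f)
    return True


# Usage: the file name mirrors the "<device-type>-artifacts-<job-id>.json" pattern.
workdir = tempfile.mkdtemp()
stub_path = os.path.join(workdir, "ios-artifacts-123.json")
created = ensure_artifact_json(stub_path, 'mobile-job ("ios")')
```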

Diff for: .github/workflows/test_mobile_job.yml

+37-2
@@ -18,7 +18,7 @@ jobs:
       device-type: ios
       # For iOS testing, the runner just needs to call AWS Device Farm, so there is no need to run this on macOS
       runner: ubuntu-latest
-      # There values are prepared beforehand for the test
+      # These values are prepared beforehand for the test
       project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:b531574a-fb82-40ae-b687-8f0b81341ae0
       device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:b531574a-fb82-40ae-b687-8f0b81341ae0/da5d902d-45db-477b-ae0a-766e06ef3845
       ios-ipa-archive: https://ossci-assets.s3.amazonaws.com/DeviceFarm.ipa
@@ -34,10 +34,45 @@ jobs:
       device-type: android
       runner: ubuntu-latest
       timeout: 120
-      # There values are prepared beforehand for the test
+      # These values are prepared beforehand for the test
       project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:b531574a-fb82-40ae-b687-8f0b81341ae0
       device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:b531574a-fb82-40ae-b687-8f0b81341ae0/bd86eb80-74a6-4511-8183-09aa66e3ccc4
       android-app-archive: https://ossci-assets.s3.amazonaws.com/app-debug.apk
       android-test-archive: https://ossci-assets.s3.amazonaws.com/app-debug-androidTest.apk
       test-spec: https://ossci-assets.s3.amazonaws.com/android-llm-device-farm-test-spec.yml
       extra-data: https://ossci-assets.s3.amazonaws.com/executorch-android-llama2-7b-0717.zip
+
+  test-ios-job-with-new-output-flag:
+    permissions:
+      id-token: write
+      contents: read
+    uses: ./.github/workflows/mobile_job.yml
+    with:
+      device-type: ios
+      # For iOS testing, the runner just needs to call AWS Device Farm, so there is no need to run this on macOS
+      runner: ubuntu-latest
+      # These values are prepared beforehand for the test
+      project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:b531574a-fb82-40ae-b687-8f0b81341ae0
+      device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:b531574a-fb82-40ae-b687-8f0b81341ae0/da5d902d-45db-477b-ae0a-766e06ef3845
+      ios-ipa-archive: https://ossci-assets.s3.amazonaws.com/DeviceFarm.ipa
+      ios-xctestrun-zip: https://ossci-assets.s3.amazonaws.com/MobileNetClassifierTest_MobileNetClassifierTest_iphoneos17.4-arm64.xctestrun.zip
+      test-spec: https://ossci-assets.s3.amazonaws.com/default-ios-device-farm-appium-test-spec.yml
+      new-output-format-flag: true
+
+  test-android-llama2-job-with-new-output-flag:
+    permissions:
+      id-token: write
+      contents: read
+    uses: ./.github/workflows/mobile_job.yml
+    with:
+      device-type: android
+      runner: ubuntu-latest
+      timeout: 120
+      # These values are prepared beforehand for the test
+      project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:b531574a-fb82-40ae-b687-8f0b81341ae0
+      device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:b531574a-fb82-40ae-b687-8f0b81341ae0/bd86eb80-74a6-4511-8183-09aa66e3ccc4
+      android-app-archive: https://ossci-assets.s3.amazonaws.com/app-debug.apk
+      android-test-archive: https://ossci-assets.s3.amazonaws.com/app-debug-androidTest.apk
+      test-spec: https://ossci-assets.s3.amazonaws.com/android-llm-device-farm-test-spec.yml
+      extra-data: https://ossci-assets.s3.amazonaws.com/executorch-android-llama2-7b-0717.zip
+      new-output-format-flag: true

Diff for: tools/device-farm-runner/run_on_aws_devicefarm.py

+52-21
@@ -195,6 +195,11 @@ def parse_args() -> Any:
         default=0,
         help="the workflow run attempt",
     )
+
+    parser.add_argument(
+        "--git-job-name", type=str, required=True, help="the name of the git job name."
+    )
+
     parser.add_argument(
         "--output",
         type=str,
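The commit message describes `git_job_name` as a way to recover model info when a job fails. The sketch below shows how a consumer might pull fields out of such a name. It assumes the layout `prefix (model, backend, device, ...) / mobile-job (...)` seen in the commit's example; `parse_git_job_name` and the `pool` placeholder are hypothetical, and real job names may differ.

```python
import re
from typing import Dict, Optional


def parse_git_job_name(git_job_name: str) -> Optional[Dict[str, str]]:
    """Extract model, backend and device from the first parenthesized group."""
    match = re.search(r"\(([^)]*)\)", git_job_name)
    if match is None:
        return None
    fields = [part.strip() for part in match.group(1).split(",")]
    if len(fields) < 3:
        # Not the assumed "(model, backend, device, ...)" layout.
        return None
    return {"model": fields[0], "backend": fields[1], "device": fields[2]}


# "pool" stands in for the device pool ARN, which is truncated in the commit message.
info = parse_git_job_name(
    "benchmark-on-device (ic3, coreml_fp16, apple_iphone_15, pool) / mobile-job (ios)"
)
```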
@@ -208,12 +213,19 @@ def parse_args() -> Any:
     )

     parser.add_argument(
-        "--new-json-output",
-        action="store_true",
-        help="enable new json artifact output format with jobrun, and list of artifacts, this is temporary ",
+        "--new-json-output-format",
+        type=str,
+        choices=["true", "false"],
+        default="false",
+        required=False,
+        help="enable new json artifact output format with mobile job reports and list of artifacts",
     )

-    return parser.parse_args()
+    # in case when removing the flag, the mobile jobs does not failed due to unrecognized flag.
+    args, unknown = parser.parse_known_args()
+    if len(unknown) > 0:
+        info(f"detected unknown flags: {unknown}")
+    return args


 def upload_file(
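The parsing change above does two things. The flag now takes an explicit `true`/`false` string, which matches how the workflow always passes a value via `"${NEW_OUTPUT_FORMAT_FLAG}"`. And `parse_known_args` tolerates flags the script no longer defines. The standalone sketch below shows that behavior; `parse_flags` and `--retired-flag` are illustrative names only.

```python
import argparse
from typing import List, Tuple


def parse_flags(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    parser = argparse.ArgumentParser()
    parser.add_argument("--git-job-name", type=str, required=True)
    parser.add_argument(
        "--new-json-output-format",
        type=str,
        choices=["true", "false"],
        default="false",
    )
    # parse_known_args returns (namespace, leftovers) instead of exiting on unknown flags.
    return parser.parse_known_args(argv)


args, unknown = parse_flags(
    [
        "--git-job-name", "mobile-job (ios)",
        "--new-json-output-format", "true",
        "--retired-flag", "x",
    ]
)
```

Note that the value is compared as a string (`== "true"`), so callers must not rely on truthiness: the string `"false"` is truthy in Python.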
@@ -409,6 +421,7 @@ class DeviceFarmReport:
     status: str
     result: str
     counters: Dict[str, str]
+    app_type: str
     infos: Dict[str, str]
     parent_arn: str

@@ -545,6 +558,7 @@ def _to_job_report(
         return JobReport(
             arn=arn,
             name=name,
+            app_type=self.app_type,
             report_type=ReportType.JOB.value,
             status=status,
             result=result,
@@ -564,6 +578,7 @@ def _to_run_report(self, report: Dict[str, Any], infos: Dict[str, str] = dict())
         return DeviceFarmReport(
             name=name,
             arn=arn,
+            app_type=self.app_type,
             report_type=ReportType.RUN.value,
             status=status,
             result=result,
@@ -661,7 +676,8 @@ def get_run_report(self):
             return DeviceFarmReport(
                 name="",
                 arn="",
-                report_type="",
+                app_type=self.app_type,
+                report_type=ReportType.RUN.value,
                 status="",
                 result="",
                 counters={},
@@ -699,9 +715,30 @@ def _upload_file_to_s3(self, file_name: str, bucket: str, key: str) -> None:
         )


+def generate_artifacts_output(
+    artifacts: List[Dict[str, str]],
+    run_report: DeviceFarmReport,
+    job_reports: List[JobReport],
+    git_job_name: str,
+):
+    output = {
+        "artifacts": artifacts,
+        "run_report": asdict(run_report),
+        "job_reports": [asdict(job_report) for job_report in job_reports],
+        "git_job_name": git_job_name,
+    }
+    return output
+
+
 def main() -> None:
     args = parse_args()

+    # (TODO): remove this once remove the flag.
+    if args.new_json_output_format == "true":
+        info(f"use new json output format for {args.output}")
+    else:
+        info("use legacy json output format for {args.output}")
+
     project_arn = args.project_arn
     name_prefix = args.name_prefix
     workflow_id = args.workflow_id
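To make the shape of the new output concrete, here is a self-contained sketch of the same `asdict`-based assembly. `MiniReport` is a hypothetical, trimmed stand-in for `DeviceFarmReport` and `JobReport`, and the sample field values (`"IOS_APP"`, `"run"`, `"job"`) are assumptions for illustration, not values confirmed by the source.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class MiniReport:
    # Hypothetical stand-in for DeviceFarmReport / JobReport (subset of fields).
    name: str
    app_type: str
    report_type: str
    result: str
    counters: Dict[str, str] = field(default_factory=dict)


def build_output(
    artifacts: List[Dict[str, str]],
    run_report: MiniReport,
    job_reports: List[MiniReport],
    git_job_name: str,
) -> Dict[str, Any]:
    # asdict() converts dataclasses to plain dicts, so the result is JSON-serializable.
    return {
        "artifacts": artifacts,
        "run_report": asdict(run_report),
        "job_reports": [asdict(report) for report in job_reports],
        "git_job_name": git_job_name,
    }


run = MiniReport(name="run-1", app_type="IOS_APP", report_type="run", result="FAILED")
job = MiniReport(name="iPhone 15", app_type="IOS_APP", report_type="job", result="FAILED")
payload = json.loads(json.dumps(build_output([], run, [job], "mobile-job (ios)")))
```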
@@ -788,6 +825,11 @@ def main() -> None:
             time.sleep(30)
     except Exception as error:
         warn(f"Failed to run {unique_prefix}: {error}")
+        # just use the new json output format
+        json_file = {
+            "git_job_name": args.git_job_name,
+        }
+        set_output(json.dumps(json_file), "artifacts", args.output)
         sys.exit(1)
     finally:
         info(f"Run {unique_prefix} finished with state {state} and result {result}")
@@ -797,10 +839,12 @@ def main() -> None:
         )
         artifacts = processor.start(r.get("run"))

-        if args.new_json_output:
-            info("Generating new json output")
+        if args.new_json_output_format == "true":
             output = generate_artifacts_output(
-                artifacts, processor.get_run_report(), processor.get_job_reports()
+                artifacts,
+                processor.get_run_report(),
+                processor.get_job_reports(),
+                git_job_name=args.git_job_name,
             )
             set_output(json.dumps(output), "artifacts", args.output)
         else:
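The background section separates two failure cases: the whole job failed, or only some devices failed. A consumer of the new-format json could tell them apart roughly as sketched below. Both helpers are hypothetical, and treating any `result` other than `"PASSED"` as a failure is an assumption about how job reports are populated.

```python
from typing import Any, Dict, List


def failed_job_names(output: Dict[str, Any]) -> List[str]:
    """Names of per-device job reports whose result is not PASSED."""
    return [
        report.get("name", "")
        for report in output.get("job_reports", [])
        if report.get("result") != "PASSED"
    ]


def whole_job_failed(output: Dict[str, Any]) -> bool:
    """True for the stub case: no artifacts and no per-device reports at all."""
    return not output.get("artifacts") and not output.get("job_reports")


partial = {
    "git_job_name": "mobile-job (ios)",
    "artifacts": [{"name": "log"}],
    "job_reports": [
        {"name": "iPhone 15", "result": "FAILED"},
        {"name": "iPhone 14", "result": "PASSED"},
    ],
}
stub = {"git_job_name": "mobile-job (ios)"}
```

In the first case only `iPhone 15` would be recorded as failed; in the stub case every metric for the model named in `git_job_name` would be.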
@@ -811,18 +855,5 @@ def main() -> None:
             sys.exit(1)


-def generate_artifacts_output(
-    artifacts: List[Dict[str, str]],
-    run_report: DeviceFarmReport,
-    job_reports: List[JobReport],
-):
-    output = {
-        "artifacts": artifacts,
-        "run_report": asdict(run_report),
-        "job_reports": [asdict(job_report) for job_report in job_reports],
-    }
-    return output
-
-
 if __name__ == "__main__":
     main()
