Setup build images for Pytorch & Tensorflow with base ROCm image #557

Merged: 4 commits, Jun 27, 2024

Conversation

dibryant
Contributor

Fixes for https://issues.redhat.com/browse/RHOAIENG-6377

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@openshift-ci openshift-ci bot requested review from atheo89 and jstourac June 12, 2024 23:06
Member

@harshad16 harshad16 left a comment

@dibryant dibryant force-pushed the amd branch 2 times, most recently from 268cd26 to 502eba2 Compare June 14, 2024 13:26
Member

@jstourac jstourac left a comment

Thank you for this. I did a very brief check and left some comments.

@harshad16 harshad16 changed the title Setup build images for Pytorch & Tensorflow with base ROCm image [WIP] Setup build images for Pytorch & Tensorflow with base ROCm image Jun 17, 2024
@dibryant dibryant force-pushed the amd branch 2 times, most recently from 6f6b251 to f64fef2 Compare June 17, 2024 20:26
Member

@atheo89 atheo89 left a comment

Thank you for adding this work. I've added some comments to the files. Could you please take a look at them?

Additionally, could you add the .PHONY recipes to the Makefile? This would allow us to build these images locally as we do for the rest.

@atheo89
Member

atheo89 commented Jun 20, 2024

Moreover, something that would be super useful for the PR review, but also as a complementary step to fulfill this work, is to set up the OCP-CI to build the newly created AMD PyTorch and TensorFlow notebooks.

@dibryant dibryant changed the title [WIP] Setup build images for Pytorch & Tensorflow with base ROCm image Setup build images for Pytorch & Tensorflow with base ROCm image Jun 20, 2024
@dibryant dibryant changed the title Setup build images for Pytorch & Tensorflow with base ROCm image [WIP]Setup build images for Pytorch & Tensorflow with base ROCm image Jun 20, 2024
@dibryant dibryant changed the title [WIP]Setup build images for Pytorch & Tensorflow with base ROCm image [WIP] Setup build images for Pytorch & Tensorflow with base ROCm image Jun 20, 2024
@dibryant dibryant force-pushed the amd branch 2 times, most recently from dae2558 to 7b44fbe Compare June 20, 2024 16:08
@jiridanek
Member

Whoa, I was wondering why the GHA is failing, and it looks like it is running out of disk space:

Total download size: 2.2 G
Installed size: 25 G
Downloading Packages:

Is it really necessary to install this much stuff? If yes, I'll get you more disk space tomorrow ;)

@jiridanek
Member

Whoa, was wondering why the gha is failing, and it looks like it is running out of disk space

Here you go

Contributor

openshift-ci bot commented Jun 21, 2024

@dibryant: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/notebooks-e2e-tests | 27ac281 | link | true | /test notebooks-e2e-tests |
| ci/prow/runtimes-ubi9-e2e-tests | 27ac281 | link | true | /test runtimes-ubi9-e2e-tests |
| ci/prow/notebooks-ubi9-e2e-tests | 27ac281 | link | true | /test notebooks-ubi9-e2e-tests |
| ci/prow/codeserver-notebook-e2e-tests | 27ac281 | link | true | /test codeserver-notebook-e2e-tests |
| ci/prow/rstudio-notebook-e2e-tests | 27ac281 | link | true | /test rstudio-notebook-e2e-tests |
| ci/prow/intel-notebooks-e2e-tests | 27ac281 | link | true | /test intel-notebooks-e2e-tests |
| ci/prow/anaconda-ubi8-e2e-tests | 27ac281 | link | true | /test anaconda-ubi8-e2e-tests |
| ci/prow/runtimes-ubi8-e2e-tests | 27ac281 | link | true | /test runtimes-ubi8-e2e-tests |
| ci/prow/notebooks-ubi8-e2e-tests | 27ac281 | link | true | /test notebooks-ubi8-e2e-tests |

@atheo89
Member

atheo89 commented Jun 24, 2024

Adding this here as ref: openshift/release#53309

@harshad16 harshad16 changed the title [WIP] Setup build images for Pytorch & Tensorflow with base ROCm image Setup build images for Pytorch & Tensorflow with base ROCm image Jun 27, 2024
@harshad16 harshad16 added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 27, 2024
@harshad16
Member

This looks good to me.
I will be merging this for now.

The size is definitely a concern; we should take a deeper look at this
and see if it can be reduced.

Thanks all for the review and great work on this.
/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label Jun 27, 2024
Contributor

openshift-ci bot commented Jun 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harshad16

@harshad16 harshad16 merged commit 9f0a837 into opendatahub-io:main Jun 27, 2024
2 of 6 checks passed
@dibryant dibryant deleted the amd branch August 26, 2024 16:19
Labels
approved lgtm tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
5 participants