Add support for training with NANOO fp8 GEMM on AMD MI300/MI325 GPUs #1262
Description
This PR adds support for training with NANOO fp8 GEMM on AMD MI300/MI325 GPUs.
There are several different families of fp8 formats in use across hardware vendors. Two popular families are the OCP fp8 formats and the NANOO fp8 formats (see the references below), and they behave very similarly; a rough sketch of the difference is shown after this paragraph. The NANOO fp8 GEMM training support in MaxText is built on the corresponding Flax PR: google/flax#3993
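As a rough illustration only (not code from this PR): the two families show up in JAX as different float8 dtypes, and the core of fp8 GEMM training is a scaled quantize/multiply/rescale pattern that XLA can lower to a native fp8 GEMM on supported hardware.

```python
# Illustration only, not code from this PR.
# OCP fp8 dtypes:   float8_e4m3fn / float8_e5m2       (e.g. NVIDIA H100)
# NANOO fp8 dtypes: float8_e4m3fnuz / float8_e5m2fnuz (e.g. AMD MI300/MI325)
# The *fnuz variants have no infinities and a single NaN encoding, so their
# dynamic range differs slightly from the OCP variants.
import jax.numpy as jnp

print(float(jnp.finfo(jnp.float8_e4m3fn).max))    # 448.0 (OCP e4m3)
print(float(jnp.finfo(jnp.float8_e4m3fnuz).max))  # 240.0 (NANOO e4m3)

def scaled_fp8_matmul(a, b, fp8_dtype=jnp.float8_e4m3fnuz):
    """Per-tensor scaled fp8 matmul: scale into the fp8 range, cast,
    multiply, then undo the scales. On MI300/MI325 this kind of
    quantize/dequantize graph is what XLA can rewrite into a native
    fp8 GEMM."""
    fp8_max = float(jnp.finfo(fp8_dtype).max)
    a_scale = jnp.maximum(jnp.max(jnp.abs(a)) / fp8_max, 1e-12)
    b_scale = jnp.maximum(jnp.max(jnp.abs(b)) / fp8_max, 1e-12)
    a_q = (a / a_scale).astype(fp8_dtype)
    b_q = (b / b_scale).astype(fp8_dtype)
    out = jnp.matmul(a_q.astype(jnp.bfloat16), b_q.astype(jnp.bfloat16))
    return out.astype(jnp.float32) * a_scale * b_scale
```

The actual fp8 training path uses delayed scaling with an amax history rather than recomputing per-tensor scales on every call; the sketch above only shows the shape of the computation.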
References:
OCP fp8 paper: https://arxiv.org/abs/2209.05433
NANOO fp8 paper: https://arxiv.org/abs/2206.02915
JAX PR: jax-ml/jax#21376
XLA PR: openxla/xla#9531
Flax PR: google/flax#3993
Tests
I ran llama2-7b with NANOO fp8 from this PR and verified that it is functional and that the loss goes down quickly on the synthetic dataset; a rough sketch of this kind of functional check is shown below. I was not able to run the full unit tests locally because they require Google Cloud API credentials.
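For reference, here is a minimal Flax-level sketch of the kind of functional check described above. This is not the PR's actual test: it uses the existing OCP op `nn.Fp8DotGeneralOp`, and the NANOO variant added in google/flax#3993 (whose exact class name is not asserted here) would be substituted in its place.

```python
# Minimal sketch, not the PR's test: route a Dense layer's GEMM through
# Flax's fp8 dot_general op and check that a forward and backward pass run.
import jax
import jax.numpy as jnp
from flax import linen as nn

layer = nn.Dense(features=64, dot_general_cls=nn.Fp8DotGeneralOp)
x = jax.random.uniform(jax.random.PRNGKey(0), (16, 32))
variables = layer.init(jax.random.PRNGKey(1), x)

def loss_fn(v):
    # Toy loss; in the real run this is the llama2-7b training loss.
    return jnp.mean(layer.apply(v, x) ** 2)

loss, grads = jax.value_and_grad(loss_fn)(variables)
print(loss)
```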
Checklist
Before submitting this PR, please make sure (put X in square brackets):