1 parent 1adf995 commit ec5febb
torchao/float8/README.md
@@ -215,7 +215,7 @@ and tensorwise scaling. The training benchmarks were all run using:
 - `torch.compile`
 - FSDP2
 
-| Model | Scaling | Activation checkpointing | Peak Memory (GB) | Median tokens/second | Speedup over basline
+| Model | Scaling | Activation checkpointing | Peak Memory (GB) | Median tokens/second | Speedup over baseline
 | ------------- | ------------ | ------------------------ | ------------------| -------------------- | ---------------------
 | Llama3-8b     | none (bf16)  | per op SAC               | 47.65             | 6019                 | -
 | Llama3-8b     | tensorwise   | per op SAC               | 47.77             | 7190                 | 19.45%
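The "Speedup over baseline" column in the table above appears to be the ratio of median tokens/second for the float8 run against the bf16 baseline. A quick sanity check of that arithmetic, assuming that definition (the tiny difference from the README's 19.45% would come from rounding of the underlying medians):

```python
# Derive the speedup column from the table's median tokens/second values.
baseline_tps = 6019  # Llama3-8b, none (bf16), per op SAC
float8_tps = 7190    # Llama3-8b, tensorwise float8, per op SAC

# Speedup as a percentage over the bf16 baseline.
speedup_pct = (float8_tps / baseline_tps - 1) * 100
print(f"{speedup_pct:.2f}%")
```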