Commit f2ed8d3

tjtanaa authored and tanpinsiang committed
format math symbols to LaTeX
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: tanpinsiang <[email protected]>
1 parent f9f7de3 commit f2ed8d3

File tree

1 file changed: +8 -7 lines changed


_posts/2025-02-24-ptpc-fp8-rocm.md (+8 -7)
@@ -5,6 +5,7 @@ author: "AMD and Embedded LLM"
 image: /assets/figures/ptpc/PTPC-tumbnail.png
 thumbnail-img: /assets/figures/ptpc/PTPC-tumbnail.png
 share-img: /assets/figures/ptpc/PTPC-tumbnail.png
+math: true
 ---

 **TL;DR**: vLLM on AMD ROCm now has better FP8 performance!
@@ -57,15 +58,15 @@ This insight led to a dual-granularity approach:
 The illustration shows two quantization approaches:

 **Tensor Dimensions (Both Methods):**
-- **X**: Input activation tensor (T×Ci)
-- **W**: Weight tensor (Ci×Co)
-- **T**: Token sequence length
-- **Ci/Co**: Input/output channels
-- **\***: Matrix multiplication
+- **$X$**: Input activation tensor ($T \times C_i$)
+- **$W$**: Weight tensor ($C_i \times C_o$)
+- **$T$**: Token sequence length
+- **$C_i/C_o$**: Input/output channels
+- **$*$**: Matrix multiplication

 **Scaling Factors:**
-- **Top (Per-Tensor)**: Single scalars ΔX[1] and ΔW[1] for entire tensors
-- **Bottom (PTPC)**: Vector ΔX[T×1] with one scale per token and ΔW[1×Co] with one scale per input channel
+- **Top (Per-Tensor)**: Single scalars $\Delta_X[1]$ and $\Delta_W[1]$ for entire tensors
+- **Bottom (PTPC)**: Vector $\Delta_X[T \times 1]$ with one scale per token and $\Delta_W[1 \times C_o]$ with one scale per input channel

 This granular scaling approach allows PTPC-FP8 to achieve accuracy close to BF16 while maintaining the speed and memory benefits of 8-bit computation.
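For readers skimming the diff, a minimal sketch may help make the two scaling granularities concrete. The PyTorch snippet below is an illustration under simplifying assumptions, not vLLM's ROCm FP8 implementation: the function names, the `FP8_MAX` constant, and the integer rounding that stands in for the FP8 cast are all hypothetical. It contrasts a single per-tensor scale pair $\Delta_X[1], \Delta_W[1]$ with PTPC's per-token activation scale $\Delta_X[T \times 1]$ and a weight scale of shape $1 \times C_o$ (one scale per output column in this sketch). Because each scale is constant along one row of $X$ or one column of $W$, it factors out of every dot product, so the low-precision matmul result is rescaled by the outer product of the two scale vectors.

```python
# Minimal PyTorch sketch of the two scaling granularities described above.
# Illustration only, not vLLM's ROCm FP8 kernel; FP8_MAX and the integer
# rounding that emulates the FP8 cast are simplifying assumptions.
import torch

FP8_MAX = 448.0  # max magnitude representable in float8_e4m3


def per_tensor_scales(X: torch.Tensor, W: torch.Tensor):
    """One scalar scale per tensor: Delta_X[1] and Delta_W[1]."""
    dX = X.abs().max().clamp_min(1e-12) / FP8_MAX
    dW = W.abs().max().clamp_min(1e-12) / FP8_MAX
    return dX, dW


def ptpc_scales(X: torch.Tensor, W: torch.Tensor):
    """Per-token activation scale (T x 1) and per-output-channel
    weight scale (1 x Co), matching the shapes in the diff above."""
    dX = X.abs().amax(dim=1, keepdim=True).clamp_min(1e-12) / FP8_MAX  # (T, 1)
    dW = W.abs().amax(dim=0, keepdim=True).clamp_min(1e-12) / FP8_MAX  # (1, Co)
    return dX, dW


def fake_quant_matmul(X, W, dX, dW):
    # Quantize, multiply, then rescale: each scale is constant along a row
    # of X (or a column of W), so it factors out of every dot product and
    # the output is rescaled by the broadcasted product dX * dW.
    Xq = torch.clamp(X / dX, -FP8_MAX, FP8_MAX).round()
    Wq = torch.clamp(W / dW, -FP8_MAX, FP8_MAX).round()
    return (Xq @ Wq) * (dX * dW)  # broadcasts to (T, Co)


if __name__ == "__main__":
    T, Ci, Co = 4, 8, 6
    X, W = torch.randn(T, Ci), torch.randn(Ci, Co)
    ref = X @ W
    for scale_fn in (per_tensor_scales, ptpc_scales):
        out = fake_quant_matmul(X, W, *scale_fn(X, W))
        print(scale_fn.__name__, float((out - ref).abs().max()))
```

Running the script prints each scheme's maximum absolute error against the full-precision reference matmul; the per-token, per-channel scales typically track the reference more closely, which is the accuracy benefit the post attributes to PTPC-FP8.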
