Commit
fix comment
GeeeekExplorer authored Feb 7, 2025
1 parent 1d7d440 commit 5ee97a8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion inference/model.py
@@ -143,7 +143,7 @@ def linear(x: torch.Tensor, weight: torch.Tensor, bias: Optional[torch.Tensor] =
quantization-aware computations depending on the input parameters.
Notes:
- - If `weight` is quantized (e.g., `element_size() > 1`), a dequantized version
+ - If `weight` is quantized (e.g., `element_size() == 1`), a dequantized version
is used for computation.
- If `gemm_impl == "bf16"`, dequantization and a `bf16` GEMM operation are applied.
- For other cases, the function applies quantization to `x` and uses `fp8_gemm` for computation.
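
The corrected note reads more easily next to the branching it describes. The following is a minimal sketch of that decision logic only, not the body of `linear` (which this hunk does not show). `select_linear_path` and the returned labels are hypothetical names introduced for illustration, and the assumption that a one-byte element means an FP8-quantized weight is inferred from the docstring.

```python
def select_linear_path(weight_element_size: int, gemm_impl: str) -> str:
    """Return a label for the code path the docstring notes describe.

    Hypothetical helper: `weight_element_size` stands in for
    `weight.element_size()` (bytes per element) and `gemm_impl` for the
    module-level setting named in the docstring.
    """
    if weight_element_size > 1:
        # More than one byte per element (e.g. bf16 = 2, fp32 = 4):
        # assumed to mean the weight is not quantized.
        return "plain_linear"
    if gemm_impl == "bf16":
        # One-byte (quantized) weight: dequantize, then run a bf16 GEMM.
        return "dequant_then_bf16_gemm"
    # One-byte weight, any other setting: quantize `x` and use `fp8_gemm`.
    return "quantize_x_then_fp8_gemm"


# A 2-byte weight is treated as unquantized regardless of gemm_impl.
print(select_linear_path(2, "bf16"))  # plain_linear
# A 1-byte weight takes one of the two quantization-aware paths.
print(select_linear_path(1, "bf16"))  # dequant_then_bf16_gemm
print(select_linear_path(1, "fp8"))   # quantize_x_then_fp8_gemm
```

This is why the old comment was misleading: `element_size() > 1` identifies the unquantized case, whereas a quantized weight has `element_size() == 1`.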