```bash
pip install -r requirements.txt
```
- Option 1 (Recommended): Quantize the weights directly

  ```bash
  python quant.py --model_path /path/to/DeepSeek/R1/BF16/ --qmodel_path /path/to/DeepSeek/R1-Dynamic-FP8 --low_cpu_mem
  ```

- Option 2: Load the model using Transformers (requires ~700 GB of DRAM)

  ```bash
  python quant.py --model_path /path/to/DeepSeek/R1/BF16/ --qmodel_path /path/to/DeepSeek/R1/Dynamic-FP8
  ```
Note:
- The weight dtype is `torch.float8_e4m3fn` (full range is `-448` to `448`).
- `WEIGHT_BACKOFF = 0.5`
- `SCALE_DTYPE = torch.bfloat16`
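For intuition, here is a minimal sketch of how these constants could combine into a per-tensor dynamic FP8 weight quantization. The function name `quantize_weight` is illustrative only; the actual logic in `quant.py` may differ.

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0
WEIGHT_BACKOFF = 0.5
SCALE_DTYPE = torch.bfloat16

def quantize_weight(weight: torch.Tensor):
    # Back off from the full FP8 range to leave headroom, then rescale the
    # weight so its absolute maximum lands within the reduced range.
    scale = (weight.abs().max() / (FP8_MAX * WEIGHT_BACKOFF)).to(SCALE_DTYPE)
    qweight = (weight / scale).to(torch.float8_e4m3fn)
    return qweight, scale

# Example: quantize a random BF16 weight matrix.
w = torch.randn(5, 10, dtype=torch.bfloat16)
w_fp8, w_scale = quantize_weight(w)
```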
Since DeepSeek V3 and R1 are not yet supported by Transformers, we need to manually copy some model files.
```bash
python post_process.py --model_path /path/to/DeepSeek/R1/BF16/ --qmodel_path /path/to/DeepSeek/R1/Dynamic-FP8
```
- Name convention:
  - weight scale name: `prefix.scale_weight`
  - input scale name: `prefix.scale_input` (for static quantization only)
- A JSON file mapping each tensor name to the safetensors file that stores it.
For example, given the following toy model:

```python
import torch


class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 5, bias=False)

    def forward(self, inp):
        x1 = self.fc1(inp)
        return x1
```

the saved quantized results would look like:
1. State dict:

   ```python
   {
       "fc1.weight": torch.Tensor(...),
       "fc1.scale_weight": torch.Tensor(...),
       "fc1.scale_input": torch.Tensor(...),
   }
   ```
2. JSON index file, `model.safetensors.index.json`:

   ```json
   {
       "fc1.weight": "qmodel.safetensors",
       "fc1.scale_weight": "qmodel.safetensors",
       "fc1.scale_input": "qmodel.safetensors"
   }
   ```
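As a quick sanity check, the saved tensors can be read back with the `safetensors` library and dequantized on the fly. This is a hedged sketch: the file and tensor names follow the toy example above, and a real multi-shard checkpoint would be resolved through `model.safetensors.index.json`.

```python
import torch
from safetensors.torch import load_file

state_dict = load_file("qmodel.safetensors")
weight_fp8 = state_dict["fc1.weight"]          # torch.float8_e4m3fn weights
scale_weight = state_dict["fc1.scale_weight"]  # torch.bfloat16 per-tensor scale
# Dequantize back to BF16 by applying the stored scale.
weight_bf16 = weight_fp8.to(torch.bfloat16) * scale_weight
```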