
# Note for quantizing the DeepSeek model

## Prerequisite

```bash
pip install -r requirements.txt
```

## Usage

### Step 1: Quantize model weights

- Option 1 (recommended): quantize the weights directly.

  ```bash
  python quant.py --model_path /path/to/DeepSeek/R1/BF16/ --qmodel_path /path/to/DeepSeek/R1-Dynamic-FP8 --low_cpu_mem
  ```

- Option 2: load the model using `transformers` first (requires ~700 GB of DRAM).

  ```bash
  python quant.py --model_path /path/to/DeepSeek/R1/BF16/ --qmodel_path /path/to/DeepSeek/R1/Dynamic-FP8
  ```

**Note**

- The weight dtype is `torch.float8_e4m3fn` (full range: -448 to 448).
- `WEIGHT_BACKOFF = 0.5`
- `SCALE_DTYPE = torch.bfloat16`
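
For intuition, below is a minimal sketch of per-tensor FP8 weight quantization using these constants. The helper name `quantize_weight`, the per-tensor granularity, and PyTorch ≥ 2.1 (for the FP8 dtype) are assumptions; this is not the exact logic of `quant.py`.

```python
import torch

# Constants from the note above.
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0
WEIGHT_BACKOFF = 0.5
SCALE_DTYPE = torch.bfloat16


def quantize_weight(weight: torch.Tensor):
    """Hypothetical per-tensor FP8 quantization with a back-off margin."""
    # Map the largest magnitude to WEIGHT_BACKOFF * FP8_MAX to leave headroom.
    scale = weight.abs().max().float() / (FP8_MAX * WEIGHT_BACKOFF)
    scale = scale.to(SCALE_DTYPE)
    qweight = (weight / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return qweight, scale


w = torch.randn(5, 10, dtype=torch.bfloat16)
qw, scale = quantize_weight(w)
# Round-trip check: w ≈ qw.to(torch.bfloat16) * scale
print((qw.to(torch.bfloat16) * scale - w).abs().max())
```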

### Step 2: Copy model files for inference

Since DeepSeek V3 and R1 are not yet supported by Transformers, we need to copy some model files manually:

```bash
python post_process.py --model_path /path/to/DeepSeek/R1/BF16/ --qmodel_path /path/to/DeepSeek/R1/Dynamic-FP8
```

## More details

1. Naming convention:
   - weight scale name: `prefix.scale_weight`
   - input scale name: `prefix.scale_input` (static quantization only)
2. A JSON file, `model.safetensors.index.json`, maps each tensor name to the safetensors file that stores it.

For example, given the following toy model:
```python
import torch


class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 5, bias=False)

    def forward(self, inp):
        x1 = self.fc1(inp)
        return x1
```
1. The quantized state dict:

   ```python
   {
       "fc1.weight": torch.Tensor(...),
       "fc1.scale_weight": torch.Tensor(...),
       "fc1.scale_input": torch.Tensor(...),
   }
   ```

2. The JSON file, `model.safetensors.index.json`:

   ```json
   {
       "fc1.weight": "qmodel.safetensors",
       "fc1.scale_weight": "qmodel.safetensors",
       "fc1.scale_input": "qmodel.safetensors"
   }
   ```
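
As a rough illustration (not the actual output of `quant.py` or `post_process.py`), the toy state dict and index file above could be written to disk with `safetensors` as sketched below. The single shard name `qmodel.safetensors` follows the example; a real checkpoint would be split across many shards.

```python
import json

import torch
from safetensors.torch import save_file

# Toy quantized state dict following the naming convention above
# (values are placeholders, not real quantization results).
state_dict = {
    "fc1.weight": torch.randn(5, 10).to(torch.float8_e4m3fn),
    "fc1.scale_weight": torch.tensor(0.01, dtype=torch.bfloat16),
    "fc1.scale_input": torch.tensor(0.02, dtype=torch.bfloat16),
}

# Single shard for the toy model, so every tensor maps to the same file.
save_file(state_dict, "qmodel.safetensors")

index = {name: "qmodel.safetensors" for name in state_dict}
with open("model.safetensors.index.json", "w") as f:
    json.dump(index, f, indent=4)
```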