Weight Protective Quantization Range -119, 119 #60

Jzz24 · 2025-02-26T10:23:01Z

Looking at https://github.com/mit-han-lab/omniserve/blob/main/omniserve/modeling/layers/quantized_linear/w4a8_linear.py#L176, it seems the int8 weights are not quantized to the range [-119, 119]? And how is s1_scale calculated? Is it the same as ordinary int8 quantization, but with qmin=-119 and qmax=119?

ys-2020 · 2025-02-26T22:07:33Z

Hi, thanks for your interest in QServe. The protective range of 119 has already been taken into account during the model quantization process. When computing the s1_scale for int8 quantization, we use 119 for the scaling factor computation.
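
For readers following along, here is a minimal sketch of what that scale computation could look like. It is not taken from the QServe code. The only detail confirmed in this thread is that 119 is used in place of the usual int8 maximum when computing s1_scale. The symmetric per-row (per-output-channel) granularity, round-to-nearest, and the names `PROTECTIVE_QMAX` and `quantize_rows_int8` are assumptions made for illustration.

```python
# Hypothetical illustration, not the QServe implementation.
PROTECTIVE_QMAX = 119  # protective range from this thread; plain int8 would use 127


def quantize_rows_int8(weight, qmax=PROTECTIVE_QMAX):
    """Symmetric per-row quantization: returns (integer codes, per-row scales)."""
    q_rows, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row)
        # Guard against an all-zero row so we never divide by zero.
        scale = absmax / qmax if absmax > 0 else 1.0
        # Round to nearest, then clamp to the protective range [-qmax, qmax].
        q_row = [max(-qmax, min(qmax, round(v / scale))) for v in row]
        q_rows.append(q_row)
        scales.append(scale)
    return q_rows, scales


weight = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.4]]
q, s1_scale = quantize_rows_int8(weight)
```

Under these assumptions, the largest-magnitude entry of each row maps to ±119, so every code stays inside [-119, 119] and no separate clamp is needed at inference time.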

Jzz24 · 2025-02-28T08:40:53Z

Got it, thanks.
