[Bug] Incorrect default argument and KeyError for llama #62

ZisIsNotZis · 2025-03-04T09:30:02Z

Currently, if kv_quant_granularity is not given:

from omniserve import LLMEngine, EngineArgs
M = 'mit-han-lab/Llama-3-8B-Instruct-QServe-g128'
M = LLMEngine.from_engine_args(EngineArgs(M, ifb_mode=1, precision='w4a8kv4', quant_path=M))

results in

NotImplementedError: Unsupported kv_quant_granularity None

After adding one:

from omniserve import LLMEngine, EngineArgs
M = 'mit-han-lab/Llama-3-8B-Instruct-QServe-g128'
M = LLMEngine.from_engine_args(EngineArgs(M, ifb_mode=1, precision='w4a8kv4', quant_path=M, kv_quant_granularity='per_tensor'))

still results in

KeyError: 'model.layers.0.self_attn.qkv_proj.s2_scales'

I tried both Llama and Mistral, and neither of them works. Maybe the uploaded models and the README are outdated?
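One way to narrow down the KeyError might be to compare the key names actually stored in the checkpoint with the name the loader asks for. The sketch below is not part of omniserve: `scale_suffixes_by_module` is a hypothetical helper, and the example key names other than `s2_scales` are made up for illustration. In practice the key names would come from however the checkpoint's state dict is loaded.

```python
from collections import defaultdict

def scale_suffixes_by_module(key_names, marker="scale"):
    """Group key suffixes containing `marker` by their module prefix.

    Hypothetical helper for debugging; not an omniserve API.
    """
    found = defaultdict(set)
    for name in key_names:
        # Split "a.b.c.suffix" into ("a.b.c", "suffix").
        prefix, _, suffix = name.rpartition(".")
        if marker in suffix:
            found[prefix].add(suffix)
    return {prefix: sorted(suffixes) for prefix, suffixes in found.items()}

# Made-up key names for illustration; a real checkpoint may differ.
example_keys = [
    "model.layers.0.self_attn.qkv_proj.qweight",
    "model.layers.0.self_attn.qkv_proj.s1_scales",
    "model.layers.0.self_attn.qkv_proj.zeros",
]

report = scale_suffixes_by_module(example_keys)
module = "model.layers.0.self_attn.qkv_proj"
# True when the suffix the loader expects is absent from the checkpoint.
missing = "s2_scales" not in report.get(module, [])
print(report, missing)
```

If `s2_scales` is absent for every layer, that would suggest the uploaded checkpoint and the loading code disagree on the quantization format, rather than a problem with the arguments passed to EngineArgs.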
