
Does AirLLM Support Running Quantized Models (e.g., unsloth/Qwen2-72B-bnb-4bit)? #213

NEWbie0709 opened this issue Feb 24, 2025 · 1 comment

NEWbie0709 commented Feb 24, 2025

Does AirLLM currently support running 4-bit quantized models like unsloth/Qwen2-72B-bnb-4bit? I’m trying to load and run this model using AirLLM, but I’m encountering the following error during generation:

RuntimeError: Attempted to call variable.set_data(tensor), but variable and tensor have incompatible tensor type.

Separately, I also tried a smaller Qwen model, Qwen/Qwen2.5-0.5B, but ran into a different error:

AssertionError: model.safetensors.index.json should exist
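The second error suggests AirLLM expects the checkpoint to be a sharded safetensors snapshot, whose shards are listed in a `model.safetensors.index.json` file; small single-file models like Qwen/Qwen2.5-0.5B ship only `model.safetensors` and so fail the assertion. A minimal stdlib sketch of that precondition check (the helper name `has_sharded_index` is hypothetical, not part of AirLLM's API):

```python
import json
import tempfile
from pathlib import Path

def has_sharded_index(model_dir: str) -> bool:
    """Return True if the local snapshot contains the sharded-weights
    index file that the loader asserts on before layer-by-layer loading."""
    return (Path(model_dir) / "model.safetensors.index.json").is_file()

# Demo with a throwaway directory standing in for a downloaded snapshot.
with tempfile.TemporaryDirectory() as d:
    print(has_sharded_index(d))  # no index file yet -> False

    # A sharded checkpoint's index maps each weight to its shard file.
    index = {
        "metadata": {},
        "weight_map": {"model.embed_tokens.weight": "model-00001-of-00002.safetensors"},
    }
    (Path(d) / "model.safetensors.index.json").write_text(json.dumps(index))
    print(has_sharded_index(d))  # -> True
```

Running a check like this against the downloaded snapshot directory before calling the loader would distinguish "unsupported single-file checkpoint" from other failures.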


I also tried running Qwen-72B-instruct, and this is the error I got:

[error screenshot attached; text not recoverable]
