
2 x A770 with Ollama on Linux, inference responses slow down dramatically #12852

Open
RobinJing opened this issue Feb 19, 2025 · 0 comments
Hi,
I am using the latest cpp Docker image and the latest i915 DKMS driver on Linux, running inference on 2x A770 with Ollama; the model is DeepSeek 32B int4. For the first 2 or 3 requests we get 15~20 tokens/s, but after that we only get 6-8 tokens/s. The speed pattern is also strange: generation looks quite fast at the beginning, then drops dramatically after roughly 50-100 tokens, and sometimes speeds up again after 10-20 seconds.

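For reference, the tokens/s figures above can be computed from the `eval_count` and `eval_duration` fields that Ollama's generate API returns. A minimal sketch (assuming a local Ollama server on the default port 11434; the helper names are mine):

```python
import json
import urllib.request

def tokens_per_second(eval_count, eval_duration_ns):
    """Convert Ollama's eval_count / eval_duration (nanoseconds) to tokens/s."""
    return eval_count / (eval_duration_ns / 1e9)

def benchmark(model, prompt, host="http://localhost:11434"):
    """Send one non-streaming /api/generate request and report decode speed."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return tokens_per_second(body["eval_count"], body["eval_duration"])

# Example with hard-coded numbers: 300 tokens decoded in 20 s -> 15 tokens/s
print(tokens_per_second(300, 20_000_000_000))
```

Running `benchmark(...)` repeatedly against the same model should make the slowdown after the first 2-3 requests easy to reproduce and quantify.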
Also, vLLM is about 5x faster than Ollama on the same machine, which typically indicates a performance bug.
BR
