Hi,
I am using the latest cpp Docker image and the latest i915 DKMS driver on Linux, running inference with 2x A770 and Ollama. The model is DeepSeek 32B int4. In 2 or 3 runs we get 15~20 tokens/s; the rest of the time we get only 6-8 tokens/s. The speed is also erratic: generation looks quite fast at the beginning, drops dramatically after about 50-100 tokens, and sometimes speeds up again after 10-20 seconds.
Also, there is a 5x performance gap between Ollama and vLLM on the same machine, which typically indicates a performance bug.
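For anyone trying to reproduce the numbers, here is a minimal sketch of how the tokens/s figure can be measured. It assumes the `eval_count` and `eval_duration` (nanoseconds) fields returned by Ollama's `/api/generate` endpoint with `stream` set to false. The model tag, the prompt, and the helper names `tokens_per_second` and `measure_once` are placeholders for illustration, not the exact setup used above.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Decode throughput from an Ollama /api/generate response dict.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return response.get("eval_count", 0) / (duration_ns / 1e9)


def measure_once(model: str, prompt: str,
                 url: str = "http://localhost:11434/api/generate") -> float:
    """Send one non-streaming request and return tokens/s (hypothetical helper)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))


if __name__ == "__main__":
    # Placeholder model tag and prompt; repeat to expose run-to-run variance.
    for run in range(5):
        rate = measure_once("deepseek-r1:32b", "Explain PCIe in one paragraph.")
        print(f"run {run}: {rate:.1f} tokens/s")
```

Running the same request several times makes the run-to-run variation visible. Since this figure is an average over each whole generation, it will not show the mid-generation slowdown directly.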
BR