

ollama + deepseek v2: The number of work-items in each dimension of a work-group cannot exceed {512, 512, 512} for this device #12839

Open · stereomato opened this issue Feb 17, 2025 · 2 comments

@stereomato

```
The number of work-items in each dimension of a work-group cannot exceed {512, 512, 512} for this device
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:4463
```

I am using this container, running on NixOS: https://github.com/mattcurf/ollama-intel-gpu

```bash
podman build -t "ollama-intel-gpu" .

podman run --rm -p 127.0.0.1:11434:11434 -v /home/stereomato/models:/mnt -v ollama-volume:/root/.ollama -e OLLAMA_NUM_PARALLEL=1 -e OLLAMA_MAX_LOADED_MODELS=1 -e OLLAMA_FLASH_ATTENTION=1 -e OLLAMA_NUM_GPU=999 -e DEVICE=iGPU --device /dev/dri --name=ollama-intel-gpu ollama-intel-gpu

podman exec -it ollama-intel-gpu bash

./ollama pull deepseek-v2:16b
./ollama run deepseek-v2 "hello deepseek"
```

The q4_k_m 16b variant also exhibits the same issue.

Then I get the error shown in the title and the first two lines of this report.

HW:
- Intel Core i5-12500H
- Intel Xe Graphics (Alder Lake)
- 24 GB of RAM
- up-to-date NixOS
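
For reference, the `{512, 512, 512}` in the error is the device's maximum work-item sizes per work-group dimension. A quick way to check what the iGPU actually reports (a sketch, assuming the image ships the oneAPI `sycl-ls` tool; `clinfo` prints similar fields) is:

```bash
# Inside the container: print verbose device properties and
# filter for the work-group / work-item limits.
sycl-ls --verbose | grep -i work
```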

@stereomato (Author)

Never mind, this seems to be a memory limitation. Is there a way to work around it?

@qiuxin2012 (Contributor)

You can try lowering OLLAMA_NUM_GPU from 999, for example OLLAMA_NUM_GPU=18. That puts 18 layers on the GPU and runs the remaining layers on the CPU; see the sketch below.
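
A minimal sketch of this workaround, reusing the podman invocation from the report with only the layer count changed (the value 18 is just a starting point; tune it up or down until the model fits in memory):

```bash
# Offload 18 layers to the iGPU; the remaining layers run on the CPU.
podman run --rm -p 127.0.0.1:11434:11434 \
  -v /home/stereomato/models:/mnt -v ollama-volume:/root/.ollama \
  -e OLLAMA_NUM_PARALLEL=1 -e OLLAMA_MAX_LOADED_MODELS=1 \
  -e OLLAMA_FLASH_ATTENTION=1 -e OLLAMA_NUM_GPU=18 \
  -e DEVICE=iGPU --device /dev/dri \
  --name=ollama-intel-gpu ollama-intel-gpu
```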
