iGPU limits the inference speed of the entire system #12828
Comments
Set
Isn't SYCL the default backend for CPU+GPU hybrid inference? Does it need to be set up manually?
No, it isn't, and CPU+B580 hybrid could be slower than iGPU+B580.
ok!
The VRAM of the B580 is insufficient to load a 32B model. You can continue running the model using the iGPU + B580.
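If you want to see how the runtime enumerates the iGPU and the B580, the oneAPI runtime ships a `sycl-ls` utility that lists visible devices. A minimal sketch, assuming the oneAPI/Level Zero runtime is installed and `sycl-ls` is on PATH (the exact output shape varies by driver version):

```bat
:: List the SYCL devices the runtime can see (CPU, iGPU, B580, ...).
:: Each GPU is printed with a [level_zero:gpu][N] index; note which
:: index N belongs to the B580, since device selectors refer to it.
sycl-ls
```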
System: U265K + 48 GB DDR5 + B580
ENV: Run Ollama Portable Zip on Intel GPU with IPEX-LLM
GPU driver: 6559
Question: the iGPU limits the inference speed of the entire system
1/ When I load deepseek-r1:7b, the iGPU loads 4 GB and the B580 loads 3.2 GB; the iGPU limits the inference speed of the entire system.
2/ When I load deepseek-r1:32b, the iGPU loads 15.7 GB and the B580 loads 8.4 GB; the CPU is not used.
3/ When I turn off the iGPU and load deepseek-r1:32b, the B580 loads 25 GB and the CPU is not used; the model hangs and cannot perform inference (see the device-selection sketch below).
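One way to keep the iGPU from throttling hybrid inference is to hide it from the SYCL runtime before starting Ollama, so only the B580 is used. This is a minimal sketch, not a confirmed fix for this issue: it assumes Windows cmd, that the B580 enumerates as Level Zero device 0 (confirm with `sycl-ls` first), and the `start-ollama.bat` launcher shipped in the IPEX-LLM Ollama Portable Zip:

```bat
:: Hide the iGPU: expose only the Arc B580 to the SYCL/Level Zero runtime.
:: "0" is an assumed index - run sycl-ls and substitute the B580's index.
set ONEAPI_DEVICE_SELECTOR=level_zero:0

:: Ask IPEX-LLM Ollama to offload all model layers to the GPU.
set OLLAMA_NUM_GPU=999

:: Start the Ollama service from the portable zip directory.
start-ollama.bat
```

Note that this is consistent with observation 3/ above and with the maintainer's comment: with the iGPU hidden, a 32B model no longer fits in the B580's 12 GB of VRAM, so single-GPU inference would need a smaller model or a lower-bit quantization.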