
IGPU limits the inference speed of the entire system #12828

Open
dttprofessor opened this issue Feb 14, 2025 · 5 comments

@dttprofessor

dttprofessor commented Feb 14, 2025

System: U265K + 48 GB DDR5 + B580
ENV: Running Ollama Portable Zip on Intel GPU with IPEX-LLM
GPU driver: 6559

Question: The iGPU limits the inference speed of the entire system.

| ID | Device Type | Name | Version | units | group | group size | Global mem size | Driver version |
|----|-------------|------|---------|-------|-------|------------|-----------------|----------------|
| 0 | [level_zero:gpu:0] | Intel Graphics | 12.70 | 64 | 1024 | 32 | 26769M | 1.6.31441 |
| 1 | [level_zero:gpu:1] | Intel Arc B580 Graphics | 20.1 | 160 | 1024 | 32 | 12450M | 1.6.3 |

1/ When I load deepseek-r1:7b, the iGPU loads 4 GB and the B580 loads 3.2 GB; the iGPU limits the inference speed of the entire system.

2/ When I load deepseek-r1:32b, the iGPU loads 15.7 GB and the B580 loads 8.4 GB; the CPU is not used.

3/ When I shut off the iGPU and load deepseek-r1:32b, the B580 loads 25 GB and the CPU is not used. The large model is stuck and cannot perform inference.

@sgwhat
Contributor

sgwhat commented Feb 17, 2025

Setting ONEAPI_DEVICE_SELECTOR="level_zero:1" to enable only the B580 could help with inference speed, but I don't think the B580 has enough VRAM to load deepseek-r1:32b.
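For reference, a minimal sketch of applying this when launching the server from a Linux shell (assuming the portable zip's Ollama binary is on the path; on Windows, `set` would replace `export`):

```bash
# Restrict the SYCL / Level Zero runtime to device index 1 (the Arc B580),
# using the device ordering from the listing above (0 = iGPU, 1 = B580).
export ONEAPI_DEVICE_SELECTOR="level_zero:1"

# Start the Ollama server; only the B580 should now be visible to IPEX-LLM.
ollama serve
```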

@dttprofessor
Author

> Setting ONEAPI_DEVICE_SELECTOR="level_zero:1" to enable only the B580 could help with inference speed, but I don't think the B580 has enough VRAM to load deepseek-r1:32b.

Isn't SYCL CPU+GPU hybrid inference enabled by default? Do I need to set it up manually?

@sgwhat
Contributor

sgwhat commented Feb 18, 2025

> Isn't SYCL CPU+GPU hybrid inference enabled by default? Do I need to set it up manually?

No it's not, and CPU+B580 hybrid could be slower than iGPU+B580.
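One way to double-check which devices the runtime actually exposes (and therefore what the selector is doing) is `sycl-ls` from the oneAPI Base Toolkit; a small sketch, assuming it is installed and on the path:

```bash
# Without a selector, both the iGPU and the B580 are listed, and the model
# can be split across them.
sycl-ls

# With the selector exported, only the B580 (device index 1) should appear.
export ONEAPI_DEVICE_SELECTOR="level_zero:1"
sycl-ls
```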

@dttprofessor
Author

> No it's not, and CPU+B580 hybrid could be slower than iGPU+B580.

OK!
However, when I turned off the iGPU, the entire 32B model was loaded on the B580, the CPU was not used at all, and the model became almost unusable.

@sgwhat
Contributor

sgwhat commented Feb 19, 2025

The VRAM of the B580 is insufficient to load a 32B model. You can continue running the model using the iGPU + B580.
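If the selector was previously restricted to the B580, a small sketch of returning to the iGPU + B580 setup (unsetting the variable is the simple option; the comma-separated index form is my assumption about the ONEAPI_DEVICE_SELECTOR syntax):

```bash
# Option 1: drop the restriction so the runtime sees every Level Zero device again.
unset ONEAPI_DEVICE_SELECTOR

# Option 2: explicitly expose both the iGPU (0) and the B580 (1).
export ONEAPI_DEVICE_SELECTOR="level_zero:0,1"

ollama serve
```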
