[llm bench] Move calculation of memory consumption to memory_monitor tool #1937

sbalandi · 2025-03-18T20:04:43Z

memory_monitor.py is taken from https://github.com/openvinotoolkit/nncf/blob/develop/tools/memory_monitor.py.
Two custom lines were added because of an issue with tkinter, found with the text2image pipeline and stable-diffusion-v2-1 using the PyTorch framework:

import matplotlib
# CUSTOM FIX TO AVOID ISSUE: RuntimeError: main thread is not in main loop
matplotlib.use('Agg')
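Below is a minimal sketch of the same fix in context. It assumes memory samples are plotted from a background thread. The function name `save_memory_plot` is hypothetical and is not taken from memory_monitor.py. The important detail is that `matplotlib.use('Agg')` runs before `matplotlib.pyplot` is imported. Figures are then rendered off-screen and no Tk (tkinter) objects are created.

```python
import os
import tempfile
import threading

import matplotlib
# Select the non-interactive Agg backend before pyplot is imported, so no Tk
# objects exist; Tk objects used or garbage-collected outside the main thread
# can raise "RuntimeError: main thread is not in main loop".
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def save_memory_plot(times, values, path):
    """Plot memory samples and write them to an image file (no GUI window)."""
    fig, ax = plt.subplots()
    ax.plot(times, values)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Memory (MiB)")
    fig.savefig(path)
    plt.close(fig)  # release the figure; Agg figures are never shown


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "memory_plot.png")
    # Plotting from a worker thread is safe here because Agg has no GUI loop.
    worker = threading.Thread(target=save_memory_plot,
                              args=([0, 1, 2], [100.0, 150.0, 120.0], out))
    worker.start()
    worker.join()
    print(out)
```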

Task: CVS-162830 CVS-157590

sbalandi · 2025-03-18T20:28:41Z

To discuss. Is it okay that:

a delay was added because compilation and generation can sometimes be too fast, in which case the measurement would be 0: interval 0.01, delay 0.03;
memory consumption no longer includes the full memory consumed by the process, only the memory consumed by the measured code snippet:
Before (consumption from start + generate):
[warm-up][P0] Max rss memory cost: 5113.64MBytes
Now (generate only):
[warm-up][P0] Max rss memory cost: 3991.55MBytes
In that case, generation on the next step after warm-up shows very low consumption:
Before:
[1][P0] Max rss memory cost: 5124.81MBytes
Now:
[1][P0] Max rss memory cost: 0.51MBytes
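Below is a minimal sketch of thread-based sampling with the interval/delay idea described above. It is illustrative only and is not the memory_monitor.py API. The class `MemorySampler` and the caller-supplied `read_memory` callable are hypothetical. A real tool would read RSS through something like psutil. The trailing delay keeps the background thread sampling for a short time after the snippet ends, so a phase faster than one interval still gets samples.

```python
import threading
import time
from typing import Callable, List, Optional


class MemorySampler:
    """Background-thread memory sampler (illustrative sketch only).

    read_memory: hypothetical zero-argument callable returning the current
    memory usage in MiB.
    """

    def __init__(self, read_memory: Callable[[], float],
                 interval: float = 0.01, delay: float = 0.03) -> None:
        self.read_memory = read_memory
        self.interval = interval  # seconds between two samples
        self.delay = delay        # seconds to keep sampling after the snippet ends
        self.samples: List[float] = []
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def _run(self) -> None:
        # Sample until stop() sets the event; wait() doubles as the sleep.
        while not self._stop_event.is_set():
            self.samples.append(self.read_memory())
            self._stop_event.wait(self.interval)

    def start(self) -> None:
        self.samples = [self.read_memory()]  # baseline taken before the snippet
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> List[float]:
        # Keep sampling for `delay` extra seconds so that a snippet that finished
        # faster than one interval still produces samples after it ran.
        time.sleep(self.delay)
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
        self.samples.append(self.read_memory())  # final sample after the snippet
        return self.samples
```

Call `start()` right before the measured snippet (compilation or generation) and `stop()` right after it. `max(samples)` is then the peak for that phase, and `samples[0]` is the baseline.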

sbalandi · 2025-03-19T13:36:29Z


Delay is OK.
Keep printing the full memory and add a print of the increase.
Move the content of memory_profiling.py to memory_monitor.py.
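Below is a minimal sketch of how the two printed values relate. It assumes a baseline sample taken right before the phase and a list of samples taken during it. The helpers `summarize` and `format_report` are hypothetical, and the output line only imitates the log format used in this thread. The increase is measured against the baseline, so a step that starts with memory already retained from warm-up can report a large max together with a small increase. That is presumably why step [1] showed only 0.51MBytes when only the snippet's consumption was printed.

```python
from typing import Sequence, Tuple


def summarize(baseline_mib: float, samples_mib: Sequence[float]) -> Tuple[float, float]:
    """Return (max memory, increase over the baseline) for one measured phase."""
    peak = max([baseline_mib, *samples_mib])  # include baseline so increase >= 0
    return peak, peak - baseline_mib


def format_report(prefix: str, baseline_mib: float, samples_mib: Sequence[float]) -> str:
    """Build a log line that reports both the full max and the increase."""
    peak, increase = summarize(baseline_mib, samples_mib)
    return (f"{prefix} Max rss memory cost: {peak:.2f}MBytes, "
            f"rss memory increase: {increase:.2f}MBytes")


if __name__ == "__main__":
    # Illustrative numbers only: the process already holds ~4000 MiB.
    print(format_report("[1][P0]", 4000.0, [4005.5, 4010.25, 4008.0]))
    # prints "[1][P0] Max rss memory cost: 4010.25MBytes, rss memory increase: 10.25MBytes"
```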

sbalandi · 2025-03-24T17:15:15Z

@eaidova could you please take a look?


sbalandi · 2025-04-04T12:05:33Z

Also, within this PR, an approach based on multiprocessing.Process was checked in response to questions about the increase in execution time when memory data collection is enabled. This approach shows an increase in total RSS memory: more than 2x in the compilation phase and ~1.5x in the generation phase for the TinyLlama LLM task. This can affect the compilation and generation phases and lead to crashes when out-of-memory limits are hit. Run statistics are below:
python benchmark.py -m TinyLlama-1.1B-Chat-v1.0_new -d cpu -n 3 -mc 2
Current approach with threads:

Compilation phase: [ INFO ] Max rss memory cost for compilation phase: 666.27MiB, rss memory increase for copmpilation phase: 0.20MiB, max system memory cost for compilation phase: 95564.79MiB, system memory increase for copmpilation phase: 0.00MiB
Warm-up: [ INFO ] [warm-up][P0] Max rss memory cost: 4752.60MBytes, rss memory increase: 4012.00MBytes, max system memory memory cost: 97591.71MBytes, system memory increase: 2013.83MBytes
P0: [ INFO ] [1][P0] Max rss memory cost: 4762.75MBytes, rss memory increase: 9.42MBytes, max system memory memory cost: 97540.13MBytes, system memory increase: 11.97MBytes

Process (forkserver/spawn):

Compilation phase: [ INFO ] Max rss memory cost for compilation phase: 1996.17MiB; rss memory increase for copmpilation phase: 0.00MiB; max system memory cost for compilation phase: 69303.52MiB; system memory increase for copmpilation phase: 0.98MiB
Warm-up: [ INFO ] [warm-up][P0] Max rss memory cost: 6008.41MBytes, rss memory increase: 4008.58MBytes, max system memory memory cost: 71356.92MBytes, system memory increase: 2053.40MBytes
P0: [ INFO ] [1][P0] Max rss memory cost: 6017.34MBytes, rss memory increase: 8.68MBytes, max system memory memory cost: 71363.31MBytes, system memory increase: 6.39MBytes

github-actions bot added the category: llm_bench label (Label for tool/llm_bench folder) Mar 18, 2025

sbalandi force-pushed the mem_mon branch 2 times, most recently from 04f7441 to e338fb6, March 18, 2025 20:46

sbalandi requested a review from eaidova March 18, 2025 21:56

sbalandi marked this pull request as ready for review March 18, 2025 21:56

ilya-lavrenov assigned eaidova Mar 19, 2025

sbalandi force-pushed the mem_mon branch 2 times, most recently from 991a7f1 to f37e4ec, March 20, 2025 22:38

sbalandi force-pushed the mem_mon branch from f37e4ec to 7511b80, March 25, 2025 12:14

sbalandi added 4 commits March 25, 2025 13:11

[llm bench] Move calculation of memory consumption to memory_monitor tool

8f130c0

update

7511b80

fix interval

4dd834c

Merge branch 'master' into mem_mon

2604b1a

Merge branch 'master' into mem_mon

f9a49db

eaidova approved these changes Apr 15, 2025


eaidova merged commit 941e033 into openvinotoolkit:master Apr 15, 2025
52 of 54 checks passed
