
[llm bench] Move calculation of memory consumption to memory_monitor tool #1937


Merged: 5 commits merged on Apr 15, 2025


@sbalandi (Contributor) commented Mar 18, 2025

memory_monitor.py is taken from https://github.com/openvinotoolkit/nncf/blob/develop/tools/memory_monitor.py.
Two custom lines were added because of an issue with tkinter, found with the text2image pipeline and stable-diffusion-v2-1 on the PyTorch framework:

import matplotlib
# CUSTOM FIX TO AVOID ISSUE: RuntimeError: main thread is not in main loop
matplotlib.use('Agg')

Task: CVS-162830 CVS-157590

github-actions bot added the category: llm_bench label (Label for tool/llm_bench folder) on Mar 18, 2025
@sbalandi (Contributor Author)

To discuss. Is it okay that:

  • a delay was added, because compilation and generation can sometimes be so fast that the measured value is 0 (interval 0.01, delay 0.03)
  • the reported memory consumption no longer includes the full memory consumed by the process, only the memory consumed by the measured code snippet:
    before (consumption from start + generate):
    [warm-up][P0] Max rss memory cost: 5113.64MBytes
    now (generate only):
    [warm-up][P0] Max rss memory cost: 3991.55MBytes
    As a result, generation in the step after warm-up shows very low consumption:
    before:
    [1][P0] Max rss memory cost: 5124.81MBytes
    now:
    [1][P0] Max rss memory cost: 0.51MBytes
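A minimal sketch of the sampling scheme discussed above, assuming a background thread that polls a memory reading every `interval` and, on stop, keeps polling for an extra `delay` so that a phase shorter than one interval still gets sampled. `PeakMemorySampler` and `read_memory_mib` are hypothetical names, not the llm_bench or NNCF `memory_monitor.py` API; the reader is injected so the sketch does not depend on psutil or /proc.

```python
import threading
import time
from typing import Callable, List, Optional, Tuple


class PeakMemorySampler:
    """Hypothetical thread-based memory sampler (not the llm_bench API).

    read_memory_mib: assumed callable returning the current reading in MiB,
    e.g. the process RSS obtained via psutil or /proc.
    """

    def __init__(self, read_memory_mib: Callable[[], float],
                 interval: float = 0.01, delay: float = 0.03) -> None:
        self.read = read_memory_mib
        self.interval = interval  # time between two samples
        self.delay = delay        # extra sampling time after the measured code ends
        self.samples: List[float] = []
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def _run(self) -> None:
        # Poll until stop() sets the event; wait() returns early once it is set.
        while not self._stop.is_set():
            self.samples.append(self.read())
            self._stop.wait(self.interval)

    def start(self) -> None:
        self.samples = [self.read()]  # baseline, taken before the measured code runs
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        # Give the background thread time to record samples even if the
        # measured phase was shorter than one interval.
        time.sleep(self.delay)
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        self.samples.append(self.read())  # final reading after the measured code

    def peak_and_increase(self) -> Tuple[float, float]:
        # Peak over all samples and its growth relative to the baseline.
        peak = max(self.samples)
        return peak, peak - self.samples[0]
```

Usage would be `sampler.start()`, then the measured phase (compilation or generation), then `sampler.stop()` and `sampler.peak_and_increase()`. Because the baseline is taken right before the measured phase, memory already allocated during warm-up does not count toward the increase, which is why a later step can report a value as small as 0.51MBytes.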

sbalandi force-pushed the mem_mon branch 2 times, most recently from 04f7441 to e338fb6 on March 18, 2025 20:46
sbalandi requested a review from eaidova on March 18, 2025 21:56
sbalandi marked this pull request as ready for review on March 18, 2025 21:56
@sbalandi (Contributor Author)

Outcome of the points above:

  • the delay is OK
  • keep printing the full memory consumption and add a print of the increase
  • move the content of memory_profiling.py to memory_monitor.py
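Printing both the full peak value and the increase over the baseline can be sketched as a small formatting helper. `format_memory_report` is a hypothetical name; the message layout only mimics the `Max rss memory cost` / `rss memory increase` log lines quoted in this PR, and clamping a negative increase to 0 is an assumption of this sketch.

```python
from typing import Sequence


def format_memory_report(prefix: str, baseline_mb: float, samples_mb: Sequence[float]) -> str:
    """Hypothetical helper: report the full peak value and the increase over the baseline.

    baseline_mb: reading taken right before the measured code.
    samples_mb:  readings taken while the measured code ran.
    """
    if not samples_mb:
        raise ValueError("need at least one sample")
    peak = max(samples_mb)
    # Clamp at 0: memory may be freed below the baseline during the phase.
    increase = max(peak - baseline_mb, 0.0)
    return (f"{prefix} Max rss memory cost: {peak:.2f}MBytes, "
            f"rss memory increase: {increase:.2f}MBytes")
```

Reporting both numbers keeps the full footprint visible (useful for out-of-memory limits) while the increase isolates what the measured step itself allocated.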

sbalandi force-pushed the mem_mon branch 2 times, most recently from 991a7f1 to f37e4ec on March 20, 2025 22:38
@sbalandi (Contributor Author)

@eaidova, could you please take a look?

@sbalandi (Contributor Author) commented Apr 4, 2025

Also, within this PR, an approach based on multiprocessing.Process was checked, in response to questions about the increase in execution time when memory data collection is enabled. Compared with the current thread-based approach, it increases total RSS memory (more than 2x in the compilation phase and ~1.5x in the generation phase for the TinyLlama LLM task). This can affect the compilation/generation phases and lead to crashes when out-of-memory limits are hit. Run statistics are below:
python benchmark.py -m TinyLlama-1.1B-Chat-v1.0_new -d cpu -n 3 -mc 2
Current approach with threads:

Compilation phase: [ INFO ] Max rss memory cost for compilation phase: 666.27MiB, rss memory increase for copmpilation phase: 0.20MiB, max system memory cost for compilation phase: 95564.79MiB, system memory increase for copmpilation phase: 0.00MiB
Warm-up: [ INFO ] [warm-up][P0] Max rss memory cost: 4752.60MBytes, rss memory increase: 4012.00MBytes, max system memory memory cost: 97591.71MBytes, system memory increase: 2013.83MBytes
P0: [ INFO ] [1][P0] Max rss memory cost: 4762.75MBytes, rss memory increase: 9.42MBytes, max system memory memory cost: 97540.13MBytes, system memory increase: 11.97MBytes

Process (forkserver/spawn):

Compilation phase: [ INFO ] Max rss memory cost for compilation phase: 1996.17MiB; rss memory increase for copmpilation phase: 0.00MiB; max system memory cost for compilation phase: 69303.52MiB; system memory increase for copmpilation phase: 0.98MiB
Warm-up: [ INFO ] [warm-up][P0] Max rss memory cost: 6008.41MBytes, rss memory increase: 4008.58MBytes, max system memory memory cost: 71356.92MBytes, system memory increase: 2053.40MBytes
P0: [ INFO ] [1][P0] Max rss memory cost: 6017.34MBytes, rss memory increase: 8.68MBytes, max system memory memory cost: 71363.31MBytes, system memory increase: 6.39MBytes
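The relative overhead can be recomputed directly from the logs above. This snippet measures nothing itself: it only takes the peak RSS values of the single run quoted here, assuming the `MiB` and `MBytes` labels in the logs denote the same unit.

```python
# Peak RSS values (MiB / MBytes) copied from the two runs quoted above.
thread_rss = {"compilation": 666.27, "warm-up": 4752.60, "P0": 4762.75}
process_rss = {"compilation": 1996.17, "warm-up": 6008.41, "P0": 6017.34}

# Relative and absolute overhead of the process-based monitor per phase.
ratios = {phase: process_rss[phase] / thread_rss[phase] for phase in thread_rss}
extra_mib = {phase: process_rss[phase] - thread_rss[phase] for phase in thread_rss}

for phase in thread_rss:
    print(f"{phase}: {ratios[phase]:.2f}x peak RSS, +{extra_mib[phase]:.2f} MiB")
```

For this run the extra memory is roughly constant (about 1255 to 1330 MiB of additional peak RSS), which shows up as ~3x during compilation and ~1.26x during warm-up and the first iteration.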

eaidova merged commit 941e033 into openvinotoolkit:master on Apr 15, 2025
52 of 54 checks passed