[llm bench] Move calculation of memory consumption to memory_monitor tool #1937

Open
wants to merge 2 commits into
base: master


Contributor

@sbalandi sbalandi commented Mar 18, 2025

memory_monitor.py is taken from https://github.com/openvinotoolkit/nncf/blob/develop/tools/memory_monitor.py
Two custom lines were added because of an issue with tkinter, found with the text2image pipeline and stable-diffusion-v2-1 on the PyTorch framework:

import matplotlib
# CUSTOM FIX TO AVOID ISSUE: RuntimeError: main thread is not in main loop
matplotlib.use('Agg')

Task: CVS-164392 CVS-157590

@github-actions github-actions bot added the category: llm_bench Label for tool/llm_bench folder label Mar 18, 2025
@sbalandi
Contributor Author

To discuss. Is it okay that:

  • a delay was added, because compilation and generation can sometimes be too fast and the measurement will be 0 in such cases: interval 0.01, delay 0.03
  • the reported memory consumption no longer includes the full memory consumed by the process, only the memory consumed by the measured code snippet:
    before, consumption from start + generate:
    [warm-up][P0] Max rss memory cost: 5113.64MBytes
    now, just generate:
    [warm-up][P0] Max rss memory cost: 3991.55MBytes
    In that case, generation on the next step after warm-up shows very low consumption:
    before:
    [1][P0] Max rss memory cost: 5124.81MBytes
    now:
    [1][P0] Max rss memory cost: 0.51MBytes
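To make the delay point concrete, here is a minimal sketch of a polling monitor. It is not the code from memory_monitor.py: the class name `SamplingMonitor`, the `read_memory` callable, and the choice to apply the delay just before stopping are all assumptions made for illustration. Only the values interval 0.01 and delay 0.03 come from the comment above.

```python
import threading
import time
from typing import Callable, List, Optional


class SamplingMonitor:
    """Hypothetical helper: polls read_memory() every `interval` seconds
    in a background thread and collects the readings."""

    def __init__(self, read_memory: Callable[[], float],
                 interval: float = 0.01, delay: float = 0.03) -> None:
        self.read_memory = read_memory
        self.interval = interval
        self.delay = delay
        self.samples: List[float] = []
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> "SamplingMonitor":
        self.samples = []
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def _run(self) -> None:
        while not self._stop_event.is_set():
            self.samples.append(self.read_memory())
            # wait() returns early once stop() sets the event
            self._stop_event.wait(self.interval)

    def stop(self) -> List[float]:
        # Assumption: keep sampling for `delay` seconds before stopping,
        # so that even a very fast snippet yields at least one reading.
        time.sleep(self.delay)
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
        return self.samples


# The "measured snippet" is empty here, i.e. the fastest possible case.
monitor = SamplingMonitor(read_memory=lambda: 1.0).start()
samples = monitor.stop()
print(f"collected {len(samples)} sample(s)")
```

Without the delay, `stop()` could set the event before the thread takes its first reading, which would explain a reported value of 0 for very fast steps.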

@sbalandi sbalandi force-pushed the mem_mon branch 2 times, most recently from 04f7441 to e338fb6 Compare March 18, 2025 20:46
@sbalandi sbalandi requested a review from eaidova March 18, 2025 21:56
@sbalandi sbalandi marked this pull request as ready for review March 18, 2025 21:56
@sbalandi
Contributor Author

  • the delay is OK
  • keep printing the full memory and also print the increase
  • move the content of memory_profiling.py to memory_monitor.py
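Reporting both the full memory and the increase can be sketched as below. The function name `summarize_memory` is hypothetical and this is not the PR's implementation. The baseline in the usage example is back-computed from the two warm-up figures quoted in this thread (5113.64 and 3991.55), assuming they come from comparable runs. It is for illustration only.

```python
from typing import Sequence, Tuple


def summarize_memory(baseline_mb: float,
                     samples_mb: Sequence[float]) -> Tuple[float, float]:
    """Return (peak full memory, peak increase over the baseline), in MB.

    baseline_mb: memory of the process right before the measured snippet.
    samples_mb:  readings taken while the snippet was running.
    """
    peak = max(samples_mb, default=baseline_mb)
    increase = max(peak - baseline_mb, 0.0)
    return peak, increase


# Illustrative values only: baseline = 5113.64 - 3991.55
baseline = 1122.09
readings = [1500.00, 4200.25, 5113.64]  # made-up intermediate readings
peak, increase = summarize_memory(baseline, readings)
print(f"full: {peak:.2f} MB, increase: {increase:.2f} MB")
```

With this split, an iteration after warm-up can show a near-zero increase while the full figure stays close to the warm-up peak, which matches the 0.51 vs. 5124.81 observation above.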

@sbalandi sbalandi force-pushed the mem_mon branch 2 times, most recently from 991a7f1 to f37e4ec Compare March 20, 2025 22:38