[Feature]: rocm-smi execution in the loop and polling the GPU every few seconds (parameter -l) #216

Open
vedranmiletic opened this issue Feb 27, 2025 · 1 comment

Comments

@vedranmiletic

Suggestion Description

nvidia-smi supports the -l <seconds> and -lms <milliseconds> parameters, which keep the tool running and poll the GPU for new information at the specified interval. Similar behavior can be achieved with watch -n <seconds> nvidia-smi, but -l <seconds> has less overhead on the measured system.

It would be nice to support this feature in rocm-smi as well.
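
For illustration only, here is a minimal Python sketch of the watch-style workaround described above. The function name `poll` and its parameters are hypothetical, not part of rocm-smi. It starts a new process for every sample, so it has the same kind of overhead as watch and is not a substitute for a built-in loop option.

```python
import subprocess
import sys
import time


def poll(command, interval_s, iterations):
    """Run `command` every `interval_s` seconds; return the stdout of each run.

    Hypothetical helper: each sample spawns a fresh process, which is the
    overhead a built-in loop flag would avoid.
    """
    outputs = []
    for i in range(iterations):
        result = subprocess.run(command, capture_output=True, text=True, check=False)
        outputs.append(result.stdout)
        # Sleep between samples, but not after the last one.
        if i < iterations - 1:
            time.sleep(interval_s)
    return outputs


# Example usage (assumes rocm-smi is on PATH; invoked without arguments):
#     for sample in poll(["rocm-smi"], interval_s=2.0, iterations=3):
#         print(sample)
```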

Operating System

RHEL

GPU

MI300A

ROCm Component

rocm-smi

@dmitrii-galantsev
Collaborator

We have this in amdsmi:
amd-smi monitor -w 1 ->

 TIMESTAMP  GPU  POWER  GPU_TEMP  MEM_TEMP  GFX_UTIL  GFX_CLOCK  MEM_UTIL  MEM_CLOCK  ENC_UTIL  DEC_UTIL    VCLOCK    DCLOCK  SINGLE_ECC  DOUBLE_ECC  PCIE_REPLAY  VRAM_USED  VRAM_TOTAL     PCIE_BW  PVIOL  TVIOL  TVIOL_ACTIVE  PHOT_TVIOL  VR_TVIOL  HBM_TVIOL  GFX_CLKVIOL
1740686195    0    8 W     28 °C     32 °C       0 %      0 MHz       0 %     96 MHz       N/A     0.0 %     0 MHz     0 MHz           0           0            0      16 MB    30679 MB    N/A Mb/s    N/A    N/A           N/A         N/A       N/A        N/A          N/A
1740686195    1    8 W     29 °C     30 °C       0 %      0 MHz       0 %     96 MHz       N/A     0.0 %     0 MHz     0 MHz           0           0            0      16 MB    30679 MB    N/A Mb/s    N/A    N/A           N/A         N/A       N/A        N/A          N/A


 TIMESTAMP  GPU  POWER  GPU_TEMP  MEM_TEMP  GFX_UTIL  GFX_CLOCK  MEM_UTIL  MEM_CLOCK  ENC_UTIL  DEC_UTIL    VCLOCK    DCLOCK  SINGLE_ECC  DOUBLE_ECC  PCIE_REPLAY  VRAM_USED  VRAM_TOTAL     PCIE_BW  PVIOL  TVIOL  TVIOL_ACTIVE  PHOT_TVIOL  VR_TVIOL  HBM_TVIOL  GFX_CLKVIOL
1740686196    0    8 W     28 °C     32 °C       0 %      0 MHz       0 %     96 MHz       N/A     0.0 %     0 MHz     0 MHz           0           0            0      16 MB    30679 MB    N/A Mb/s    N/A    N/A           N/A         N/A       N/A        N/A          N/A
1740686196    1    8 W     29 °C     30 °C       0 %      0 MHz       0 %     96 MHz       N/A     0.0 %     0 MHz     0 MHz           0           0            0      16 MB    30679 MB    N/A Mb/s    N/A    N/A           N/A         N/A       N/A        N/A          N/A
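
If the monitor output needs to be consumed by a script, a rough sketch like the one below could turn one printed block into dictionaries. It assumes the layout seen in the sample above: a header line followed by one row per GPU, with columns separated by two or more spaces and values such as "8 W" containing only a single space. The function name `parse_monitor_block` is hypothetical, and the real output format may differ between amd-smi versions.

```python
import re


def parse_monitor_block(text):
    """Parse one header-plus-rows block into a list of {column: value} dicts.

    Assumption: columns are separated by runs of at least two spaces, as in
    the sample output quoted in this thread.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = re.split(r"\s{2,}", lines[0])
    rows = []
    for ln in lines[1:]:
        cells = re.split(r"\s{2,}", ln)
        # Skip lines that do not line up with the header (e.g. partial output).
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows
```

Keeping the values as strings (including units and "N/A") avoids guessing how a missing reading should be represented.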
