GPU Mapping #4326

Merged: 2 commits merged into AMReX-Codes:development from perlmutter_gpu_mapping on Feb 21, 2025
Conversation

WeiqunZhang (Member) commented:

For perlmutter and frontier, if there are multiple devices visible to each MPI rank, we will try to map each rank to the GPU closest to its core.

For an FFT test on perlmutter using 256 nodes, the correct mapping reduced the run time from 0.172 to 0.127. Note that you can achieve a similar effect with `srun ... bash -c "export CUDA_VISIBLE_DEVICES=\$((3-SLURM_LOCALID)); ..."` by manually limiting the number of visible devices, but in this commit we try to do this automatically for the user. Also note that MPI appears to crash with `gpu-bind=closest` on perlmutter, so we need to use `gpu-bind=none`.
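
For illustration, here is a minimal C++ sketch of that per-rank device selection, assuming `SLURM_LOCALID` holds the node-local rank; the helper name and the exact reversal logic are hypothetical and only mirror the manual `CUDA_VISIBLE_DEVICES` trick above, not the commit's actual code:

```cpp
// Minimal sketch (not the commit's actual code): pick one device per rank,
// reversing the device order as in CUDA_VISIBLE_DEVICES=$((3-SLURM_LOCALID)).
#include <cuda_runtime.h>
#include <cstdlib>

void select_device_for_local_rank ()  // hypothetical helper name
{
    const char* s = std::getenv("SLURM_LOCALID");
    if (s == nullptr) { return; }  // not launched with srun; keep the default
    int local_rank = std::atoi(s);
    int ndevices = 0;
    cudaGetDeviceCount(&ndevices);
    if (ndevices > 1) {
        // On perlmutter the GPU closest to a rank's cores is numbered in
        // the opposite order of the local rank, hence the reversed index.
        cudaSetDevice(ndevices - 1 - (local_rank % ndevices));
    }
}
```

In a real application this would run after MPI initialization and before any other CUDA calls, so that all subsequent allocations land on the selected device.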

For frontier, you could use `gpu-bind=closest`. But if you use `gpu-bind=none`, this commit will try to do the correct mapping for you.
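
As a sketch of what the frontier mapping might look like with `gpu-bind=none`, assuming 8 ranks per node bound to cores in order and the core-to-GCD affinity published in the OLCF user guide; the permutation table and helper name are assumptions, not the commit's actual code:

```cpp
// Hypothetical sketch for frontier with gpu-bind=none: map each node-local
// rank to the GCD closest to its cores, per the assumed node topology.
#include <hip/hip_runtime.h>
#include <cstdlib>

void select_frontier_device ()  // hypothetical helper name
{
    // Assumed closest-GCD permutation for ranks bound to cores in order:
    // cores 0-7 -> GCD 4, 8-15 -> 5, 16-23 -> 2, 24-31 -> 3,
    // 32-39 -> 6, 40-47 -> 7, 48-55 -> 0, 56-63 -> 1.
    static const int closest_gcd[8] = {4, 5, 2, 3, 6, 7, 0, 1};
    if (const char* s = std::getenv("SLURM_LOCALID")) {
        int local_rank = std::atoi(s);
        int ndevices = 0;
        hipGetDeviceCount(&ndevices);
        if (ndevices == 8) {  // all 8 GCDs visible, i.e., gpu-bind=none
            hipSetDevice(closest_gcd[local_rank % 8]);
        }
    }
}
```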

In this commit, we also removed the old machine-related code and added new code for machine detection.
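
A minimal sketch of what environment-based machine detection could look like, assuming `NERSC_HOST` is set on perlmutter and `LMOD_SYSTEM_NAME` on frontier (both env vars are assumptions; the returned strings follow the `Machine::name()` values used in the diff below):

```cpp
// Sketch of environment-based machine detection; the env vars checked here
// are assumptions, and the returned strings match Machine::name() in the diff.
#include <cstdlib>
#include <string>

std::string detect_machine ()  // hypothetical helper name
{
    if (const char* s = std::getenv("NERSC_HOST")) {
        if (std::string(s) == "perlmutter") { return "nersc.perlmutter"; }
    }
    if (const char* s = std::getenv("LMOD_SYSTEM_NAME")) {
        if (std::string(s) == "frontier") { return "olcf.frontier"; }
    }
    return "unknown";
}
```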

@WeiqunZhang requested a review from @atmyers on February 7, 2025
```cpp
if ((Machine::name() != "nersc.perlmutter") &&
    (Machine::name() != "olcf.frontier"))
{
    amrex::Warning("Multiple GPUs are visible to each MPI rank. This is usually not an issue. But this may lead to incorrect or suboptimal rank-to-GPU mapping.");
```
A reviewer (Member) commented on this hunk:

I think with the implementation below, we need to be more precise now: for the machines we implement logic for, we should post something like: "Fixing GPU assignment for Frontier according to heuristics..." or so?
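
Something along these lines, as a sketch of the suggested message (wording assumed); it uses `Machine::name()` as it appears in the diff above:

```cpp
// Hypothetical realization of the reviewer's suggestion: announce the
// heuristic mapping when it is actually applied, instead of a generic warning.
amrex::Print() << "Fixing GPU assignment for " << Machine::name()
               << " according to heuristics...\n";
```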

@WeiqunZhang force-pushed the perlmutter_gpu_mapping branch from d779130 to 947955a on February 14, 2025
@ax3l (Member) left a comment:

Thank you! LGTM 👍

@ax3l self-assigned this on Feb 21, 2025
@ax3l merged commit bfd1f11 into AMReX-Codes:development on Feb 21, 2025. 75 checks passed.