Skip to content

Commit

Permalink
rsmi: fix the PCI locality of partitioned devices
Browse files Browse the repository at this point in the history
rsmi_dev_pci_id_get() returns the GPU "partition ID" inside the PCI BDF function,
but this virtual function isn't actually exposed to the OS.
See ROCm/rocm_smi_lib#208 for details.

When hwloc fails to find the corresponding PCI device (usually gets the above bridge instead),
if the BDF function is > 0, get the RSMI partition ID, compare it with the BDF function,
and try to get the PCI device with func = 0 instead.

rsmi_dev_partition_id_get() was only added in ROCm 6.2, so configure-check it.

Thanks to Edgar Leon for reporting and debugging the issue.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit aef721f)
  • Loading branch information
bgoglin committed Jan 10, 2025
1 parent 7c7150f commit 413dbd7
Show file tree
Hide file tree
Showing 5 changed files with 46 additions and 5 deletions.
4 changes: 3 additions & 1 deletion NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,9 @@ Version 2.12.0
- zesDriverGetDeviceByUuidExp() is now required in the L0 runtime.
- ZES/Sysman variants were added in hwloc/levelzero.h to specifically
handle ZES/Sysman device handles.
* Better detect Cray Slingshot NICs, thanks to Edgar Leon for the report.
* Fix the locality of AMD GPU partitions, thanks to Edgar Leon for
reporting and debugging the issue.
* Better detect Cray Slingshot NICs, thanks to Edgar Leon.
* Add support for Die objects and Module groups on Windows.
* Only filter-out Dies that are identical to their Packages
when it applies to all Dies.
Expand Down
5 changes: 3 additions & 2 deletions config/hwloc.m4
Original file line number Diff line number Diff line change
Expand Up @@ -1315,8 +1315,9 @@ return rsmi_init(0);
hwloc_rsmi_warning=no],
[AC_MSG_RESULT([no])
hwloc_rsmi_warning=yes],
[AC_MSG_RESULT([don't know (cross-compiling)])])],
[hwloc_rsmi_happy=no])
[AC_MSG_RESULT([don't know (cross-compiling)])])
AC_CHECK_DECLS([rsmi_dev_partition_id_get],,[:],[[#include <rocm_smi/rocm_smi.h>]])
], [hwloc_rsmi_happy=no])
LDFLAGS="$LDFLAGS_save"
LIBS="$LIBS_save"
], [hwloc_rsmi_happy=no])
Expand Down
36 changes: 36 additions & 0 deletions hwloc/topology-rsmi.c
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,32 @@ static int get_device_pci_info(uint32_t dv_ind, uint64_t *bdfid)
return 0;
}

/*
* Get the partition ID of the GPU
*
* dv_ind (IN) The device index
* partid (OUT) partition ID of GPU devices
*/
static int get_device_partition_id(uint32_t dv_ind, uint32_t *partid)
{
#if HAVE_DECL_RSMI_DEV_PARTITION_ID_GET
rsmi_status_t rsmi_rc = rsmi_dev_partition_id_get(dv_ind, partid);

if (rsmi_rc != RSMI_STATUS_SUCCESS) {
if (HWLOC_SHOW_ALL_ERRORS()) {
const char *status_string;
rsmi_rc = rsmi_status_string(rsmi_rc, &status_string);
fprintf(stderr, "hwloc/rsmi: GPU(%u): Failed to get partition ID: %s\n", (unsigned)dv_ind, status_string);
}
return -1;
}
return 0;
#else
errno = ENOSYS;
return -1;
#endif
}

/*
* Get the PCI link speed of the GPU
*
Expand Down Expand Up @@ -366,6 +392,16 @@ hwloc_rsmi_discover(struct hwloc_backend *backend, struct hwloc_disc_status *dst
device = ((bdfid & 0xff)>>3) & 0x1f;
func = bdfid & 0x7;
parent = hwloc_pci_find_parent_by_busid(topology, domain, bus, device, func);
if ((!parent || parent->type != HWLOC_OBJ_PCI_DEVICE) && func > 0) {
/* Partitioned MI devices may return the partition ID in a fake BDF func,
* hence we would fail to find a pcidev parent.
* Try with func=0 instead.
*/
uint32_t partid = 0;
get_device_partition_id(i, &partid);
if (func == partid)
parent = hwloc_pci_find_parent_by_busid(topology, domain, bus, device, 0);
}
if (parent && parent->type == HWLOC_OBJ_PCI_DEVICE)
get_device_pci_linkspeed(i, &parent->attr->pcidev.linkspeed);
if (!parent)
Expand Down
3 changes: 2 additions & 1 deletion tests/hwloc/ports/Makefile.am
Original file line number Diff line number Diff line change
Expand Up @@ -179,7 +179,8 @@ nodist_libhwloc_port_rsmi_la_SOURCES = topology-rsmi.c
libhwloc_port_rsmi_la_SOURCES = \
include/rsmi/rocm_smi/rocm_smi.h
libhwloc_port_rsmi_la_CPPFLAGS = $(common_CPPFLAGS) \
-I$(HWLOC_top_srcdir)/tests/hwloc/ports/include/rsmi
-I$(HWLOC_top_srcdir)/tests/hwloc/ports/include/rsmi \
-DHAVE_DECL_RSMI_DEV_PARTITION_ID_GET=1

nodist_libhwloc_port_levelzero_la_SOURCES = topology-levelzero.c
libhwloc_port_levelzero_la_SOURCES = \
Expand Down
3 changes: 2 additions & 1 deletion tests/hwloc/ports/include/rsmi/rocm_smi/rocm_smi.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright © 2013-2021 Inria. All rights reserved.
* Copyright © 2013-2024 Inria. All rights reserved.
* Copyright (c) 2020, Advanced Micro Devices, Inc. All rights reserved.
* Written by Advanced Micro Devices,
* See COPYING in top-level directory.
Expand Down Expand Up @@ -79,6 +79,7 @@ rsmi_status_t rsmi_dev_name_get(uint32_t dv_ind, char *name, size_t len);
rsmi_status_t rsmi_dev_serial_number_get(uint32_t dv_ind, char *serial_num, uint32_t len);
rsmi_status_t rsmi_dev_unique_id_get(uint32_t dv_ind, uint64_t *id);
rsmi_status_t rsmi_dev_pci_bandwidth_get(uint32_t dv_ind, rsmi_pcie_bandwidth_t *bandwidth);
rsmi_status_t rsmi_dev_partition_id_get(uint32_t dv_ind, uint32_t *partition_id);
rsmi_status_t rsmi_dev_xgmi_hive_id_get(uint32_t dv_ind, uint64_t *hive_id);
rsmi_status_t rsmi_topo_get_link_type(uint32_t dv_ind_src, uint32_t dv_ind_dst,
uint64_t *hops, RSMI_IO_LINK_TYPE *type);
Expand Down

0 comments on commit 413dbd7

Please sign in to comment.