[Issue]: PCI BDF returned by rsmi_dev_pci_id_get() wrong on partitioned MI300A? #208
Comments
Hi @bgoglin. Internal ticket has been created to investigate your issue. Thanks!
Hi @bgoglin, just to update you - I've been able to reproduce this issue on a local MI300A system. Thanks for the detailed description, and I'll follow up once I know more.
Hi @bgoglin, here are my findings:
While rocm-smi shows the partition ID as embedded in the bottom of the function bits, these devices do not show up in lspci. The pre-partitioned device is the only one with an "accurate" BDF reported by rocm-smi. This is intentional: function ID manipulation only happens inside the ROCm stack, and this partitioning is not reflected to the rest of the OS (we are not, as you say, hotplugging PCIe devices). We have alternative methods to address individual partitions, like the deviceID API. Do you have a specific use case where this difference between the BDFs reported by rocm-smi and visible to the ROCm stack, versus the BDFs visible to the OS, causes an issue?
The use case is @eleon from LLNL on the El Capitan supercomputer using hwloc in https://github.com/LLNL/mpibind. hwloc gets each partition from the ROCm SMI lib, but it then fails to place those GPU partitions in the (PCI) topology because the reported BDF doesn't exist. I understand your explanation above, but it seems to contradict what the documentation of rsmi_dev_pci_id_get() says? I'd like a clarification of the doc to handle this case. For instance, may I assume that every time ROCm SMI reports a PCI function F > 0, it means it's actually partition #F of the PCI device with function 0?
@bgoglin I agree that should be documented better.
Yes, for MI-series devices, the PCIe function bits are only used to reflect the partition ID. I'll start on a documentation change to clarify this, thanks for bringing it up.
rsmi_dev_pci_id_get() returns the GPU "partition ID" inside the PCI BDF function, but this virtual function isn't actually exposed to the OS. See ROCm/rocm_smi_lib#208 for details. When hwloc fails to find the corresponding PCI device, if the BDF function is > 0, get the RSMI partition ID, compare it with the BDF function, and try to get the PCI device with func = 0 instead. Thanks to Edgar Leon for reporting the issue. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
rsmi_dev_pci_id_get() returns the GPU "partition ID" inside the PCI BDF function, but this virtual function isn't actually exposed to the OS. See ROCm/rocm_smi_lib#208 for details. When hwloc fails to find the corresponding PCI device (usually gets the above bridge instead), if the BDF function is > 0, get the RSMI partition ID, compare it with the BDF function, and try to get the PCI device with func = 0 instead. rsmi_dev_partition_id_get() was only added in ROCm 6.2, so configure-check it. Thanks to Edgar Leon for reporting and debugging the issue. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
rsmi_dev_pci_id_get() returns the GPU "partition ID" inside the PCI BDF function, but this virtual function isn't actually exposed to the OS. See ROCm/rocm_smi_lib#208 for details. When hwloc fails to find the corresponding PCI device (usually gets the above bridge instead), if the BDF function is > 0, get the RSMI partition ID, compare it with the BDF function, and try to get the PCI device with func = 0 instead. rsmi_dev_partition_id_get() was only added in ROCm 6.2, so configure-check it. Thanks to Edgar Leon for reporting and debugging the issue. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> (cherry picked from commit aef721f)
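The configure-check mentioned in the commit message could look roughly like this in autoconf terms (a sketch; hwloc's actual macro names and check may differ, and `HAVE_RSMI_DEV_PARTITION_ID_GET` is an illustrative define, not necessarily the one hwloc uses):

```shell
# Sketch of a configure.ac fragment: rsmi_dev_partition_id_get() only
# exists in ROCm >= 6.2, so probe the library for the symbol and define
# a guard macro the C code can #ifdef on.
AC_CHECK_LIB([rocm_smi64], [rsmi_dev_partition_id_get],
  [AC_DEFINE([HAVE_RSMI_DEV_PARTITION_ID_GET], [1],
             [Define if rsmi_dev_partition_id_get is available])])
```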
I assume the documentation change is 67a0de4
- To address #208, where use of fake BDFs for partitions can cause confusion. This note is already in the comments of the function definition, but was not updated in the function declaration.
- Fix broken formatting for the location table for PCIe coordinate fields
- Tracked in SWDEV-501108

Change-Id: Ic85439866cb836bb43acc52314a7f1d026c3215d
Problem Description
Hello
I am debugging a hwloc issue with users from the El Capitan supercomputer (MI300A). I don't have access to the hardware, hence it's a bit complicated for me to get all details.
It looks like the PCI BDF returned by rsmi_dev_pci_id_get() is wrong when called on the non-first partition of a partitioned MI300A. The root GPU BDF is something like 0001:02:00.0.
rocm-smi --showbus
reports BDFs like 0001:02:00.1 and 0001:02:00.2 for the 2nd and 3rd partitions. I first thought that you were hotplugging additional PCI functions when partitioning a GPU, but that doesn't seem to be the case. According to my contact, these PCI BDFs do not actually exist in the system (lspci) even after enabling partitioning. I looked at the documentation of rsmi_dev_pci_id_get(); it says that the partition ID is actually encoded in the 64-bit returned value between the bus and domain, not inside the PCI function bits.
Is this a documentation bug? Or an implementation bug?
Operating System
Linux
CPU
MI300A
GPU
MI300A
ROCm Version
ROCm 6.1.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response