From ea2a5b9a954e8ab3dedd9868469d32339fbec394 Mon Sep 17 00:00:00 2001
From: OFIWG Bot
Date: Thu, 6 Mar 2025 17:01:16 +0000
Subject: [PATCH] Update GH man pages

Signed-off-by: OFIWG Bot
---
 main/man/fi_domain.3.md | 48 ++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/main/man/fi_domain.3.md b/main/man/fi_domain.3.md
index eb37ce6d187..79ec4b3779f 100644
--- a/main/man/fi_domain.3.md
+++ b/main/man/fi_domain.3.md
@@ -413,17 +413,18 @@ the endpoint is reliable or unreliable, as well as provider and protocol
 specific implementation details, as shown in the following table. The
 table assumes that all peers enable or disable RM the same.
 
-| Resource | DGRAM EP-no RM | DGRAM EP-with RM | RDM/MSG EP-no RM | RDM/MSG EP-with RM |
-|:--------:|:-------------------:|:-------------------:|:------------------:|:-----------------:|
-| Tx Ctx | undefined error | EAGAIN | undefined error | EAGAIN |
-| Rx Ctx | undefined error | EAGAIN | undefined error | EAGAIN |
-| Tx CQ | undefined error | EAGAIN | undefined error | EAGAIN |
-| Rx CQ | undefined error | EAGAIN | undefined error | EAGAIN |
-| Target EP | dropped | dropped | transmit error | retried |
-| No Rx Buffer | dropped | dropped | transmit error | retried |
-| Rx Buf Overrun | truncate or drop | truncate or drop | truncate or error | truncate or error |
-| Unmatched RMA | not applicable | not applicable | transmit error | transmit error |
-| RMA Overrun | not applicable | not applicable | transmit error | transmit error |
+| Resource | DGRAM EP-no RM | DGRAM EP-with RM | MSG EP-no RM | MSG EP-with RM | RDM EP-no RM | RDM EP-with RM |
+|:--------:|:-------------------:|:-------------------:|:------------------:|:-----------------:| :------------------:|:-----------------:|
+| Tx Ctx | undefined error | EAGAIN | undefined error | EAGAIN | undefined error | EAGAIN |
+| Rx Ctx | undefined error | EAGAIN | undefined error | EAGAIN | undefined error | EAGAIN |
+| Tx CQ | undefined error | EAGAIN | undefined error | EAGAIN | undefined error | EAGAIN |
+| Rx CQ | undefined error | EAGAIN | undefined error | EAGAIN | undefined error | EAGAIN |
+| Target EP | dropped | dropped | transmit error | retried | transmit error | retried |
+| No Rx Buffer | dropped | dropped | transmit error | retried | transmit error | retried |
+| Rx Buf Overrun | truncate or drop | truncate or drop | truncate or error | truncate or error | truncate or error | truncate or error |
+| Unmatched RMA | not applicable | not applicable | transmit error | transmit error | transmit error | transmit error |
+| RMA Overrun | not applicable | not applicable | transmit error | transmit error | transmit error | transmit error |
+| Unreachable EP | dropped | dropped | not applicable | not applicable | transmit error | transmit error |
 
 The resource column indicates the resource being accessed by a data
 transfer operation.
@@ -482,12 +483,25 @@ transfer operation.
   operations, or attempt to access outside of the target memory region
   will fail, resulting in a transmit error.
 
-When a resource management error occurs on an endpoint, the endpoint is
-transitioned into a disabled state. Any operations which have not
-already completed will fail and be discarded. For connectionless endpoints,
-the endpoint must be re-enabled before it will accept new data transfer
-operations. For connected endpoints, the connection is torn down and
-must be re-established.
+*Unreachable EP*
+: Unreachable endpoint is a connectionless specific scenario where transmit
+  operations are issued to unreachable target endpoints. Such scenarios include
+  no-route-to-host or down target NIC. For FI_EP_DGRAM endpoints, transmit
+  operations targeting an unreachable endpoint will have operation dropped. For
+  FI_EP_RDM, target operations targeting an unreachable endpoint will result in
+  a transmit error.
+
+When a resource management error occurs on an a connected endpoint, the endpoint
+will transition into a disabled state and the connection torn down. A disabled
+endpoint will drop any queued or inflight operations.
+
+The behavior of resource management errors on connectionless endpoints depends
+on the type of error. If RM is disabled and one of the following errors occur,
+the endpoint will be disabled: Tx Ctx, Rx Ctx, Tx CQ, or Rx CQ. For other errors
+(Target EP, No Rx Buffer, etc.), the operation may fail, but the endpoint will
+remain enabled. A disabled endpoint will drop or fail any queued or inflight
+operations. In addition, a disabled endpoint must be re-enabled before it will
+accept new data transfer operations.
 
 There is one notable restriction on the protections offered by resource
 management. This occurs when resource management is enabled on an