Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/efa: Remove efa_av->ep_type in favor of efa_domain->info_type #10827

Merged
merged 3 commits into from
Mar 3, 2025

Conversation

sunkuamzn
Copy link
Contributor

@sunkuamzn sunkuamzn commented Feb 26, 2025

prov/efa: Do not allocate rdm_peer struct for efa-direct and dgram paths

prov/efa: Remove efa_av->ep_type in favor of efa_domain->info_type

efa_av->ep_type was used to distinguish between the RDM path and the
DGRAM path. The efa-direct path has FI_EP_RDM ep type but it does not
require a efa_conn->rdm_peer. But the efa-direct path still uses unique connid
based on timestamp.

Use efa_domain->info_type to distinguish the three code paths.

prov/efa: Replace domain->rdm_ep with domain->info_type

domain->rdm_ep can't distinguish between efa-direct and dgram code paths

@@ -233,8 +233,7 @@ int main(int argc, char **argv)
if (optind < argc)
opts.dst_addr = argv[optind];

hints->caps = hints->ep_attr->type == FI_EP_RDM ?
FI_TAGGED : FI_MSG;
hints->caps = FI_MSG;
Copy link
Contributor

@shijin-aws shijin-aws Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a separate PR or commit, at least we shouldn't mention efa-direct as the motivation of this common change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the test doesn't require tag matching, it's better to allow FI_MSG and FI_TAGGED toggling via -o

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved the fabtests commit to #10828

@sunkuamzn sunkuamzn force-pushed the rdm-av-ep-type branch 2 times, most recently from c57482e to 3665a63 Compare February 27, 2025 02:11
*
* @param[in] state struct efa_resource that is managed by the framework
*/
void test_av_ep_type_efa_rdm(struct efa_resource **state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should belong to your next commit?

@@ -272,8 +272,13 @@ int efa_conn_rdm_init(struct efa_av *av, struct efa_conn *conn, bool insert_shm_
assert(!dlist_empty(&av->util_av.ep_list));
efa_rdm_ep = container_of(av->util_av.ep_list.next, struct efa_rdm_ep, base_ep.util_ep.av_entry);

peer = &conn->rdm_peer;
efa_rdm_peer_construct(peer, efa_rdm_ep, conn);
peer = (struct efa_rdm_peer *)calloc(1, sizeof(struct efa_rdm_peer));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peer is a big object, should we use bufferpool to optimize the allocation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are timing out now, so bufpool is not optional

.flags = OFI_BUFPOOL_NO_TRACK,
};

ret = ofi_bufpool_create_attr(&rdm_peer_pool_attr, &av->rdm_peer_pool);
Copy link
Contributor

@shijin-aws shijin-aws Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is only created when we have shm? Should it be moved to L916?

domain->rdm_ep can't distinguish between efa-direct and dgram code paths

Signed-off-by: Sai Sunku <sunkusa@amazon.com>
efa_av->ep_type was used to distinguish between the RDM path and the
DGRAM path. The efa-direct path has FI_EP_RDM ep type but it does not
require a efa_conn->rdm_peer. But the efa-direct path still uses unique
connid based on timestamp.

Use efa_domain->info_type to distinguish the three code paths.

Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Use a separate bufpool to allocate the peer struct in the rdm path

Signed-off-by: Sai Sunku <sunkusa@amazon.com>
@sunkuamzn
Copy link
Contributor Author

bot:aws:retest

@sunkuamzn sunkuamzn merged commit fddf018 into ofiwg:main Mar 3, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants