Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/efa: More fixes for efa-direct #10811

Merged
merged 2 commits into from
Feb 21, 2025
Merged

Conversation

shijin-aws
Copy link
Contributor

@shijin-aws shijin-aws commented Feb 21, 2025

A series of fixes for efa-direct:

Tested by #10800 on p5

commit f3e26d6 introduces another bug
that the max_msg_size can be larger than
the the max_msg_size of efa_prov_info when
FI_RMA is requested: This will cause the
fi_endpoint() call failed when comparing
the user info and prov info.

This patch fixes this issue by making the
prov_info always use the largest max_msg_size,
and conditionally reduce the returned
user_info to the smaller max_msg_size
when FI_RMA is not requested.

Signed-off-by: Shi Jin <sjina@amazon.com>
@shijin-aws shijin-aws requested a review from a team February 21, 2025 01:23
@shijin-aws shijin-aws changed the title prov/efa: Second fix for efa-direct's max_msg_size prov/efa: More fixes for efa-direct Feb 21, 2025
1. efa_ep should support selective_completion after
the FI_CONTEXT2 implementation
2. split efa_cntr_open as efa_cntr_open and
efa_rdm_cntr_open. efa_rdm_cntr_open is only
used for rdm info type. Both of them setup
the correct progress function accordingly.
We need separate open functions because
there is no way to distinguish efa-direct
and efa-rdm during the cntr_open call
as there is no user_info object

Signed-off-by: Shi Jin <sjina@amazon.com>
@@ -227,12 +227,6 @@ static int efa_ep_bind(struct fid *fid, struct fid *bfid, uint64_t flags)

switch (bfid->fclass) {
case FI_CLASS_CQ:
if (flags & FI_SELECTIVE_COMPLETION) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we have this in the first place? Because dgram doesn't support FI_SELECTIVE_COMPLETION?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the efa_ep PR was merged before the FI_CONTEXT2 change, so I didn't remove it.

@shijin-aws shijin-aws merged commit cefee50 into ofiwg:main Feb 21, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants