-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prov/efa: Set g_device_list to NULL after free #10805
Conversation
fi_info has a double free issue. The efa g_device_list isn't being set to NULL after being freed so the next time efa_device_list_finalize is called it will be freed again (double free). Setting it to NULL removes this. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>
@zachdworkin Is it a new issue after b7b9dd6? |
Yes its a new issue after that commit. I rebased before finding this bug. The way I found it was running the fi_info binary with any provider and saw the double free report. Libfabric must be configured so that efa is built (HAVE_EFA 1 in config.log) to reproduce |
@zachdworkin Thanks for catching it ! Let me run with ASAN to reproduce |
@shijin-aws no problem! Seems simple enough. What does ASAN stand for? |
ASAN is address sanitizer that we used to catch illegal memory access and double free error. But I couldn't reproduce the double free report: |
I think I can see where the bug is introduced: Before b7b9dd6, |
steps to reproduce: (check HAVE_EFA 1 in config.log) install/location/bin/fi_info -p verbs if run with -p efa, shm, tcp, etc the error still appears and using gdb to follow it leads to the changes in this patch |
I think this PR is only a bandaid as we have more clean up issue here - The EFA_INI code block needs to move the initialization code to the getinfo. I need to think about whether I need to revert that commit now before i can have a good fix shortly |
@zachdworkin can you try if #10807 fixes the issue for you? |
@shijin-aws #10807 fixes it. Should I close this and let that PR be the one to fix it? |
Yes, you can close this one |
fi_info has a double free issue. The efa g_device_list isn't being set to NULL after being freed so the next time efa_device_list_finalize is called it will be freed again (double free). Setting it to NULL removes this.