-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Legion_prof: assertion failed: copy_inst_info._src.unwrap() == chan_src
#1606
Comments
Usually a CopyInfo can contains multiple CopyInstInfos, which have the same src and dst memories. This error tells some of the CopyInstInfos do not have the same src/dst memories. It is usually a Legion bug.
The src of the last one is not consistence with the first two. |
What are the other parameters for each of the |
Sure, they are
|
This isn't a bug in Legion; it is an issue with the profiler. Look at the |
The hierarchy of the profiler is:
but now a |
I don't think that is always true. If the src and dst instances are on different nodes and a multi-hop copy is required, then Realm can do the copies from each of the source instances into one intermediate buffer, and then do a single RDMA operation from the intermediate buffer to the destination instance. The Realm API encourages clients to express copies fused together in this way so that it can take advantage of optimizations like that, and Legion is a good client that tries to give Realm as much information in as few API calls as possible. Realm might end up splitting some of these copies apart, but some of them might also get fused.
How is it different than what we do with the gather copies? In some sense this is a kind of gather copy, but just gathering different fields from different source instances into the same destination instance. Another possibility is just to put this copy on both channels, and only show the respective fields associated with each channel. |
For gather copies, we will at least have one
I am not sure if having one copy in multiple channels will break the current design of legion prof. If it does not work, I am thinking about cloning the |
I personally prefer the latter approach. |
If "the latter approach" means splitting the copy into one part per channel, I agree. |
@mariodirenzo Can you please try this commit https://gitlab.com/StanfordLegion/legion/-/commit/592efe8d0230b25a76c1139e9094f00a94fdf2b6? This is a fix for the legion_prof.py, if it works for you, I will apply the same fix for the rust code. |
The patch works for me. Thanks for working on it |
@mariodirenzo We made a few more fixes and then merged it into |
Both python and rust versions of the profiler work now. Thanks for fixing this issue! |
I'm trying to visualize the timeline of the two attached logs produced with the commit
cbbc900287e1c20a5c1e11e82bea0e27cdaf577f
of Legion. However, I keep getting the assertionDo you have any advice on what might be wrong?
@elliottslaughter, can you please add this to #1032?
prof_0.log
prof_1.log
The text was updated successfully, but these errors were encountered: