-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#2201: implement memory aware TemperedLB in vt #2203
Conversation
Pipelines resultsPR tests (gcc-12, ubuntu, mpich) Build for 10c35df (2024-01-25 22:56:08 UTC)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments
3cd9309
to
c13aa2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't do a full review, just wanted to make some comments.
f2332af
to
441d27b
Compare
@lifflander @ppebay I am unable to run this in a production environment:
|
I think that the way I've implemented this with a recursive handler is causing tracing issues. I will look into it.
|
Co-authored-by: Jonathan Lifflander <jliffla@sandia.gov>
Co-authored-by: Jonathan Lifflander <jliffla@sandia.gov>
…ommunicating to it
…proximate criterion
…ock while swapping
6130120
to
10c35df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've flagged some stuff that needs to be fixed (mostly typos) on the rebased branch (not this one!)
tasks for the other shared_ids across all ranks do not need to be split across | ||
multiple ranks to perfectly balance the load (time). | ||
|
||
Below is one solution with a perfectly balanced load and decent communication. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the word perfectly
and replace it with something weaker as long as the solution presented below is in fact well balanced. If the solution below is not well balanced, delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed perfectly
but I haven't checked the given solution
} | ||
|
||
bool TemperedLB::memoryTransferCriterion(double try_total_bytes, double src_bytes) { | ||
// FIXME: incomplete implementation that ignores memory regrouping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This FIXME isn't needed for swapping whole clusters, just for subclustering, right?
I fixed the above typos in the rebased branch already. |
Do you think we should fully implement the sub-clustering with the full work model before we merge this? |
Closing because this is superseded by #2278 |
Resolves #2201
This PR in particular:
Original
strategy;SwapClusters
transfer strategy; i.e. a simplification of LBAF'sClusteringStrategy
(without sub-clustering).