This is HTR running on 16 nodes on Lassen. We get the following assertion:
prometeo.exec: /g/g92/dodipass/legion/runtime/realm/barrier_impl.cc:465: void Realm::broadcast_trigger(Barrier, const std::vector<RemoteNotification>&, const std::vector<int>&, EventImpl::gen_t, EventImpl::gen_t, EventImpl::gen_t, NodeID, unsigned int, ReductionOpID, const void*, size_t, bool): Assertion `((long long)max_recommended_payload - (long long)reduce_data_size - (long long)sizeof(BarrierTriggerMessageArgsInternal) - (long long)sizeof(size_t)) > 0' failed.
This is the backtrace:
Signal 6 received by node 0, process 3304 (thread 20005ecbf8b0) - obtaining backtrace
Signal 6 received by process 3304 (thread 20005ecbf8b0) at: stack trace: 12 frames
[0] = [0x2000000504d8]
[1] = /lib64/libc.so.6(abort+0x2b4) [0x20000d282134]
[2] = /lib64/libc.so.6(+0x357d4) [0x20000d2757d4]
[3] = /lib64/libc.so.6(__assert_fail+0x64) [0x20000d2758c4]
[4] = /g/g92/dodipass/HTRpp/bin/prometeo.exec() [0x13023400]
[5] = /g/g92/dodipass/HTRpp/bin/prometeo.exec() [0x130260d4]
[6] = /g/g92/dodipass/HTRpp/bin/prometeo.exec(Realm::IncomingMessageManager::do_work(Realm::TimeLimit)+0x134) [0x131f9694]
[7] = /g/g92/dodipass/HTRpp/bin/prometeo.exec() [0x12ffe34c]
[8] = /g/g92/dodipass/HTRpp/bin/prometeo.exec() [0x12ffee60]
[9] = /g/g92/dodipass/HTRpp/bin/prometeo.exec() [0x13131cd8]
[10] = /lib64/libpthread.so.0(+0x8cd4) [0x200000128cd4]
[11] = /lib64/libc.so.6(clone+0xe4) [0x20000d367f14]
@artempriakhin is the one who touched the barrier code most recently.
Yes, this one should be on me. I will work on a fix first thing Monday.
This is in progress. The patch will be out soon.