I vaguely remember that not every reduce or scan algorithm uses ::cuda::std::__accumulator_t to determine the accumulator type to use. We should consolidate this behavior.
I had another look and realized that the C++ standard determines the accumulator type to be either the iterator's value type or the initial value's type. So the divergence between CUB and Thrust seems fine.
This is also relevant to SIMD reduction: using cuda::std::plus<> vs. cuda::std::plus<T> can affect performance. E.g., cuda::std::plus<> applied to int16_t operands induces implicit integer promotion to int, which disables SIMD.