-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add thrust::strided_iterator
as a thrust::counting_iterator
with step
#4014
base: main
Are you sure you want to change the base?
Conversation
I still need to figure out why the |
|
||
template <typename Incrementable, typename Stride, Stride Value> | ||
inline _CCCL_HOST_DEVICE auto | ||
make_counting_iterator(Incrementable x, ::cuda::std::integral_constant<Stride, Value> stride) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kudos: Making it support integral constant for statically known step sizes is a nice addition!
thrust/testing/counting_iterator.cu
Outdated
auto iter = thrust::make_counting_iterator( | ||
&arr[0].second, ::cuda::std::integral_constant<int, sizeof(std::pair<int, double>)>{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also test that it can take a runtime value
struct empty | ||
{}; | ||
|
||
template <typename T> | ||
struct value_holder | ||
{ | ||
T value; | ||
|
||
_CCCL_HOST_DEVICE auto operator()() const | ||
{ | ||
return value; | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should be able to avoid this by using the mdspan helper __de_ice
available in <cuda\std\__mdspan\submdspan_helper.h>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly why I have put this draft PR up. I was certain, we would already have a helper somewhere. Thx!
@@ -53,7 +53,7 @@ | |||
THRUST_NAMESPACE_BEGIN | |||
|
|||
// forward declaration of counting_iterator | |||
template <typename Incrementable, typename System, typename Traversal, typename Difference> | |||
template <typename Incrementable, typename System, typename Traversal, typename Difference, typename Step> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should use the empty base class optimization here and inherit from Step
We could also generalize __mdspan_ebco
so that we do not store unneeded information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't EBO get us in trouble with MSVC + nvcc and having different sizes in host/device code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not if implemented through inheritance and not [[no_unique_address]]
b382f05
to
8e16c10
Compare
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
thrust::counting_iterator
thrust::strided_iterator
as a thrust::counting_iterator
with step
template <typename Iterator, typename StrideHolder = detail::empty> | ||
using strided_iterator = | ||
transform_iterator<detail::deref, | ||
counting_iterator<Iterator, use_default, random_access_traversal_tag, use_default, StrideHolder>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this definition is easy, I wonder whether we should still define a full-blown iterator without using transform and counting iterator, just to have fewer moving parts for the optimizer to fold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aaaaand we have to create a custom iterator, because of the proxy reference unwrapping in transform_iterator
:)
f3a6cc6
to
6d1b3ae
Compare
/ok to test |
🟨 CI finished in 1h 53m: Pass: 90%/93 | Total: 2d 20h | Avg: 43m 58s | Max: 1h 27m | Hits: 44%/123409
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
+/- | Thrust |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
# | Runner |
---|---|
66 | linux-amd64-cpu16 |
9 | windows-amd64-cpu16 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
4 | linux-arm64-cpu16 |
3 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
2 | linux-amd64-gpu-rtx2080-latest-1 |
6d1b3ae
to
0a998b7
Compare
Addresses a part of: #706
Fixes: #3698