Deprecate and Replace cub::BFE #4031

Open
wants to merge 5 commits into
base: main

fbusato (Contributor) commented Mar 6, 2025

Fixes #4025

Description

Deprecate cub::BFE and replace it with the new <cuda/bit> functionality (cuda::bitfield_extract). See nvidia.github.io/cccl/libcudacxx/extended_api/bit.html

The initial PR found the following problems:

  • MSVC triggers an unused-variable warning in bitfield.h
  • catch2_test_device_segmented_radix_sort_keys.cu includes tests with 0 bit width
  • catch2_test_block_radix_sort.cu includes tests with 0 bit width
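
For readers unfamiliar with the operation, here is a minimal host-side sketch of what a bitfield extract computes and why a zero bit width is a problem once range checks are on. `extract_bits` is a hypothetical stand-in written for this illustration; it is not the `<cuda/bit>` implementation, and the exact preconditions checked by `cuda::bitfield_extract` are an assumption here (a non-empty field that fits in the 32-bit word).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side stand-in, not the <cuda/bit> implementation.
// Returns `width` bits of `value` starting at bit `start`, shifted down to bit 0.
constexpr std::uint32_t extract_bits(std::uint32_t value, int start, int width)
{
  // Assumed preconditions for this sketch: non-empty field that fits in 32 bits.
  // A width of 0 (begin_bit == end_bit in a radix sort test) trips this check.
  assert(start >= 0 && width > 0 && start + width <= 32);
  // Avoid shifting by 32, which is undefined for a 32-bit operand.
  const std::uint32_t mask =
    (width == 32) ? ~std::uint32_t{0} : ((std::uint32_t{1} << width) - 1u);
  return (value >> start) & mask;
}

// Compile-time checks of the sketch itself.
static_assert(extract_bits(0xABCDu, 4, 8) == 0xBCu, "bits [4, 12) of 0xABCD");
static_assert(extract_bits(0xFFFFFFFFu, 0, 32) == 0xFFFFFFFFu, "full-width field");
```

With checks like these enabled, a caller that passes a width of 0 fails loudly instead of silently extracting nothing, which is how the zero-width test configurations listed above would surface.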

Performance comparison of PTX BFE vs. cuda::bitfield_extract on SM80. TL;DR: slightly faster.

[0] NVIDIA RTX A6000

| T{ct} | OffsetT{ct} | Elements{io} | Entropy | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|-------|-------------|--------------|---------|----------|-----------|----------|-----------|------|-------|--------|
| I8 | I32 | 2^28 | 1 | 3.667 ms | 0.91% | 3.646 ms | 0.85% | -21.558 us | -0.59% | SAME |
| I8 | I32 | 2^28 | 0.544 | 3.589 ms | 1.07% | 3.573 ms | 0.96% | -15.195 us | -0.42% | SAME |
| I8 | I32 | 2^28 | 0.201 | 3.572 ms | 0.43% | 3.566 ms | 0.73% | -5.587 us | -0.16% | SAME |
| I8 | I64 | 2^28 | 1 | 3.891 ms | 0.60% | 3.887 ms | 0.57% | -4.424 us | -0.11% | SAME |
| I8 | I64 | 2^28 | 0.544 | 3.816 ms | 0.55% | 3.812 ms | 0.63% | -3.316 us | -0.09% | SAME |
| I8 | I64 | 2^28 | 0.201 | 3.824 ms | 0.29% | 3.811 ms | 0.28% | -13.288 us | -0.35% | FAST |
| I16 | I32 | 2^28 | 1 | 8.461 ms | 0.45% | 8.441 ms | 0.51% | -20.436 us | -0.24% | SAME |
| I16 | I32 | 2^28 | 0.544 | 8.349 ms | 0.37% | 8.304 ms | 0.33% | -45.869 us | -0.55% | FAST |
| I16 | I32 | 2^28 | 0.201 | 8.239 ms | 0.37% | 8.173 ms | 0.36% | -66.614 us | -0.81% | FAST |
| I16 | I64 | 2^28 | 1 | 8.561 ms | 0.53% | 8.504 ms | 0.47% | -57.116 us | -0.67% | FAST |
| I16 | I64 | 2^28 | 0.544 | 8.410 ms | 0.37% | 8.335 ms | 0.59% | -75.033 us | -0.89% | FAST |
| I16 | I64 | 2^28 | 0.201 | 8.271 ms | 0.37% | 8.184 ms | 0.38% | -86.897 us | -1.05% | FAST |
| I32 | I32 | 2^28 | 1 | 14.839 ms | 0.40% | 14.847 ms | 0.39% | 8.064 us | 0.05% | SAME |
| I32 | I32 | 2^28 | 0.544 | 14.883 ms | 0.53% | 14.896 ms | 0.53% | 13.054 us | 0.09% | SAME |
| I32 | I32 | 2^28 | 0.201 | 14.871 ms | 0.51% | 14.878 ms | 0.51% | 6.633 us | 0.04% | SAME |
| I32 | I64 | 2^28 | 1 | 14.848 ms | 0.58% | 14.855 ms | 0.63% | 6.919 us | 0.05% | SAME |
| I32 | I64 | 2^28 | 0.544 | 14.906 ms | 0.60% | 14.905 ms | 0.55% | -1.031 us | -0.01% | SAME |
| I32 | I64 | 2^28 | 0.201 | 14.900 ms | 0.45% | 14.912 ms | 0.52% | 11.272 us | 0.08% | SAME |
| I64 | I32 | 2^28 | 1 | 56.083 ms | 0.16% | 56.086 ms | 0.17% | 2.822 us | 0.01% | SAME |
| I64 | I32 | 2^28 | 0.544 | 55.997 ms | 0.49% | 55.995 ms | 0.50% | -1.792 us | -0.00% | SAME |
| I64 | I32 | 2^28 | 0.201 | 55.845 ms | 0.36% | 55.846 ms | 0.36% | 0.610 us | 0.00% | SAME |
| I64 | I64 | 2^28 | 1 | 56.104 ms | 0.51% | 56.102 ms | 0.51% | -1.453 us | -0.00% | SAME |
| I64 | I64 | 2^28 | 0.544 | 56.008 ms | 0.50% | 56.008 ms | 0.50% | 0.083 us | 0.00% | SAME |
| I64 | I64 | 2^28 | 0.201 | 55.854 ms | 0.36% | 55.851 ms | 0.36% | -2.967 us | -0.01% | SAME |
| I128 | I32 | 2^28 | 1 | 217.763 ms | 0.03% | 217.760 ms | 0.02% | -3.585 us | -0.00% | SAME |
| I128 | I32 | 2^28 | 0.544 | 217.483 ms | 0.29% | 217.490 ms | 0.29% | 7.103 us | 0.00% | SAME |
| I128 | I32 | 2^28 | 0.201 | 217.180 ms | 0.03% | 217.195 ms | 0.03% | 15.138 us | 0.01% | SAME |
| I128 | I64 | 2^28 | 1 | 217.926 ms | 0.23% | 217.915 ms | 0.23% | -11.061 us | -0.01% | SAME |
| I128 | I64 | 2^28 | 0.544 | 217.440 ms | 0.29% | 217.449 ms | 0.29% | 9.034 us | 0.00% | SAME |
| I128 | I64 | 2^28 | 0.201 | 217.128 ms | 0.03% | 217.127 ms | 0.03% | -1.376 us | -0.00% | SAME |
| F32 | I32 | 2^28 | 1 | 14.925 ms | 1.21% | 14.930 ms | 1.28% | 4.824 us | 0.03% | SAME |
| F32 | I32 | 2^28 | 0.544 | 14.933 ms | 0.57% | 14.930 ms | 0.55% | -2.420 us | -0.02% | SAME |
| F32 | I32 | 2^28 | 0.201 | 14.952 ms | 0.48% | 14.953 ms | 0.51% | 0.608 us | 0.00% | SAME |
| F32 | I64 | 2^28 | 1 | 14.837 ms | 0.67% | 14.844 ms | 0.78% | 7.563 us | 0.05% | SAME |
| F32 | I64 | 2^28 | 0.544 | 14.917 ms | 0.68% | 14.914 ms | 0.58% | -2.788 us | -0.02% | SAME |
| F32 | I64 | 2^28 | 0.201 | 14.966 ms | 0.50% | 14.967 ms | 0.46% | 0.405 us | 0.00% | SAME |
| F64 | I32 | 2^28 | 1 | 56.097 ms | 0.17% | 56.100 ms | 0.17% | 2.593 us | 0.00% | SAME |
| F64 | I32 | 2^28 | 0.544 | 56.032 ms | 0.50% | 56.034 ms | 0.49% | 1.903 us | 0.00% | SAME |
| F64 | I32 | 2^28 | 0.201 | 55.862 ms | 0.35% | 55.860 ms | 0.35% | -1.766 us | -0.00% | SAME |
| F64 | I64 | 2^28 | 1 | 56.130 ms | 0.50% | 56.132 ms | 0.51% | 2.047 us | 0.00% | SAME |
| F64 | I64 | 2^28 | 0.544 | 56.035 ms | 0.50% | 56.033 ms | 0.49% | -2.434 us | -0.00% | SAME |
| F64 | I64 | 2^28 | 0.201 | 55.869 ms | 0.35% | 55.871 ms | 0.35% | 1.195 us | 0.00% | SAME |

Summary

  • Total Matches: 42
    • Pass (diff <= min_noise): 36
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 6
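
As a reading aid for the Status column and the pass/failure counts, the following sketch shows one rule that is consistent with the rows above: a row is SAME when the absolute relative difference does not exceed the smaller of the two noise figures, and FAST or SLOW otherwise. `classify` is a hypothetical helper written for this illustration, not code from the benchmark tooling, and the exact rule used by the comparison script is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Hypothetical helper, not taken from the benchmark tooling.
// Times share one unit (e.g. ms); noise values are relative, in percent.
inline std::string classify(double ref_time, double ref_noise_pct,
                            double cmp_time, double cmp_noise_pct)
{
  // Relative difference of the compared run against the reference, in percent.
  const double diff_pct  = (cmp_time - ref_time) / ref_time * 100.0;
  // Assumed threshold: the smaller of the two noise figures ("min_noise").
  const double min_noise = std::min(ref_noise_pct, cmp_noise_pct);
  if (std::abs(diff_pct) <= min_noise)
  {
    return "SAME"; // counted as "Pass" in the summary
  }
  // Outside the noise band: counted as "Failure", in either direction.
  return diff_pct < 0.0 ? "FAST" : "SLOW";
}
```

For example, the I8/I64 row with entropy 0.201 has noise figures of 0.29% and 0.28% and a difference of about -0.35%, so it falls outside the band and is reported as FAST. Note that "Failure" here only means "outside the noise band"; all six such rows in this run are faster, not slower.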

@fbusato fbusato added the 3.0 label (Targeted for 3.0 release) Mar 6, 2025
@fbusato fbusato requested a review from bernhardmgruber March 6, 2025 01:22
@fbusato fbusato self-assigned this Mar 6, 2025
@fbusato fbusato requested a review from a team as a code owner March 6, 2025 01:22
github-actions bot (Contributor) commented Mar 6, 2025

🟨 CI finished in 1h 41m: Pass: 84%/93 | Total: 2d 21h | Avg: 44m 42s | Max: 1h 25m | Hits: 60%/115695
  • 🟨 cub: Pass: 75%/45 | Total: 1d 21h | Avg: 1h 00m | Max: 1h 25m | Hits: 25%/40744

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  74%/43  | Total:  1d 18h | Avg: 59m 49s | Max:  1h 25m | Hits:  25%/38308 
      🟩 arm64              Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 10m | Hits:  26%/2436  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits:  26%/2104  
      🔍 nvcc               Pass:  74%/43  | Total:  1d 19h | Avg:  1h 00m | Max:  1h 25m | Hits:  25%/38640 
    🔍 sm: 90 🔍
      🔍 90                 Pass:  33%/3   | Total:  1h 07m | Avg: 22m 23s | Max: 28m 55s | Hits:  26%/1218  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 17m | Avg:  1h 17m | Max:  1h 17m | Hits:  26%/1218  
    🟨 ctk
      🟨 12.0               Pass:  80%/5   | Total:  5h 48m | Avg:  1h 09m | Max:  1h 15m | Hits:  26%/4880  
      🟩 12.5               Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 20m | Hits:  17%/2254  
      🟨 12.8               Pass:  73%/38  | Total:  1d 12h | Avg: 58m 03s | Max:  1h 25m | Hits:  26%/33610 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 07m | Hits:  26%/2104  
      🟨 nvcc12.0           Pass:  80%/5   | Total:  5h 48m | Avg:  1h 09m | Max:  1h 15m | Hits:  26%/4880  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 20m | Hits:  17%/2254  
      🟨 nvcc12.8           Pass:  72%/36  | Total:  1d 10h | Avg: 57m 41s | Max:  1h 25m | Hits:  26%/31506 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 39m | Avg:  1h 09m | Max:  1h 13m | Hits:  27%/4880  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 07m | Hits:  27%/2436  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 13m | Hits:  27%/2436  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m | Hits:  27%/2436  
      🟨 Clang18            Pass:  71%/7   | Total:  6h 02m | Avg: 51m 44s | Max:  1h 08m | Hits:  26%/5758  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  26%/2440  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m | Hits:  26%/1220  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 08m | Hits:  26%/2440  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m | Hits:  26%/2440  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits:  26%/2436  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 12m | Hits:  26%/2436  
      🟨 GCC13              Pass:  45%/11  | Total:  7h 09m | Avg: 39m 00s | Max:  1h 17m | Hits:  26%/6090  
      🟥 MSVC14.29          Pass:   0%/2   | Total:  2h 35m | Avg:  1h 17m | Max:  1h 19m
      🟨 MSVC14.42          Pass:  50%/2   | Total:  2h 50m | Avg:  1h 25m | Max:  1h 25m | Hits:  12%/1042  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 20m | Hits:  17%/2254  
    🟨 cxx_family
      🟨 Clang              Pass:  88%/17  | Total: 17h 29m | Avg:  1h 01m | Max:  1h 13m | Hits:  26%/17946 
      🟨 GCC                Pass:  72%/22  | Total: 19h 38m | Avg: 53m 35s | Max:  1h 17m | Hits:  26%/19502 
      🟨 MSVC               Pass:  25%/4   | Total:  5h 26m | Avg:  1h 21m | Max:  1h 25m | Hits:  12%/1042  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 34m | Avg:  1h 17m | Max:  1h 20m | Hits:  17%/2254  
    🟨 gpu
      🟨 h100               Pass:  33%/3   | Total:  1h 07m | Avg: 22m 23s | Max: 28m 55s | Hits:  26%/1218  
      🟨 rtx2080            Pass:  91%/34  | Total:  1d 15h | Avg:  1h 10m | Max:  1h 25m | Hits:  25%/37090 
      🟨 rtxa6000           Pass:  25%/8   | Total:  4h 06m | Avg: 30m 47s | Max:  1h 08m | Hits:  26%/2436  
    🟨 jobs
      🟨 Build              Pass:  91%/37  | Total:  1d 18h | Avg:  1h 09m | Max:  1h 25m | Hits:  25%/40744 
      🟥 DeviceLaunch       Pass:   0%/1   | Total: 22m 15s | Avg: 22m 15s | Max: 22m 15s
      🟥 GraphCapture       Pass:   0%/1   | Total: 17m 31s | Avg: 17m 31s | Max: 17m 31s
      🟥 HostLaunch         Pass:   0%/3   | Total:  1h 10m | Avg: 23m 21s | Max: 23m 58s
      🟥 TestGPU            Pass:   0%/3   | Total: 43m 14s | Avg: 14m 24s | Max: 15m 27s
    🟨 std
      🟨 17                 Pass:  85%/20  | Total: 23h 19m | Avg:  1h 09m | Max:  1h 25m | Hits:  26%/20465 
      🟨 20                 Pass:  68%/25  | Total: 21h 49m | Avg: 52m 23s | Max:  1h 25m | Hits:  25%/20279 
    
  • 🟨 thrust: Pass: 93%/45 | Total: 22h 48m | Avg: 30m 24s | Max: 1h 01m | Hits: 79%/74643

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  93%/43  | Total: 21h 54m | Avg: 30m 34s | Max:  1h 01m | Hits:  79%/71088 
      🟩 arm64              Pass: 100%/2   | Total: 53m 40s | Avg: 26m 50s | Max: 28m 45s | Hits:  77%/3555  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 45s | Avg: 23m 52s | Max: 24m 04s | Hits:  77%/3554  
      🔍 nvcc               Pass:  93%/43  | Total: 22h 00m | Avg: 30m 42s | Max:  1h 01m | Hits:  79%/71089 
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/17  | Total:  7h 31m | Avg: 26m 31s | Max: 31m 14s | Hits:  79%/30209 
      🟩 GCC                Pass: 100%/21  | Total:  9h 11m | Avg: 26m 16s | Max: 34m 15s | Hits:  81%/37338 
      🔍 MSVC               Pass:  40%/5   | Total:  4h 24m | Avg: 52m 56s | Max:  1h 01m | Hits:  62%/3542  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 40m | Avg: 50m 25s | Max: 51m 47s | Hits:  63%/3554  
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/2   | Total: 28m 37s | Avg: 14m 18s | Max: 16m 52s | Hits:  88%/3556  
      🔍 rtx2080            Pass:  90%/33  | Total: 18h 24m | Avg: 33m 27s | Max:  1h 01m | Hits:  76%/53324 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 55m | Avg: 23m 32s | Max:  1h 00m | Hits:  85%/17763 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  92%/38  | Total: 21h 10m | Avg: 33m 26s | Max:  1h 01m | Hits:  75%/62206 
      🟩 TestCPU            Pass: 100%/3   | Total: 52m 53s | Avg: 17m 37s | Max: 36m 23s | Hits:  90%/5326  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 41s | Avg: 11m 10s | Max: 11m 45s | Hits:  99%/7111  
    🔍 std: 17 🔍
      🔍 17                 Pass:  85%/20  | Total: 11h 48m | Avg: 35m 26s | Max:  1h 01m | Hits:  76%/30218 
      🟩 20                 Pass: 100%/23  | Total: 10h 20m | Avg: 26m 59s | Max:  1h 00m | Hits:  80%/40869 
    🟨 ctk
      🟨 12.0               Pass:  80%/5   | Total:  3h 08m | Avg: 37m 45s | Max:  1h 01m | Hits:  77%/7110  
      🟩 12.5               Pass: 100%/2   | Total:  1h 40m | Avg: 50m 25s | Max: 51m 47s | Hits:  63%/3554  
      🟨 12.8               Pass:  94%/38  | Total: 17h 58m | Avg: 28m 23s | Max:  1h 00m | Hits:  80%/63979 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 47m 45s | Avg: 23m 52s | Max: 24m 04s | Hits:  77%/3554  
      🟨 nvcc12.0           Pass:  80%/5   | Total:  3h 08m | Avg: 37m 45s | Max:  1h 01m | Hits:  77%/7110  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 40m | Avg: 50m 25s | Max: 51m 47s | Hits:  63%/3554  
      🟨 nvcc12.8           Pass:  94%/36  | Total: 17h 10m | Avg: 28m 38s | Max:  1h 00m | Hits:  80%/60425 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 00m | Avg: 30m 14s | Max: 30m 56s | Hits:  77%/7108  
      🟩 Clang15            Pass: 100%/2   | Total: 58m 04s | Avg: 29m 02s | Max: 29m 47s | Hits:  77%/3554  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 00m | Avg: 30m 27s | Max: 31m 14s | Hits:  77%/3554  
      🟩 Clang17            Pass: 100%/2   | Total: 59m 53s | Avg: 29m 56s | Max: 30m 18s | Hits:  77%/3554  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 31m | Avg: 21m 36s | Max: 30m 59s | Hits:  83%/12439 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 50s | Max: 34m 15s | Hits:  76%/3556  
      🟩 GCC8               Pass: 100%/1   | Total: 31m 24s | Avg: 31m 24s | Max: 31m 24s | Hits:  76%/1778  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 26s | Max: 32m 51s | Hits:  76%/3556  
      🟩 GCC10              Pass: 100%/2   | Total: 59m 52s | Avg: 29m 56s | Max: 30m 29s | Hits:  76%/3556  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 15s | Max: 30m 51s | Hits:  76%/3556  
      🟩 GCC12              Pass: 100%/2   | Total: 59m 48s | Avg: 29m 54s | Max: 30m 21s | Hits:  76%/3556  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 31m | Avg: 21m 09s | Max: 32m 28s | Hits:  86%/17780 
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 54m | Avg: 57m 19s | Max:  1h 01m
      🟨 MSVC14.42          Pass:  66%/3   | Total:  2h 30m | Avg: 50m 00s | Max:  1h 00m | Hits:  62%/3542  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 40m | Avg: 50m 25s | Max: 51m 47s | Hits:  63%/3554  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 38m 36s | Avg: 19m 18s | Max: 27m 21s | Hits:  88%/3556  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 37s | Avg: 14m 18s | Max: 16m 52s | Hits:  88%/3556  
      🟩 90;90a;100         Pass: 100%/1   | Total: 30m 57s | Avg: 30m 57s | Max: 30m 57s | Hits:  76%/1778  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 41s | Avg: 7m 50s | Max: 12m 53s | Hits: 97%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 41s | Avg:  7m 50s | Max: 12m 53s | Hits:  97%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 41s | Avg:  7m 50s | Max: 12m 53s | Hits:  97%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 41s | Avg:  7m 50s | Max: 12m 53s | Hits:  97%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 41s | Avg:  7m 50s | Max: 12m 53s | Hits:  97%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 41s | Avg:  7m 50s | Max: 12m 53s | Hits:  97%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 41s | Avg:  7m 50s | Max: 12m 53s | Hits:  97%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 41s | Avg:  7m 50s | Max: 12m 53s | Hits:  97%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 48s | Avg:  2m 48s | Max:  2m 48s | Hits:  96%/154   
      🟩 Test               Pass: 100%/1   | Total: 12m 53s | Avg: 12m 53s | Max: 12m 53s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 03m | Avg: 1h 03m | Max: 1h 03m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

fbusato (Contributor, Author) commented Mar 6, 2025

Using cuda::bitfield_extract with range checks enabled exposed a potential bug in the segmented radix sort test: the test could generate begin_bit == end_bit, which translates to num_bits == 0.
I fixed the test by skipping this configuration, and I moved some computation after the bit check to reduce execution time.
@elstehle could you please review my changes?
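
A minimal sketch of the kind of guard described above, assuming a test that enumerates `[begin_bit, end_bit)` ranges. `non_empty_bit_ranges` and its `step` parameter are hypothetical names for this illustration; this is not the Catch2 test's code. The empty range is rejected first, so any expensive setup would only run for configurations that are actually sorted.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: enumerates [begin_bit, end_bit) pairs for a key of
// `key_bits` bits in increments of `step`, skipping empty ranges.
inline std::vector<std::pair<int, int>> non_empty_bit_ranges(int key_bits, int step)
{
  std::vector<std::pair<int, int>> ranges;
  if (step <= 0)
  {
    return ranges; // a non-positive step would never terminate
  }
  for (int begin_bit = 0; begin_bit <= key_bits; begin_bit += step)
  {
    for (int end_bit = begin_bit; end_bit <= key_bits; end_bit += step)
    {
      const int num_bits = end_bit - begin_bit;
      if (num_bits == 0)
      {
        continue; // begin_bit == end_bit: skip before doing any other work
      }
      ranges.emplace_back(begin_bit, end_bit);
    }
  }
  return ranges;
}
```

For an 8-bit key and a step of 4, this yields the ranges (0, 4), (0, 8), and (4, 8); the three empty ranges (0, 0), (4, 4), and (8, 8) are skipped.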

@fbusato fbusato requested a review from elstehle March 6, 2025 17:39
elstehle (Contributor) commented Mar 6, 2025

Thanks! I'm out until Monday and will review then. In the meantime, could you please run the benchmarks for radix sort and share the results here?

fbusato (Contributor, Author) commented Mar 6, 2025

I can take a look

github-actions bot (Contributor) commented Mar 6, 2025

🟨 CI finished in 1h 15m: Pass: 90%/93 | Total: 18h 36m | Avg: 12m 00s | Max: 1h 00m | Hits: 97%/121785
  • 🟨 cub: Pass: 86%/45 | Total: 10h 31m | Avg: 14m 01s | Max: 37m 47s | Hits: 97%/46834

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  86%/43  | Total: 10h 11m | Avg: 14m 13s | Max: 37m 47s | Hits:  96%/44398 
      🟩 arm64              Pass: 100%/2   | Total: 20m 04s | Avg: 10m 02s | Max: 10m 18s | Hits:  98%/2436  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 15m 34s | Avg:  7m 47s | Max:  7m 53s | Hits:  99%/2104  
      🔍 nvcc               Pass:  86%/43  | Total: 10h 15m | Avg: 14m 19s | Max: 37m 47s | Hits:  96%/44730 
    🔍 sm: 90 🔍
      🔍 90                 Pass:  66%/3   | Total: 44m 03s | Avg: 14m 41s | Max: 23m 58s | Hits:  99%/2436  
      🟩 90;90a;100         Pass: 100%/1   | Total: 10m 34s | Avg: 10m 34s | Max: 10m 34s | Hits:  98%/1218  
    🟨 ctk
      🟨 12.0               Pass:  80%/5   | Total:  1h 15m | Avg: 15m 02s | Max: 32m 33s | Hits:  98%/4880  
      🟩 12.5               Pass: 100%/2   | Total: 28m 31s | Avg: 14m 15s | Max: 14m 49s | Hits:  97%/2254  
      🟨 12.8               Pass:  86%/38  | Total:  8h 47m | Avg: 13m 53s | Max: 37m 47s | Hits:  96%/39700 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 15m 34s | Avg:  7m 47s | Max:  7m 53s | Hits:  99%/2104  
      🟨 nvcc12.0           Pass:  80%/5   | Total:  1h 15m | Avg: 15m 02s | Max: 32m 33s | Hits:  98%/4880  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 31s | Avg: 14m 15s | Max: 14m 49s | Hits:  97%/2254  
      🟨 nvcc12.8           Pass:  86%/36  | Total:  8h 32m | Avg: 14m 13s | Max: 37m 47s | Hits:  96%/37596 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 41m 23s | Avg: 10m 20s | Max: 11m 01s | Hits:  99%/4880  
      🟩 Clang15            Pass: 100%/2   | Total: 20m 53s | Avg: 10m 26s | Max: 11m 06s | Hits:  99%/2436  
      🟩 Clang16            Pass: 100%/2   | Total: 19m 42s | Avg:  9m 51s | Max: 10m 01s | Hits:  99%/2436  
      🟩 Clang17            Pass: 100%/2   | Total: 20m 32s | Avg: 10m 16s | Max: 10m 49s | Hits:  99%/2436  
      🟨 Clang18            Pass:  85%/7   | Total:  1h 22m | Avg: 11m 43s | Max: 21m 54s | Hits:  99%/6976  
      🟩 GCC7               Pass: 100%/2   | Total: 20m 56s | Avg: 10m 28s | Max: 10m 58s | Hits:  98%/2440  
      🟩 GCC8               Pass: 100%/1   | Total:  9m 43s | Avg:  9m 43s | Max:  9m 43s | Hits:  98%/1220  
      🟩 GCC9               Pass: 100%/2   | Total: 21m 22s | Avg: 10m 41s | Max: 10m 45s | Hits:  98%/2440  
      🟩 GCC10              Pass: 100%/2   | Total: 19m 56s | Avg:  9m 58s | Max: 10m 05s | Hits:  98%/2440  
      🟩 GCC11              Pass: 100%/2   | Total: 20m 03s | Avg: 10m 01s | Max: 10m 14s | Hits:  98%/2436  
      🟩 GCC12              Pass: 100%/2   | Total: 20m 02s | Avg: 10m 01s | Max: 10m 06s | Hits:  98%/2436  
      🟨 GCC13              Pass:  81%/11  | Total:  2h 45m | Avg: 15m 00s | Max: 23m 58s | Hits:  99%/10962 
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 10m | Avg: 35m 10s | Max: 37m 47s
      🟨 MSVC14.42          Pass:  50%/2   | Total:  1h 10m | Avg: 35m 28s | Max: 36m 10s | Hits:  15%/1042  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 31s | Avg: 14m 15s | Max: 14m 49s | Hits:  97%/2254  
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total:  3h 04m | Avg: 10m 51s | Max: 21m 54s | Hits:  99%/19164 
      🟨 GCC                Pass:  90%/22  | Total:  4h 37m | Avg: 12m 35s | Max: 23m 58s | Hits:  98%/24374 
      🟨 MSVC               Pass:  25%/4   | Total:  2h 21m | Avg: 35m 19s | Max: 37m 47s | Hits:  15%/1042  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 31s | Avg: 14m 15s | Max: 14m 49s | Hits:  97%/2254  
    🟨 jobs
      🟨 Build              Pass:  91%/37  | Total:  7h 59m | Avg: 12m 57s | Max: 37m 47s | Hits:  96%/40744 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 41s | Avg: 21m 41s | Max: 21m 41s | Hits:  99%/1218  
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 15s | Avg: 17m 15s | Max: 17m 15s | Hits:  99%/1218  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 09m | Avg: 23m 06s | Max: 23m 58s | Hits:  99%/3654  
      🟥 TestGPU            Pass:   0%/3   | Total: 44m 04s | Avg: 14m 41s | Max: 15m 43s
    🟨 gpu
      🟨 h100               Pass:  66%/3   | Total: 44m 03s | Avg: 14m 41s | Max: 23m 58s | Hits:  99%/2436  
      🟨 rtx2080            Pass:  91%/34  | Total:  7h 30m | Avg: 13m 15s | Max: 37m 47s | Hits:  96%/37090 
      🟨 rtxa6000           Pass:  75%/8   | Total:  2h 16m | Avg: 17m 03s | Max: 23m 28s | Hits:  99%/7308  
    🟨 std
      🟨 17                 Pass:  85%/20  | Total:  4h 41m | Avg: 14m 05s | Max: 37m 47s | Hits:  98%/20465 
      🟨 20                 Pass:  88%/25  | Total:  5h 49m | Avg: 13m 59s | Max: 36m 10s | Hits:  95%/26369 
    
  • 🟨 thrust: Pass: 93%/45 | Total: 6h 49m | Avg: 9m 05s | Max: 32m 25s | Hits: 98%/74643

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  93%/43  | Total:  6h 39m | Avg:  9m 17s | Max: 32m 25s | Hits:  98%/71088 
      🟩 arm64              Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  5m 14s | Hits:  99%/3555  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 47s | Avg:  5m 23s | Max:  5m 33s | Hits: 100%/3554  
      🔍 nvcc               Pass:  93%/43  | Total:  6h 38m | Avg:  9m 15s | Max: 32m 25s | Hits:  98%/71089 
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/17  | Total:  1h 37m | Avg:  5m 45s | Max: 10m 14s | Hits: 100%/30209 
      🟩 GCC                Pass: 100%/21  | Total:  2h 19m | Avg:  6m 38s | Max: 11m 29s | Hits:  99%/37338 
      🔍 MSVC               Pass:  40%/5   | Total:  2h 23m | Avg: 28m 40s | Max: 32m 25s | Hits:  70%/3542  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 11s | Avg: 14m 05s | Max: 14m 09s | Hits:  99%/3554  
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/2   | Total: 16m 37s | Avg:  8m 18s | Max: 11m 29s | Hits:  99%/3556  
      🔍 rtx2080            Pass:  90%/33  | Total:  4h 24m | Avg:  8m 00s | Max: 28m 07s | Hits:  99%/53324 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 08m | Avg: 12m 49s | Max: 32m 25s | Hits:  94%/17763 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  92%/38  | Total:  5h 16m | Avg:  8m 20s | Max: 29m 37s | Hits:  99%/62206 
      🟩 TestCPU            Pass: 100%/3   | Total: 47m 59s | Avg: 15m 59s | Max: 32m 25s | Hits:  90%/5326  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 14s | Avg: 11m 03s | Max: 11m 29s | Hits:  99%/7111  
    🔍 std: 17 🔍
      🔍 17                 Pass:  85%/20  | Total:  3h 05m | Avg:  9m 16s | Max: 28m 07s | Hits:  99%/30218 
      🟩 20                 Pass: 100%/23  | Total:  3h 26m | Avg:  8m 57s | Max: 32m 25s | Hits:  97%/40869 
    🟨 ctk
      🟨 12.0               Pass:  80%/5   | Total: 48m 33s | Avg:  9m 42s | Max: 28m 07s | Hits:  99%/7110  
      🟩 12.5               Pass: 100%/2   | Total: 28m 11s | Avg: 14m 05s | Max: 14m 09s | Hits:  99%/3554  
      🟨 12.8               Pass:  94%/38  | Total:  5h 32m | Avg:  8m 44s | Max: 32m 25s | Hits:  98%/63979 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 47s | Avg:  5m 23s | Max:  5m 33s | Hits: 100%/3554  
      🟨 nvcc12.0           Pass:  80%/5   | Total: 48m 33s | Avg:  9m 42s | Max: 28m 07s | Hits:  99%/7110  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 11s | Avg: 14m 05s | Max: 14m 09s | Hits:  99%/3554  
      🟨 nvcc12.8           Pass:  94%/36  | Total:  5h 21m | Avg:  8m 55s | Max: 32m 25s | Hits:  98%/60425 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 00s | Avg:  5m 15s | Max:  5m 39s | Hits: 100%/7108  
      🟩 Clang15            Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  5m 30s | Hits: 100%/3554  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 27s | Avg:  5m 43s | Max:  5m 55s | Hits: 100%/3554  
      🟩 Clang17            Pass: 100%/2   | Total: 10m 57s | Avg:  5m 28s | Max:  5m 43s | Hits: 100%/3554  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 38s | Avg:  6m 14s | Max: 10m 14s | Hits: 100%/12439 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 58s | Avg:  5m 29s | Max:  5m 36s | Hits:  99%/3556  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 27s | Avg:  5m 27s | Max:  5m 27s | Hits:  99%/1778  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 47s | Avg:  5m 23s | Max:  5m 48s | Hits:  99%/3556  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 13s | Avg:  5m 36s | Max:  5m 50s | Hits:  99%/3556  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 51s | Avg:  5m 55s | Max:  5m 58s | Hits:  99%/3556  
      🟩 GCC12              Pass: 100%/2   | Total: 12m 10s | Avg:  6m 05s | Max:  6m 20s | Hits:  99%/3556  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 17m | Avg:  7m 42s | Max: 11m 29s | Hits:  99%/17780 
      🟥 MSVC14.29          Pass:   0%/2   | Total: 54m 48s | Avg: 27m 24s | Max: 28m 07s
      🟨 MSVC14.42          Pass:  66%/3   | Total:  1h 28m | Avg: 29m 31s | Max: 32m 25s | Hits:  70%/3542  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 11s | Avg: 14m 05s | Max: 14m 09s | Hits:  99%/3554  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 32s | Avg:  8m 46s | Max: 11m 09s | Hits:  99%/3556  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 37s | Avg:  8m 18s | Max: 11m 29s | Hits:  99%/3556  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 00s | Avg:  6m 00s | Max:  6m 00s | Hits:  99%/1778  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 05s | Avg: 7m 32s | Max: 12m 46s | Hits: 98%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 46s | Hits:  98%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 46s | Hits:  98%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 46s | Hits:  98%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 46s | Hits:  98%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 46s | Hits:  98%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 46s | Hits:  98%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 05s | Avg:  7m 32s | Max: 12m 46s | Hits:  98%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 19s | Avg:  2m 19s | Max:  2m 19s | Hits:  98%/154   
      🟩 Test               Pass: 100%/1   | Total: 12m 46s | Avg: 12m 46s | Max: 12m 46s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 00m | Avg: 1h 00m | Max: 1h 00m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    


@fbusato fbusato requested a review from a team as a code owner March 6, 2025 22:24
@fbusato fbusato requested a review from ericniebler March 6, 2025 22:24
github-actions bot (Contributor) commented Mar 7, 2025

🟨 CI finished in 1h 33m: Pass: 98%/158 | Total: 2d 13h | Avg: 23m 16s | Max: 1h 16m | Hits: 84%/246942
  • 🟨 cub: Pass: 93%/45 | Total: 1d 12h | Avg: 48m 07s | Max: 1h 16m | Hits: 80%/49960

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  93%/43  | Total:  1d 10h | Avg: 47m 48s | Max:  1h 16m | Hits:  80%/47524 
      🟩 arm64              Pass: 100%/2   | Total:  1h 50m | Avg: 55m 08s | Max: 56m 14s | Hits:  85%/2436  
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 41m | Avg: 56m 13s | Max:  1h 06m | Hits:  73%/5922  
      🟩 12.5               Pass: 100%/2   | Total:  1h 52m | Avg: 56m 04s | Max: 56m 35s | Hits:  83%/2254  
      🔍 12.8               Pass:  92%/38  | Total:  1d 05h | Avg: 46m 38s | Max:  1h 16m | Hits:  81%/41784 
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 57m | Avg: 58m 31s | Max:  1h 00m | Hits:  85%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 41m | Avg: 56m 13s | Max:  1h 06m | Hits:  73%/5922  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 52m | Avg: 56m 04s | Max: 56m 35s | Hits:  83%/2254  
      🔍 nvcc12.8           Pass:  91%/36  | Total:  1d 03h | Avg: 45m 59s | Max:  1h 16m | Hits:  81%/39680 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 31s | Max:  1h 00m | Hits:  85%/2104  
      🔍 nvcc               Pass:  93%/43  | Total:  1d 10h | Avg: 47m 38s | Max:  1h 16m | Hits:  80%/47856 
    🚨 jobs: TestGPU 🚨
      🟩 Build              Pass: 100%/37  | Total:  1d 09h | Avg: 54m 18s | Max:  1h 16m | Hits:  78%/43870 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 14s | Avg: 21m 14s | Max: 21m 14s | Hits:  99%/1218  
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 23s | Avg: 19m 23s | Max: 19m 23s | Hits:  99%/1218  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 09m | Avg: 23m 14s | Max: 24m 03s | Hits:  99%/3654  
      🔥 TestGPU            Pass:   0%/3   | Total: 45m 53s | Avg: 15m 17s | Max: 17m 37s
    🔍 sm: 90 🔍
      🔍 90                 Pass:  66%/3   | Total: 59m 47s | Avg: 19m 55s | Max: 24m 03s | Hits:  92%/2436  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  85%/1218  
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 18h 32m | Avg: 55m 37s | Max:  1h 15m | Hits:  74%/23591 
      🔍 20                 Pass:  88%/25  | Total: 17h 33m | Avg: 42m 08s | Max:  1h 16m | Hits:  85%/26369 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 35s | Max: 55m 11s | Hits:  85%/4880  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 39m | Avg: 49m 57s | Max: 51m 23s | Hits:  85%/2436  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 39m | Avg: 49m 52s | Max: 50m 48s | Hits:  85%/2436  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 41m | Avg: 50m 56s | Max: 51m 45s | Hits:  85%/2436  
      🟨 Clang18            Pass:  85%/7   | Total:  5h 10m | Avg: 44m 17s | Max:  1h 00m | Hits:  88%/6976  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 45m | Avg: 52m 36s | Max: 54m 03s | Hits:  85%/2440  
      🟩 GCC8               Pass: 100%/1   | Total: 50m 33s | Avg: 50m 33s | Max: 50m 33s | Hits:  85%/1220  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 50m | Avg: 55m 22s | Max: 56m 56s | Hits:  85%/2440  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 50s | Max:  1h 01m | Hits:  73%/2440  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 39m | Avg: 49m 41s | Max: 50m 56s | Hits:  85%/2436  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 45m | Avg: 52m 33s | Max: 53m 55s | Hits:  85%/2436  
      🟨 GCC13              Pass:  81%/11  | Total:  5h 57m | Avg: 32m 27s | Max:  1h 00m | Hits:  91%/10962 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m | Hits:  15%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 31m | Avg:  1h 15m | Max:  1h 16m | Hits:  15%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 52m | Avg: 56m 04s | Max: 56m 35s | Hits:  83%/2254  
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total: 13h 41m | Avg: 48m 21s | Max:  1h 00m | Hits:  86%/19164 
      🟨 GCC                Pass:  90%/22  | Total: 15h 43m | Avg: 42m 53s | Max:  1h 01m | Hits:  87%/24374 
      🟩 MSVC               Pass: 100%/4   | Total:  4h 48m | Avg:  1h 12m | Max:  1h 16m | Hits:  15%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 04s | Max: 56m 35s | Hits:  83%/2254  
    🟨 gpu
      🟨 h100               Pass:  66%/3   | Total: 59m 47s | Avg: 19m 55s | Max: 24m 03s | Hits:  92%/2436  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 07h | Avg: 55m 25s | Max:  1h 16m | Hits:  77%/40216 
      🟨 rtxa6000           Pass:  75%/8   | Total:  3h 41m | Avg: 27m 41s | Max: 52m 18s | Hits:  94%/7308  
    
  • 🟩 thrust: Pass: 100%/45 | Total: 12h 21m | Avg: 16m 28s | Max: 39m 26s | Hits: 92%/79956

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 23m 57s | Avg: 11m 58s | Max: 12m 41s | Hits:  97%/3556  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 11h 56m | Avg: 16m 39s | Max: 39m 26s | Hits:  92%/76401 
      🟩 arm64              Pass: 100%/2   | Total: 25m 36s | Avg: 12m 48s | Max: 13m 15s | Hits:  94%/3555  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 33m | Avg: 18m 42s | Max: 33m 29s | Hits:  89%/8881  
      🟩 12.5               Pass: 100%/2   | Total: 53m 57s | Avg: 26m 58s | Max: 28m 28s | Hits:  93%/3554  
      🟩 12.8               Pass: 100%/38  | Total:  9h 54m | Avg: 15m 38s | Max: 39m 26s | Hits:  92%/67521 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 26m 38s | Avg: 13m 19s | Max: 13m 39s | Hits:  94%/3554  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 33m | Avg: 18m 42s | Max: 33m 29s | Hits:  89%/8881  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 53m 57s | Avg: 26m 58s | Max: 28m 28s | Hits:  93%/3554  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  9h 27m | Avg: 15m 45s | Max: 39m 26s | Hits:  92%/63967 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 26m 38s | Avg: 13m 19s | Max: 13m 39s | Hits:  94%/3554  
      🟩 nvcc               Pass: 100%/43  | Total: 11h 55m | Avg: 16m 37s | Max: 39m 26s | Hits:  92%/76402 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 57m 05s | Avg: 14m 16s | Max: 15m 20s | Hits:  94%/7108  
      🟩 Clang15            Pass: 100%/2   | Total: 28m 31s | Avg: 14m 15s | Max: 14m 28s | Hits:  94%/3554  
      🟩 Clang16            Pass: 100%/2   | Total: 29m 32s | Avg: 14m 46s | Max: 14m 49s | Hits:  94%/3554  
      🟩 Clang17            Pass: 100%/2   | Total: 27m 50s | Avg: 13m 55s | Max: 14m 31s | Hits:  94%/3554  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 25m | Avg: 12m 11s | Max: 14m 48s | Hits:  96%/12439 
      🟩 GCC7               Pass: 100%/2   | Total: 30m 44s | Avg: 15m 22s | Max: 15m 36s | Hits:  94%/3556  
      🟩 GCC8               Pass: 100%/1   | Total: 13m 50s | Avg: 13m 50s | Max: 13m 50s | Hits:  94%/1778  
      🟩 GCC9               Pass: 100%/2   | Total: 31m 33s | Avg: 15m 46s | Max: 16m 09s | Hits:  94%/3556  
      🟩 GCC10              Pass: 100%/2   | Total: 30m 15s | Avg: 15m 07s | Max: 15m 37s | Hits:  94%/3556  
      🟩 GCC11              Pass: 100%/2   | Total: 29m 57s | Avg: 14m 58s | Max: 15m 10s | Hits:  94%/3556  
      🟩 GCC12              Pass: 100%/2   | Total: 29m 06s | Avg: 14m 33s | Max: 14m 36s | Hits:  94%/3556  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 58m | Avg: 11m 50s | Max: 15m 09s | Hits:  96%/17780 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 08m | Avg: 34m 15s | Max: 35m 02s | Hits:  66%/3542  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 46m | Avg: 35m 39s | Max: 39m 26s | Hits:  67%/5313  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 53m 57s | Avg: 26m 58s | Max: 28m 28s | Hits:  93%/3554  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  3h 48m | Avg: 13m 25s | Max: 15m 20s | Hits:  95%/30209 
      🟩 GCC                Pass: 100%/21  | Total:  4h 43m | Avg: 13m 31s | Max: 16m 09s | Hits:  95%/37338 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 55m | Avg: 35m 05s | Max: 39m 26s | Hits:  67%/8855  
      🟩 NVHPC              Pass: 100%/2   | Total: 53m 57s | Avg: 26m 58s | Max: 28m 28s | Hits:  93%/3554  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 19m 09s | Avg:  9m 34s | Max: 11m 16s | Hits:  97%/3556  
      🟩 rtx2080            Pass: 100%/33  | Total:  9h 20m | Avg: 16m 58s | Max: 35m 02s | Hits:  92%/58637 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 42m | Avg: 16m 15s | Max: 39m 26s | Hits:  92%/17763 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 10h 49m | Avg: 17m 05s | Max: 39m 26s | Hits:  91%/67519 
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 32s | Avg: 16m 10s | Max: 33m 06s | Hits:  90%/5326  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 27s | Avg: 10m 51s | Max: 11m 16s | Hits:  99%/7111  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 19m 09s | Avg:  9m 34s | Max: 11m 16s | Hits:  97%/3556  
      🟩 90;90a;100         Pass: 100%/1   | Total: 13m 21s | Avg: 13m 21s | Max: 13m 21s | Hits:  94%/1778  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  6h 08m | Avg: 18m 25s | Max: 35m 02s | Hits:  90%/35531 
      🟩 20                 Pass: 100%/23  | Total:  5h 49m | Avg: 15m 11s | Max: 39m 26s | Hits:  93%/40869 
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 9h 09m | Avg: 12m 46s | Max: 36m 52s | Hits: 79%/104996

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  8h 42m | Avg: 12m 44s | Max: 36m 52s | Hits:  79%/99253 
      🟩 arm64              Pass: 100%/2   | Total: 27m 01s | Avg: 13m 30s | Max: 23m 23s | Hits:  65%/5743  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 18m | Avg: 15m 39s | Max: 27m 04s | Hits:  70%/13988 
      🟩 12.5               Pass: 100%/2   | Total: 45m 24s | Avg: 22m 42s | Max: 36m 52s | Hits:  62%/5688  
      🟩 12.8               Pass: 100%/36  | Total:  7h 05m | Avg: 11m 49s | Max: 30m 45s | Hits:  81%/85320 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 44m 29s | Avg: 22m 14s | Max: 23m 56s | Hits:  27%/5704  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 18m | Avg: 15m 39s | Max: 27m 04s | Hits:  70%/13988 
      🟩 nvcc12.5           Pass: 100%/2   | Total: 45m 24s | Avg: 22m 42s | Max: 36m 52s | Hits:  62%/5688  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  6h 21m | Avg: 11m 12s | Max: 30m 45s | Hits:  85%/79616 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 29s | Avg: 22m 14s | Max: 23m 56s | Hits:  27%/5704  
      🟩 nvcc               Pass: 100%/41  | Total:  8h 24m | Avg: 12m 18s | Max: 36m 52s | Hits:  82%/99292 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 18m 19s | Avg:  4m 34s | Max:  5m 29s | Hits:  97%/11376 
      🟩 Clang15            Pass: 100%/2   | Total: 29m 57s | Avg: 14m 58s | Max: 25m 05s | Hits:  65%/5700  
      🟩 Clang16            Pass: 100%/2   | Total:  9m 12s | Avg:  4m 36s | Max:  4m 39s | Hits:  99%/5700  
      🟩 Clang17            Pass: 100%/2   | Total:  9m 18s | Avg:  4m 39s | Max:  4m 50s | Hits:  99%/5700  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 34m | Avg: 15m 40s | Max: 23m 56s | Hits:  56%/14275 
      🟩 GCC7               Pass: 100%/2   | Total: 41m 17s | Avg: 20m 38s | Max: 20m 42s | Hits:  34%/5638  
      🟩 GCC8               Pass: 100%/1   | Total:  7m 00s | Avg:  7m 00s | Max:  7m 00s | Hits:  91%/2829  
      🟩 GCC9               Pass: 100%/2   | Total: 24m 56s | Avg: 12m 28s | Max: 21m 05s | Hits:  65%/5650  
      🟩 GCC10              Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  5m 32s | Hits:  96%/5706  
      🟩 GCC11              Pass: 100%/2   | Total: 29m 49s | Avg: 14m 54s | Max: 22m 49s | Hits:  61%/5702  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 21s | Avg:  5m 40s | Max:  7m 22s | Hits:  95%/5702  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 48m | Avg: 10m 48s | Max: 30m 45s | Hits:  83%/14536 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 54m 24s | Avg: 27m 12s | Max: 27m 20s | Hits:  98%/5364  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 56m 34s | Avg: 28m 17s | Max: 30m 30s | Hits:  95%/5430  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 45m 24s | Avg: 22m 42s | Max: 36m 52s | Hits:  62%/5688  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  2h 40m | Avg: 10m 03s | Max: 25m 05s | Hits:  80%/42751 
      🟩 GCC                Pass: 100%/21  | Total:  3h 52m | Avg: 11m 03s | Max: 30m 45s | Hits:  76%/45763 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 50m | Avg: 27m 44s | Max: 30m 30s | Hits:  97%/10794 
      🟩 NVHPC              Pass: 100%/2   | Total: 45m 24s | Avg: 22m 42s | Max: 36m 52s | Hits:  62%/5688  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 17s | Avg:  8m 08s | Max: 11m 51s | Hits:  98%/2961  
      🟩 rtx2080            Pass: 100%/41  | Total:  8h 52m | Avg: 12m 59s | Max: 36m 52s | Hits:  78%/102035
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  7h 53m | Avg: 12m 48s | Max: 36m 52s | Hits:  79%/104956
      🟩 NVRTC              Pass: 100%/2   | Total: 35m 16s | Avg: 17m 38s | Max: 18m 13s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 37m 54s | Avg: 12m 38s | Max: 16m 39s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 14s | Avg:  2m 14s | Max:  2m 14s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 35m 16s | Avg: 17m 38s | Max: 18m 13s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 16m 17s | Avg:  8m 08s | Max: 11m 51s | Hits:  98%/2961  
      🟩 90;90a;100         Pass: 100%/1   | Total: 30m 45s | Avg: 30m 45s | Max: 30m 45s | Hits:  30%/2961  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  4h 25m | Avg: 12m 37s | Max: 27m 20s | Hits:  80%/56127 
      🟩 20                 Pass: 100%/21  | Total:  4h 41m | Avg: 13m 25s | Max: 36m 52s | Hits:  77%/48869 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 20m | Avg: 6m 23s | Max: 14m 07s | Hits: 94%/11722

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  2h 05m | Avg:  6m 58s | Max: 14m 07s | Hits:  94%/9406  
      🟩 arm64              Pass: 100%/4   | Total: 15m 18s | Avg:  3m 49s | Max:  4m 04s | Hits:  96%/2316  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 14m 07s | Avg: 14m 07s | Max: 14m 07s | Hits:  57%/277   
      🟩 12.5               Pass: 100%/2   | Total: 12m 51s | Avg:  6m 25s | Max:  6m 46s | Hits:  92%/742   
      🟩 12.8               Pass: 100%/19  | Total:  1h 53m | Avg:  5m 59s | Max: 14m 05s | Hits:  95%/10703 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 14m 07s | Avg: 14m 07s | Max: 14m 07s | Hits:  57%/277   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 51s | Avg:  6m 25s | Max:  6m 46s | Hits:  92%/742   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 53m | Avg:  5m 59s | Max: 14m 05s | Hits:  95%/10703 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 20m | Avg:  6m 23s | Max: 14m 07s | Hits:  94%/11722 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s | Hits:  96%/581   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s | Hits:  96%/579   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 34s | Avg:  4m 34s | Max:  4m 34s | Hits:  96%/579   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 23s | Avg:  4m 23s | Max:  4m 23s | Hits:  96%/579   
      🟩 Clang18            Pass: 100%/4   | Total: 23m 48s | Avg:  5m 57s | Max: 11m 57s | Hits:  97%/2316  
      🟩 GCC10              Pass: 100%/1   | Total:  4m 26s | Avg:  4m 26s | Max:  4m 26s | Hits:  96%/581   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 38s | Avg:  4m 38s | Max:  4m 38s | Hits:  96%/579   
      🟩 GCC12              Pass: 100%/2   | Total: 16m 47s | Avg:  8m 23s | Max: 12m 18s | Hits:  97%/1158  
      🟩 GCC13              Pass: 100%/6   | Total: 33m 33s | Avg:  5m 35s | Max: 14m 05s | Hits:  96%/3474  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 14m 07s | Avg: 14m 07s | Max: 14m 07s | Hits:  57%/277   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 13m 08s | Avg: 13m 08s | Max: 13m 08s | Hits:  57%/277   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 51s | Avg:  6m 25s | Max:  6m 46s | Hits:  92%/742   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 41m 15s | Avg:  5m 09s | Max: 11m 57s | Hits:  96%/4634  
      🟩 GCC                Pass: 100%/10  | Total: 59m 24s | Avg:  5m 56s | Max: 14m 05s | Hits:  96%/5792  
      🟩 MSVC               Pass: 100%/2   | Total: 27m 15s | Avg: 13m 37s | Max: 14m 07s | Hits:  57%/554   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 51s | Avg:  6m 25s | Max:  6m 46s | Hits:  92%/742   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 49s | Avg:  8m 54s | Max: 14m 05s | Hits:  97%/1158  
      🟩 rtx2080            Pass: 100%/20  | Total:  2h 02m | Avg:  6m 08s | Max: 14m 07s | Hits:  94%/10564 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 42m | Avg:  5m 23s | Max: 14m 07s | Hits:  93%/9985  
      🟩 Test               Pass: 100%/3   | Total: 38m 20s | Avg: 12m 46s | Max: 14m 05s | Hits:  99%/1737  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 21m 35s | Avg:  7m 11s | Max: 14m 05s | Hits:  97%/1737  
      🟩 90a                Pass: 100%/1   | Total:  4m 00s | Avg:  4m 00s | Max:  4m 00s | Hits:  96%/579   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 18m 08s | Avg:  4m 32s | Max:  6m 46s | Hits:  95%/2108  
      🟩 20                 Pass: 100%/18  | Total:  2h 02m | Avg:  6m 48s | Max: 14m 07s | Hits:  94%/9614  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 35s | Avg: 7m 47s | Max: 13m 09s | Hits: 98%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 35s | Avg:  7m 47s | Max: 13m 09s | Hits:  98%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 35s | Avg:  7m 47s | Max: 13m 09s | Hits:  98%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 35s | Avg:  7m 47s | Max: 13m 09s | Hits:  98%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 35s | Avg:  7m 47s | Max: 13m 09s | Hits:  98%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 35s | Avg:  7m 47s | Max: 13m 09s | Hits:  98%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 35s | Avg:  7m 47s | Max: 13m 09s | Hits:  98%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 35s | Avg:  7m 47s | Max: 13m 09s | Hits:  98%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 26s | Avg:  2m 26s | Max:  2m 26s | Hits:  98%/154   
      🟩 Test               Pass: 100%/1   | Total: 13m 09s | Avg: 13m 09s | Max: 13m 09s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 03m | Avg: 1h 03m | Max: 1h 03m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

Contributor

github-actions bot commented Mar 7, 2025

🟩 CI finished in 1h 18m: Pass: 100%/158 | Total: 1d 05h | Avg: 11m 10s | Max: 1h 01m | Hits: 91%/250596
  • 🟩 cub: Pass: 100%/45 | Total: 11h 53m | Avg: 15m 50s | Max: 49m 04s | Hits: 92%/53614

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 11h 35m | Avg: 16m 09s | Max: 49m 04s | Hits:  91%/51178 
      🟩 arm64              Pass: 100%/2   | Total: 18m 12s | Avg:  9m 06s | Max:  9m 32s | Hits:  98%/2436  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 17m | Avg: 15m 25s | Max: 35m 36s | Hits:  83%/5922  
      🟩 12.5               Pass: 100%/2   | Total: 31m 55s | Avg: 15m 57s | Max: 16m 17s | Hits:  97%/2254  
      🟩 12.8               Pass: 100%/38  | Total: 10h 04m | Avg: 15m 54s | Max: 49m 04s | Hits:  92%/45438 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 16m 35s | Avg:  8m 17s | Max:  8m 27s | Hits:  98%/2104  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 17m | Avg: 15m 25s | Max: 35m 36s | Hits:  83%/5922  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 31m 55s | Avg: 15m 57s | Max: 16m 17s | Hits:  97%/2254  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  9h 47m | Avg: 16m 19s | Max: 49m 04s | Hits:  92%/43334 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 16m 35s | Avg:  8m 17s | Max:  8m 27s | Hits:  98%/2104  
      🟩 nvcc               Pass: 100%/43  | Total: 11h 36m | Avg: 16m 12s | Max: 49m 04s | Hits:  91%/51510 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 41m 11s | Avg: 10m 17s | Max: 11m 04s | Hits:  98%/4880  
      🟩 Clang15            Pass: 100%/2   | Total: 19m 59s | Avg:  9m 59s | Max: 10m 12s | Hits:  98%/2436  
      🟩 Clang16            Pass: 100%/2   | Total: 21m 20s | Avg: 10m 40s | Max: 10m 48s | Hits:  98%/2436  
      🟩 Clang17            Pass: 100%/2   | Total: 19m 59s | Avg:  9m 59s | Max: 10m 07s | Hits:  98%/2436  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 30m | Avg: 12m 55s | Max: 22m 50s | Hits:  99%/8194  
      🟩 GCC7               Pass: 100%/2   | Total: 21m 08s | Avg: 10m 34s | Max: 11m 03s | Hits:  98%/2440  
      🟩 GCC8               Pass: 100%/1   | Total: 10m 27s | Avg: 10m 27s | Max: 10m 27s | Hits:  98%/1220  
      🟩 GCC9               Pass: 100%/2   | Total: 22m 13s | Avg: 11m 06s | Max: 11m 42s | Hits:  98%/2440  
      🟩 GCC10              Pass: 100%/2   | Total: 59m 43s | Avg: 29m 51s | Max: 49m 04s | Hits:  95%/2440  
      🟩 GCC11              Pass: 100%/2   | Total: 20m 58s | Avg: 10m 29s | Max: 10m 34s | Hits:  98%/2436  
      🟩 GCC12              Pass: 100%/2   | Total: 21m 59s | Avg: 10m 59s | Max: 11m 38s | Hits:  98%/2436  
      🟩 GCC13              Pass: 100%/11  | Total:  3h 04m | Avg: 16m 48s | Max: 25m 05s | Hits:  99%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 12m | Avg: 36m 18s | Max: 37m 01s | Hits:  15%/2084  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  1h 14m | Avg: 37m 08s | Max: 37m 40s | Hits:  15%/2084  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 31m 55s | Avg: 15m 57s | Max: 16m 17s | Hits:  97%/2254  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  3h 13m | Avg: 11m 21s | Max: 22m 50s | Hits:  98%/20382 
      🟩 GCC                Pass: 100%/22  | Total:  5h 41m | Avg: 15m 31s | Max: 49m 04s | Hits:  98%/26810 
      🟩 MSVC               Pass: 100%/4   | Total:  2h 26m | Avg: 36m 43s | Max: 37m 40s | Hits:  15%/4168  
      🟩 NVHPC              Pass: 100%/2   | Total: 31m 55s | Avg: 15m 57s | Max: 16m 17s | Hits:  97%/2254  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 52m 51s | Avg: 17m 37s | Max: 24m 04s | Hits:  99%/3654  
      🟩 rtx2080            Pass: 100%/34  | Total:  8h 24m | Avg: 14m 50s | Max: 49m 04s | Hits:  89%/40216 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 35m | Avg: 19m 29s | Max: 25m 05s | Hits:  99%/9744  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  8h 51m | Avg: 14m 22s | Max: 49m 04s | Hits:  90%/43870 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 00s | Avg: 21m 00s | Max: 21m 00s | Hits:  99%/1218  
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 43s | Avg: 19m 43s | Max: 19m 43s | Hits:  99%/1218  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 11m | Avg: 23m 47s | Max: 25m 05s | Hits:  99%/3654  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 09m | Avg: 23m 05s | Max: 24m 42s | Hits:  99%/3654  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 52m 51s | Avg: 17m 37s | Max: 24m 04s | Hits:  99%/3654  
      🟩 90;90a;100         Pass: 100%/1   | Total: 11m 10s | Avg: 11m 10s | Max: 11m 10s | Hits:  98%/1218  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  5h 32m | Avg: 16m 37s | Max: 49m 04s | Hits:  87%/23591 
      🟩 20                 Pass: 100%/25  | Total:  6h 20m | Avg: 15m 13s | Max: 37m 40s | Hits:  96%/30023 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 51m | Avg: 9m 08s | Max: 36m 10s | Hits: 96%/79956

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 33s | Avg:  8m 46s | Max: 11m 16s | Hits:  99%/3556  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 41m | Avg:  9m 20s | Max: 36m 10s | Hits:  96%/76401 
      🟩 arm64              Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  5m 08s | Hits:  99%/3555  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 45m 32s | Avg:  9m 06s | Max: 25m 36s | Hits:  94%/8881  
      🟩 12.5               Pass: 100%/2   | Total: 28m 08s | Avg: 14m 04s | Max: 14m 35s | Hits:  99%/3554  
      🟩 12.8               Pass: 100%/38  | Total:  5h 37m | Avg:  8m 53s | Max: 36m 10s | Hits:  96%/67521 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 55s | Avg:  4m 57s | Max:  5m 03s | Hits: 100%/3554  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 45m 32s | Avg:  9m 06s | Max: 25m 36s | Hits:  94%/8881  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 08s | Avg: 14m 04s | Max: 14m 35s | Hits:  99%/3554  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 27m | Avg:  9m 06s | Max: 36m 10s | Hits:  96%/63967 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 55s | Avg:  4m 57s | Max:  5m 03s | Hits: 100%/3554  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 41m | Avg:  9m 20s | Max: 36m 10s | Hits:  96%/76402 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 35s | Avg:  5m 08s | Max:  5m 29s | Hits: 100%/7108  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 27s | Avg:  5m 43s | Max:  5m 56s | Hits: 100%/3554  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 38s | Avg:  5m 49s | Max:  5m 50s | Hits: 100%/3554  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 10s | Avg:  5m 35s | Max:  5m 53s | Hits: 100%/3554  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 34s | Avg:  6m 13s | Max: 10m 05s | Hits: 100%/12439 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 02s | Avg:  5m 01s | Max:  5m 14s | Hits:  99%/3556  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 18s | Avg:  5m 18s | Max:  5m 18s | Hits:  99%/1778  
      🟩 GCC9               Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  5m 32s | Hits:  99%/3556  
      🟩 GCC10              Pass: 100%/2   | Total: 10m 54s | Avg:  5m 27s | Max:  5m 35s | Hits:  99%/3556  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 08s | Hits:  99%/3556  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 52s | Avg:  5m 56s | Max:  6m 17s | Hits:  99%/3556  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 17m | Avg:  7m 43s | Max: 11m 26s | Hits:  99%/17780 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 54m 38s | Avg: 27m 19s | Max: 29m 02s | Hits:  70%/3542  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 31m | Avg: 30m 36s | Max: 36m 10s | Hits:  70%/5313  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 08s | Avg: 14m 04s | Max: 14m 35s | Hits:  99%/3554  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 38m | Avg:  5m 47s | Max: 10m 05s | Hits: 100%/30209 
      🟩 GCC                Pass: 100%/21  | Total:  2h 18m | Avg:  6m 35s | Max: 11m 26s | Hits:  99%/37338 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 26m | Avg: 29m 17s | Max: 36m 10s | Hits:  70%/8855  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 08s | Avg: 14m 04s | Max: 14m 35s | Hits:  99%/3554  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 15m 57s | Avg:  7m 58s | Max: 10m 48s | Hits:  99%/3556  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 25m | Avg:  8m 02s | Max: 29m 02s | Hits:  97%/58637 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 10m | Avg: 13m 02s | Max: 36m 10s | Hits:  94%/17763 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  5h 16m | Avg:  8m 19s | Max: 29m 02s | Hits:  96%/67519 
      🟩 TestCPU            Pass: 100%/3   | Total: 51m 46s | Avg: 17m 15s | Max: 36m 10s | Hits:  90%/5326  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 35s | Avg: 10m 53s | Max: 11m 26s | Hits:  99%/7111  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 15m 57s | Avg:  7m 58s | Max: 10m 48s | Hits:  99%/3556  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 34s | Avg:  6m 34s | Max:  6m 34s | Hits:  99%/1778  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 06m | Avg:  9m 18s | Max: 29m 02s | Hits:  95%/35531 
      🟩 20                 Pass: 100%/23  | Total:  3h 27m | Avg:  9m 02s | Max: 36m 10s | Hits:  97%/40869 
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 7h 22m | Avg: 10m 16s | Max: 30m 20s | Hits: 87%/104996

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  7h 12m | Avg: 10m 33s | Max: 30m 20s | Hits:  86%/99253 
      🟩 arm64              Pass: 100%/2   | Total:  9m 13s | Avg:  4m 36s | Max:  5m 28s | Hits:  96%/5743  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 01m | Avg: 12m 19s | Max: 25m 34s | Hits:  84%/13988 
      🟩 12.5               Pass: 100%/2   | Total: 20m 51s | Avg: 10m 25s | Max: 12m 21s | Hits:  94%/5688  
      🟩 12.8               Pass: 100%/36  | Total:  5h 59m | Avg:  9m 59s | Max: 30m 20s | Hits:  87%/85320 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 42m 22s | Avg: 21m 11s | Max: 22m 02s | Hits:  27%/5704  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 01m | Avg: 12m 19s | Max: 25m 34s | Hits:  84%/13988 
      🟩 nvcc12.5           Pass: 100%/2   | Total: 20m 51s | Avg: 10m 25s | Max: 12m 21s | Hits:  94%/5688  
      🟩 nvcc12.8           Pass: 100%/34  | Total:  5h 17m | Avg:  9m 19s | Max: 30m 20s | Hits:  91%/79616 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 42m 22s | Avg: 21m 11s | Max: 22m 02s | Hits:  27%/5704  
      🟩 nvcc               Pass: 100%/41  | Total:  6h 39m | Avg:  9m 44s | Max: 30m 20s | Hits:  90%/99292 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 28m 08s | Avg:  7m 02s | Max:  8m 55s | Hits:  93%/11376 
      🟩 Clang15            Pass: 100%/2   | Total: 10m 42s | Avg:  5m 21s | Max:  6m 12s | Hits:  96%/5700  
      🟩 Clang16            Pass: 100%/2   | Total: 10m 53s | Avg:  5m 26s | Max:  6m 28s | Hits:  96%/5700  
      🟩 Clang17            Pass: 100%/2   | Total:  9m 04s | Avg:  4m 32s | Max:  4m 41s | Hits:  99%/5700  
      🟩 Clang18            Pass: 100%/6   | Total:  1h 06m | Avg: 11m 01s | Max: 22m 02s | Hits:  69%/14275 
      🟩 GCC7               Pass: 100%/2   | Total: 27m 51s | Avg: 13m 55s | Max: 21m 21s | Hits:  61%/5638  
      🟩 GCC8               Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s | Hits:  99%/2829  
      🟩 GCC9               Pass: 100%/2   | Total: 28m 45s | Avg: 14m 22s | Max: 25m 09s | Hits:  65%/5650  
      🟩 GCC10              Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  5m 59s | Hits:  96%/5706  
      🟩 GCC11              Pass: 100%/2   | Total:  8m 19s | Avg:  4m 09s | Max:  4m 12s | Hits:  99%/5702  
      🟩 GCC12              Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 15s | Hits:  93%/5702  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 35m | Avg:  9m 33s | Max: 22m 11s | Hits:  84%/14536 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 51m 47s | Avg: 25m 53s | Max: 26m 13s | Hits:  99%/5364  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 57m 16s | Avg: 28m 38s | Max: 30m 20s | Hits:  93%/5430  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 20m 51s | Avg: 10m 25s | Max: 12m 21s | Hits:  94%/5688  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  2h 04m | Avg:  7m 48s | Max: 22m 02s | Hits:  86%/42751 
      🟩 GCC                Pass: 100%/21  | Total:  3h 07m | Avg:  8m 54s | Max: 25m 09s | Hits:  84%/45763 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 49m | Avg: 27m 15s | Max: 30m 20s | Hits:  96%/10794 
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 51s | Avg: 10m 25s | Max: 12m 21s | Hits:  94%/5688  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 10s | Avg:  8m 05s | Max: 12m 02s | Hits:  99%/2961  
      🟩 rtx2080            Pass: 100%/41  | Total:  7h 05m | Avg: 10m 23s | Max: 30m 20s | Hits:  87%/102035
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  6h 18m | Avg: 10m 13s | Max: 30m 20s | Hits:  87%/104956
      🟩 NVRTC              Pass: 100%/2   | Total: 31m 18s | Avg: 15m 39s | Max: 15m 45s | Hits:  90%/40    
      🟩 Test               Pass: 100%/3   | Total: 30m 07s | Avg: 10m 02s | Max: 12m 02s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 14s | Avg:  2m 14s | Max:  2m 14s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 31m 18s | Avg: 15m 39s | Max: 15m 45s | Hits:  90%/40    
      🟩 90                 Pass: 100%/2   | Total: 16m 10s | Avg:  8m 05s | Max: 12m 02s | Hits:  99%/2961  
      🟩 90;90a;100         Pass: 100%/1   | Total:  4m 55s | Avg:  4m 55s | Max:  4m 55s | Hits:  99%/2961  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  4h 16m | Avg: 12m 12s | Max: 26m 56s | Hits:  83%/56127 
      🟩 20                 Pass: 100%/21  | Total:  3h 03m | Avg:  8m 44s | Max: 30m 20s | Hits:  92%/48869 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 02m | Avg: 5m 35s | Max: 14m 17s | Hits: 97%/11722

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 51m | Avg:  6m 11s | Max: 14m 17s | Hits:  97%/9406  
      🟩 arm64              Pass: 100%/4   | Total: 11m 32s | Avg:  2m 53s | Max:  2m 56s | Hits:  99%/2316  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s | Hits:  59%/277   
      🟩 12.5               Pass: 100%/2   | Total: 10m 34s | Avg:  5m 17s | Max:  5m 19s | Hits:  96%/742   
      🟩 12.8               Pass: 100%/19  | Total:  1h 39m | Avg:  5m 14s | Max: 14m 17s | Hits:  98%/10703 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s | Hits:  59%/277   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 34s | Avg:  5m 17s | Max:  5m 19s | Hits:  96%/742   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 39m | Avg:  5m 14s | Max: 14m 17s | Hits:  98%/10703 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 02m | Avg:  5m 35s | Max: 14m 17s | Hits:  97%/11722 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 24s | Avg:  3m 24s | Max:  3m 24s | Hits: 100%/581   
      🟩 Clang15            Pass: 100%/1   | Total:  3m 26s | Avg:  3m 26s | Max:  3m 26s | Hits: 100%/579   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 37s | Avg:  3m 37s | Max:  3m 37s | Hits: 100%/579   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 27s | Avg:  3m 27s | Max:  3m 27s | Hits: 100%/579   
      🟩 Clang18            Pass: 100%/4   | Total: 21m 21s | Avg:  5m 20s | Max: 12m 03s | Hits: 100%/2316  
      🟩 GCC10              Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s | Hits:  99%/581   
      🟩 GCC11              Pass: 100%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s | Hits:  99%/579   
      🟩 GCC12              Pass: 100%/2   | Total: 16m 01s | Avg:  8m 00s | Max: 12m 28s | Hits:  99%/1158  
      🟩 GCC13              Pass: 100%/6   | Total: 29m 17s | Avg:  4m 52s | Max: 14m 17s | Hits:  99%/3474  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s | Hits:  59%/277   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 12m 15s | Avg: 12m 15s | Max: 12m 15s | Hits:  59%/277   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 34s | Avg:  5m 17s | Max:  5m 19s | Hits:  96%/742   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 35m 15s | Avg:  4m 24s | Max: 12m 03s | Hits: 100%/4634  
      🟩 GCC                Pass: 100%/10  | Total: 52m 07s | Avg:  5m 12s | Max: 14m 17s | Hits:  99%/5792  
      🟩 MSVC               Pass: 100%/2   | Total: 24m 59s | Avg: 12m 29s | Max: 12m 44s | Hits:  59%/554   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 34s | Avg:  5m 17s | Max:  5m 19s | Hits:  96%/742   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 22s | Avg:  8m 41s | Max: 14m 17s | Hits:  99%/1158  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 45m | Avg:  5m 16s | Max: 12m 44s | Hits:  97%/10564 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 24m | Avg:  4m 25s | Max: 12m 44s | Hits:  97%/9985  
      🟩 Test               Pass: 100%/3   | Total: 38m 48s | Avg: 12m 56s | Max: 14m 17s | Hits:  99%/1737  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 20m 21s | Avg:  6m 47s | Max: 14m 17s | Hits:  99%/1737  
      🟩 90a                Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s | Hits:  99%/579   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 13m 58s | Avg:  3m 29s | Max:  5m 19s | Hits:  99%/2108  
      🟩 20                 Pass: 100%/18  | Total:  1h 48m | Avg:  6m 03s | Max: 14m 17s | Hits:  97%/9614  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 43s | Avg: 7m 51s | Max: 13m 20s | Hits: 98%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 43s | Avg:  7m 51s | Max: 13m 20s | Hits:  98%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 43s | Avg:  7m 51s | Max: 13m 20s | Hits:  98%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 43s | Avg:  7m 51s | Max: 13m 20s | Hits:  98%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 43s | Avg:  7m 51s | Max: 13m 20s | Hits:  98%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 43s | Avg:  7m 51s | Max: 13m 20s | Hits:  98%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 43s | Avg:  7m 51s | Max: 13m 20s | Hits:  98%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 43s | Avg:  7m 51s | Max: 13m 20s | Hits:  98%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 23s | Avg:  2m 23s | Max:  2m 23s | Hits:  98%/154   
      🟩 Test               Pass: 100%/1   | Total: 13m 20s | Avg: 13m 20s | Max: 13m 20s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 01m | Avg: 1h 01m | Max: 1h 01m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

Comment on lines +135 to +136
const int begin_bit = GENERATE_COPY(take(2, random(0, key_size - 1)));
const int end_bit = GENERATE_COPY(take(2, random(begin_bit + 1, key_size)));
Contributor

@elstehle could you please have a quick look whether this fix is correct? We previously tested begin_bit == end_bit sometimes. Was this an invalid scenario?

Contributor Author

If begin_bit == end_bit is valid, then the kernel should not be called (I guess).

Contributor

I think begin_bit == end_bit is generally valid, it's similar to num_items==0.

[...] then the kernel should not be called (I guess)

We can skip the kernel invocation only if the user invoked DeviceRadixSort via the DoubleBuffer interface. Otherwise, the user will expect the output to end up in d_{keys,values}_out, in which case we still need to copy the "sorted" output there.
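
To make the begin_bit == end_bit case concrete, here is a minimal host-side sketch. It is not the CUB or `cuda::bitfield_extract` implementation; `extract_bits` and `sort_keys_by_bit_range` are hypothetical names used only for illustration. It assumes a zero-width extraction should yield 0, so every key compares equal and the "sort" degenerates to a copy into the output buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the libcu++ API): returns `width` bits of `value`
// starting at bit `start`. A zero width is treated as valid and yields 0.
inline std::uint32_t extract_bits(std::uint32_t value, int start, int width)
{
  assert(start >= 0 && width >= 0 && start + width <= 32);
  if (width == 0)
  {
    return 0u; // avoid shifting by 32 when start == 32
  }
  // A full-width mask needs special handling because 1u << 32 is undefined.
  const std::uint32_t mask = (width == 32) ? ~0u : ((1u << width) - 1u);
  return (value >> start) & mask;
}

// Hypothetical stand-in for a keys-only sort with separate input and output
// buffers, ordering keys by the bits in [begin_bit, end_bit).
inline void sort_keys_by_bit_range(const std::vector<std::uint32_t>& keys_in,
                                   std::vector<std::uint32_t>& keys_out,
                                   int begin_bit,
                                   int end_bit)
{
  // The output buffer must be populated even when no bits take part in the sort.
  keys_out = keys_in;
  if (begin_bit == end_bit)
  {
    return; // nothing to order; the copy above is the whole result
  }
  const int width = end_bit - begin_bit;
  std::stable_sort(keys_out.begin(), keys_out.end(), [=](std::uint32_t a, std::uint32_t b) {
    return extract_bits(a, begin_bit, width) < extract_bits(b, begin_bit, width);
  });
}
```

With a DoubleBuffer-style interface the copy could be dropped as well, since the caller is told which buffer holds the result; the sketch only models the separate in/out case discussed above.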

@fbusato
Contributor Author

fbusato commented Mar 7, 2025

Added a performance comparison to the description.

Labels
3.0 Targeted for 3.0 release
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

[FEA]: Deprecate/Replace cub::BFE
4 participants