AMDGPU v0.4.1
Closed issues:
- Add option to disable automatic mark/wait of specific arrays (#126)
- Limit multi-dimensional groupsize properly (#150)
- Optimize kernarg allocations in kernel construction (#247)
- Add priority kwarg to ROCQueue ctor (#256)
Merged pull requests:
- Add BackToCPU struct to reduce 'view' allocations (#246) (@pxl-th)
- LB GPUCompiler to 0.16.2 (#248) (@jpsamaroo)
- Optimize kernel setup and launch (#249) (@jpsamaroo)
- launch: Fix groupsize dimension check (#250) (@jpsamaroo)
- device: Add device_id method (#253) (@jpsamaroo)
- Re-export indexing intrinsics (#254) (@jpsamaroo)
- CI: Switch GHA to 1.7 release (#257) (@jpsamaroo)
- queue: Allow setting priority from ctor (#258) (@jpsamaroo)
- math: Make signbit return Bool (#259) (@jpsamaroo)