Releases: JuliaGPU/AMDGPU.jl
v0.4.7
v0.4.6
AMDGPU v0.4.6
Closed issues:
- Implement occupancy API (#271)
- getinfo should determine the Ref output container automatically (#273)
Merged pull requests:
- Add timespan logging via TimespanLogging.jl (#263) (@jpsamaroo)
- Add occupancy API and groupsize tuning (#326) (@jpsamaroo)
- Reduce signal wait allocations (#361) (@jpsamaroo)
- Add more intrinsics, enable always_inline (#362) (@jpsamaroo)
- Simplify math intrinsics (#363) (@pxl-th)
- Implement unified getinfo interface (#364) (@jpsamaroo)
- Assorted fixes (#365) (@jpsamaroo)
- Add memory allocation limiters (#366) (@jpsamaroo)
- Specify return types for getinfo calls (#368) (@pxl-th)
v0.4.5
AMDGPU v0.4.5
Closed issues:
- Mem.alloc: Allow using hipMalloc to service allocations (#286)
- rocBLAS GEMM ignores @view (#319)
- sincospi intrinsic is broken (#334)
- #jps/dev segfaults on MI250x (#340)
- method ambiguity in rand! (#343)
- Add function or macro for AMDGPU.jl equivalent to CUDA.CuDynamicSharedArray (and CUDA.CuStaticSharedArray) (#347)
- Free KernelState in finalizer (#352)
- --check-bounds=no is broken on Julia 1.9.0-beta3 (#354)
Merged pull requests:
- Fix GEMM (regular & batched) and support batched GEMM for 3D array (#318) (@pxl-th)
- Add MIOpen (#320) (@pxl-th)
- Add support for 2D * 3D batched GEMM (#321) (@pxl-th)
- Support NNlib batched gemm format (#322) (@pxl-th)
- Add pointer() method for ROCArray and some library tests (#323) (@torrance)
- Fix double unsafe_free calls (#324) (@jpsamaroo)
- Mem: Allow using hipMalloc/hipFree for allocations (#325) (@jpsamaroo)
- Cast to Ptr before checking NULL pointer (#328) (@torrance)
- Resize! support (#333) (@matinraayai)
- Add sincos/sincospi/frexp/ldexp intrinsics (#336) (@jpsamaroo)
- Add local memory allocation helpers (#348) (@jpsamaroo)
- Add GPUCompiler 0.17 to compat (#349) (@jpsamaroo)
- Preserve UInt32 in indexing intrinsics (#351) (@pxl-th)
- Fix unsafe_free! not actually freeing (#353) (@jpsamaroo)
- Don't sync on default HIP stream every time (#356) (@pxl-th)
- Make alignment generated (#358) (@pxl-th)
- tests: Properly unwrap Distributed exceptions (#359) (@jpsamaroo)
v0.4.4
AMDGPU v0.4.4
Closed issues:
- Repetitive AMDGPU.ones calls crash runtime (#299)
- Add AMDGPU.jl equivalent to CUDA.CuDynamicSharedArray (and CUDA.CuStaticSharedArray) (#304)
- Segfault with basic kernel from AMDGPU.jl doc on LUMI (#308)
- ROC kernel faulting upon having AMDGPU and CUDA loaded (#312)
- AMDGPU.rand failing to create a ROCArray (#315)
Merged pull requests:
- Remove waiter and error monitor threads (#306) (@pxl-th)
- Update bindeps search path (#307) (@luraess)
- Prioritise ENV var to use or not artifacts (#310) (@luraess)
- Add dynamic local memory support (#311) (@jpsamaroo)
- random: Load definitions without rocRAND (#316) (@jpsamaroo)
v0.4.3
AMDGPU v0.4.3
Closed issues:
- Queue selection test fail (#274)
Merged pull requests:
- Add device quirks from CUDA.jl, enhance at-rocprintf (#269) (@jpsamaroo)
- Use an optimized norm function for ROCBLASArray (#282) (@amontoison)
- Add rocBLAS_jll and rocSPARSE_jll deps (#284) (@jpsamaroo)
- active_kernels: Use WeakKeyDict (#285) (@jpsamaroo)
- CI: Add gfx90a to more jobs (#289) (@jpsamaroo)
- build: Remove build step, run at toplevel (#290) (@jpsamaroo)
- Mapreducedim support for AnyROCArray (#291) (@matinraayai)
- Parallelize tests (#293) (@jpsamaroo)
- Fix precompilation (#294) (@pxl-th)
- Do not rethrow EOF (#296) (@pxl-th)
- Use correct queue for kernels (#297) (@pxl-th)
- Implement kernel hashing system (#302) (@jpsamaroo)
v0.4.2
AMDGPU v0.4.2
Closed issues:
- build failure on Julia 1.8.1 (#278)
Merged pull requests:
- Mem: Retry failing allocations (#251) (@jpsamaroo)
- Add device-to-device unsafe_copy3d test (#260) (@luraess)
- Fix allocation retry mechanism, add slow allocation fallback (#262) (@jpsamaroo)
- Run wavefront tests with detected wavefrontsize (#264) (@torrance)
- During HostCall, ensure device has finished using buffers before freeing (#266) (@torrance)
- Expand fft tests (#267) (@torrance)
- Remove code that duplicates AbstractFFTs; add tests for casting (#268) (@torrance)
- Don't embed the method table in the AST (#276) (@jpsamaroo)
- deps: Don't access is_available unless using succeeds (#279) (@jpsamaroo)
- device: Add ROCDevice() ctor (#280) (@jpsamaroo)
v0.4.1
AMDGPU v0.4.1
Closed issues:
- Add option to disable automatic mark/wait of specific arrays (#126)
- Limit multi-dimensional groupsize properly (#150)
- Optimize kernarg allocations in kernel construction (#247)
- Add priority kwarg to ROCQueue ctor (#256)
Merged pull requests:
- Add BackToCPU struct to reduce 'view' allocations (#246) (@pxl-th)
- LB GPUCompiler to 0.16.2 (#248) (@jpsamaroo)
- Optimize kernel setup and launch (#249) (@jpsamaroo)
- launch: Fix groupsize dimension check (#250) (@jpsamaroo)
- device: Add device_id method (#253) (@jpsamaroo)
- Re-export indexing intrinsics (#254) (@jpsamaroo)
- CI: Switch GHA to 1.7 release (#257) (@jpsamaroo)
- queue: Allow setting priority from ctor (#258) (@jpsamaroo)
- math: Make signbit return Bool (#259) (@jpsamaroo)
v0.4.0
AMDGPU v0.4.0
Closed issues:
Merged pull requests:
- Remove launch export (#232) (@matinraayai)
- Remove indirection layer, use modules (#240) (@jpsamaroo)
- Update Setfield compat (#243) (@luraess)
v0.3.7
AMDGPU v0.3.7
Closed issues:
Merged pull requests:
- build/init: Skip if GPUs are not available (#231) (@jpsamaroo)
v0.3.6
AMDGPU v0.3.6
Merged pull requests:
- chore(ci): add informational Codecov status checks (#227) (@thomasrockhu-codecov)
- Fix incorrect agent target for allocations (#228) (@jpsamaroo)
- Properly skip unsupported OS/arch configs (#229) (@jpsamaroo)