AMDGPU v0.4.5
Closed issues:
- Mem.alloc: Allow using hipMalloc to service allocations (#286)
- rocBLAS GEMM ignores `@view` (#319)
- sincospi intrinsic is broken (#334)
- `#jps/dev` segfaults on MI250x (#340)
- method ambiguity in `rand!` (#343)
- Add function or macro for AMDGPU.jl equivalent to CUDA.CuDynamicSharedArray (and CUDA.CuStaticSharedArray) (#347)
- Free `KernelState` in finalizer (#352)
- `--check-bounds=no` is broken on Julia 1.9.0-beta3 (#354)
Merged pull requests:
- Fix GEMM (regular & batched) and support batched GEMM for 3D array (#318) (@pxl-th)
- Add MIOpen (#320) (@pxl-th)
- Add support for 2D * 3D batched GEMM (#321) (@pxl-th)
- Support NNlib batched gemm format (#322) (@pxl-th)
- Add pointer() method for ROCArray and some library tests (#323) (@torrance)
- Fix double unsafe_free calls (#324) (@jpsamaroo)
- Mem: Allow using hipMalloc/hipFree for allocations (#325) (@jpsamaroo)
- Cast to Ptr before checking NULL pointer (#328) (@torrance)
- Resize! support (#333) (@matinraayai)
- Add sincos/sincospi/frexp/ldexp intrinsics (#336) (@jpsamaroo)
- Add local memory allocation helpers (#348) (@jpsamaroo)
- Add GPUCompiler 0.17 to compat (#349) (@jpsamaroo)
- Preserve `UInt32` in indexing intrinsics (#351) (@pxl-th)
- Fix `unsafe_free!` not actually freeing (#353) (@jpsamaroo)
- Don't sync on default HIP stream every time (#356) (@pxl-th)
- Make alignment generated (#358) (@pxl-th)
- tests: Properly unwrap Distributed exceptions (#359) (@jpsamaroo)