Releases: JuliaGPU/AMDGPU.jl
v0.5.1
AMDGPU v0.5.1
Closed issues:
- Implement Neural Network primitives (#11)
- [Mark/Wait] Use HIP events to do fine-grained sync (#127)
- Implement memory reclaim mechanism similar to CUDA's (#134)
- NNlibAMDGPU.jl ? (#143)
- Deprecation warning unsafe_length() (#183)
- Test suite failures due to segfaults on Julia 1.8 (#261)
- HSA memory region query test fail (#275)
- ROCBlas support for gfx1031, 1032, and 1033 (#314)
Merged pull requests:
v0.5.0
AMDGPU v0.5.0
Closed issues:
- Test failures locally on 1.9.0-beta4 -- Radeon 6800XT (#400)
- Update HIP errors codes (#404)
- Optimize wait! for HSA kernel launches (#405)
- rocBLAS synchronization issue? (#418)
- First install with JULIA_AMDGPU_DISABLE_ARTIFACTS leads to broken config (#424)
- Cannot unsafe_wrap a device array if lock=false (#436)
Merged pull requests:
- Use HIP as kernel backend instead of HSA (#423) (@pxl-th)
- fix(docs): Wrong symbol in functional docs (#431) (@kunzaatko)
- Update to GPUCompiler 0.21 & LLVM 6 (#437) (@pxl-th)
- Fix docs for HIP (#439) (@luraess)
- Run tests on multiple workers again (#441) (@pxl-th)
- Specialize ROCArray on buffer type (#442) (@pxl-th)
v0.4.15
AMDGPU v0.4.15
Merged pull requests:
v0.4.14
AMDGPU v0.4.14
Closed issues:
- Switching to device ≠ 1 hangs on multi-GPU node (#425)
- @ROCDynamicLocalArray: add support for dynamic eltype and expressions for dims (#428)
Merged pull requests:
- Fix host synchronization (#417) (@pxl-th)
- Add device selection in current task by ID (#420) (@luraess)
- Declare compatibility with LLVM_jll 15 (#426) (@giordano)
- Remove buggy uses of default_device (#427) (@jpsamaroo)
- at-ROC*LocalArray: Escape arguments (#430) (@jpsamaroo)
v0.4.13
AMDGPU v0.4.13
Merged pull requests:
v0.4.12
v0.4.11
v0.4.10
AMDGPU v0.4.10
Merged pull requests:
v0.4.9
AMDGPU v0.4.9
Closed issues:
- State of queues and streams (#337)
- rocBLAS: Remove old hand-wrapped code (#384)
- HSA memory fault upon switching from default device on multi-GPU node (#385)
- Test fail locally with AssertionError: AMDGPU.Runtime.LOGGING_STATIC_ENABLED (#399)
Merged pull requests:
- Switch to task-focused synchronization model (#374) (@jpsamaroo)
- Use broadcast instead of copies to initialize mapreduce buffers. (#390) (@maleadt)
- tests: Skip logging tests if disabled (#391) (@jpsamaroo)
- Add blas wrappers for triangular matrix mul / div (#392) (@pxl-th)
- Simplify signal pooling (#393) (@pxl-th)
- Adapt to GPUCompiler 0.18 (#394) (@pxl-th)
- Reduce memory usage (#395) (@pxl-th)
- Add support for KernelAbstraction 0.9 (#398) (@vchuravy)
- Update to GPUCompiler 0.19 & LLVM 5 (#407) (@pxl-th)
- Fix compiler timespan logging (#408) (@pxl-th)
- rocBLAS: define highlevel dot, gemm, axpy functions for FP16 (#409) (@pxl-th)
- Add KernelAbstractions.jl unsafe_free! (#410) (@pxl-th)
v0.4.8
AMDGPU v0.4.8
Merged pull requests:
- ROCSignal: Pool signals in ctor (#369) (@jpsamaroo)
- Reduce allocations (#376) (@pxl-th)
- Report and exit on memory fault (#379) (@jpsamaroo)
- versioninfo: Indicate if using JLLs or System (#381) (@jpsamaroo)
- ROCSignal: Disable IPC by default (#383) (@jpsamaroo)