Releases · JuliaGPU/AMDGPU.jl

Closed issues:

Mem.alloc: Allow using hipMalloc to service allocations (#286)
rocBLAS GEMM ignores @view (#319)
sincospi intrinsic is broken (#334)
#jps/dev segfaults on MI250x (#340)
method ambiguity in rand! (#343)
Add function or macro for AMDGPU.jl equivalent to CUDA.CuDynamicSharedArray (and CUDA.CuStaticSharedArray) (#347)
Free KernelState in finalizer (#352)
--check-bounds=no is broken on Julia 1.9.0-beta3 (#354)

Merged pull requests:

Fix GEMM (regular & batched) and support batched GEMM for 3D array (#318) (@pxl-th)
Add MIOpen (#320) (@pxl-th)
Add support for 2D * 3D batched GEMM (#321) (@pxl-th)
Support NNlib batched gemm format (#322) (@pxl-th)
Add pointer() method for ROCArray and some library tests (#323) (@torrance)
Fix double unsafe_free calls (#324) (@jpsamaroo)
Mem: Allow using hipMalloc/hipFree for allocations (#325) (@jpsamaroo)
Cast to Ptr before checking NULL pointer (#328) (@torrance)
Resize! support (#333) (@matinraayai)
Add sincos/sincospi/frexp/ldexp intrinsics (#336) (@jpsamaroo)
Add local memory allocation helpers (#348) (@jpsamaroo)
Add GPUCompiler 0.17 to compat (#349) (@jpsamaroo)
Preserve UInt32 in indexing intrinsics (#351) (@pxl-th)
Fix unsafe_free! not actually freeing (#353) (@jpsamaroo)
Don't sync on default HIP stream every time (#356) (@pxl-th)
Make alignment generated (#358) (@pxl-th)
tests: Properly unwrap Distributed exceptions (#359) (@jpsamaroo)

Contributors

torrance, jpsamaroo, and 2 other contributors

26 Oct 15:29

github-actions

v0.4.4

f92a5f5

AMDGPU v0.4.4

Diff since v0.4.3

Closed issues:

Repetitive AMDGPU.ones calls crash runtime (#299)
Add AMDGPU.jl equivalent to CUDA.CuDynamicSharedArray (and CUDA.CuStaticSharedArray) (#304)
Segfault with basic kernel from AMDGPU.jl doc on LUMI (#308)
ROC kernel faulting upon having AMDGPU and CUDA loaded (#312)
AMDGPU.rand failing to create a ROCArray (#315)

Merged pull requests:

Remove waiter and error monitor threads (#306) (@pxl-th)
Update bindeps search path (#307) (@luraess)
Prioritise ENV var to use or not artifacts (#310) (@luraess)
Add dynamic local memory support (#311) (@jpsamaroo)
random: Load definitions without rocRAND (#316) (@jpsamaroo)

Contributors

jpsamaroo, pxl-th, and luraess

03 Oct 19:25

github-actions

v0.4.3

696f840

AMDGPU v0.4.3

Diff since v0.4.2

Closed issues:

Queue selection test fail (#274)

Merged pull requests:

Add device quirks from CUDA.jl, enhance at-rocprintf (#269) (@jpsamaroo)
Use an optimized norm function for ROCBLASArray (#282) (@amontoison)
Add rocBLAS_jll and rocSPARSE_jll deps (#284) (@jpsamaroo)
active_kernels: Use WeakKeyDict (#285) (@jpsamaroo)
CI: Add gfx90a to more jobs (#289) (@jpsamaroo)
build: Remove build step, run at toplevel (#290) (@jpsamaroo)
Mapreducedim support for AnyROCArray (#291) (@matinraayai)
Parallelize tests (#293) (@jpsamaroo)
Fix precompilation (#294) (@pxl-th)
Do not rethrow EOF (#296) (@pxl-th)
Use correct queue for kernels (#297) (@pxl-th)
Implement kernel hashing system (#302) (@jpsamaroo)

Contributors

jpsamaroo, pxl-th, and 2 other contributors

08 Sep 16:22

github-actions

v0.4.2

287712e

AMDGPU v0.4.2

Diff since v0.4.1

Closed issues:

build failure on Julia 1.8.1 (#278)

Merged pull requests:

Mem: Retry failing allocations (#251) (@jpsamaroo)
Add device-to-device unsafe_copy3d test (#260) (@luraess)
Fix allocation retry mechanism, add slow allocation fallback (#262) (@jpsamaroo)
Run wavefront tests with detected wavefrontsize (#264) (@torrance)
During HostCall, ensure device has finished using buffers before freeing (#266) (@torrance)
Expand fft tests (#267) (@torrance)
Remove code that duplicates AbstractFFTs; add tests for casting (#268) (@torrance)
Don't embed the method table in the AST (#276) (@jpsamaroo)
deps: Don't access is_available unless using succeeds (#279) (@jpsamaroo)
device: Add ROCDevice() ctor (#280) (@jpsamaroo)

Contributors

torrance, jpsamaroo, and luraess

01 Aug 16:23

github-actions

v0.4.1

5f723f8

AMDGPU v0.4.1

Diff since v0.4.0

Closed issues:

Add option to disable automatic mark/wait of specific arrays (#126)
Limit multi-dimensional groupsize properly (#150)
Optimize kernarg allocations in kernel construction (#247)
Add priority kwarg to ROCQueue ctor (#256)

Merged pull requests:

Add BackToCPU struct to reduce 'view' allocations (#246) (@pxl-th)
LB GPUCompiler to 0.16.2 (#248) (@jpsamaroo)
Optimize kernel setup and launch (#249) (@jpsamaroo)
launch: Fix groupsize dimension check (#250) (@jpsamaroo)
device: Add device_id method (#253) (@jpsamaroo)
Re-export indexing intrinsics (#254) (@jpsamaroo)
CI: Switch GHA to 1.7 release (#257) (@jpsamaroo)
queue: Allow setting priority from ctor (#258) (@jpsamaroo)
math: Make signbit return Bool (#259) (@jpsamaroo)

Contributors

jpsamaroo and pxl-th

24 Jul 00:35

github-actions

v0.4.0

e00143e

AMDGPU v0.4.0

Diff since v0.3.7

Closed issues:

Use atomics/locking in hostcall (#4)
Bump Setfield compat (#242)

Merged pull requests:

Remove launch export (#232) (@matinraayai)
Remove indirection layer, use modules (#240) (@jpsamaroo)
Update Setfield compat (#243) (@luraess)

Contributors

jpsamaroo, matinraayai, and luraess

11 May 17:56

github-actions

v0.3.7

d29e3d4

AMDGPU v0.3.7

Diff since v0.3.6

Closed issues:

Move standard warnings of using AMDGPU on system without AMD GPU to @debug? (#199)

Merged pull requests:

build/init: Skip if GPUs are not available (#231) (@jpsamaroo)

Contributors

debug and jpsamaroo

09 May 16:07

github-actions

v0.3.6

f37a908

AMDGPU v0.3.6

Diff since v0.3.5

Merged pull requests:

chore(ci): add informational Codecov status checks (#227) (@thomasrockhu-codecov)
Fix incorrect agent target for allocations (#228) (@jpsamaroo)
Properly skip unsupported OS/arch configs (#229) (@jpsamaroo)

Contributors

jpsamaroo and thomasrockhu-codecov
