
Commit 8d0ba06

Merge pull request #258 from JuliaGPU/jps/queue-priority
queue: Allow setting priority from ctor
2 parents 9e1041b + 3c48999 commit 8d0ba06

4 files changed, with 46 additions and 11 deletions

docs/src/queues_signals.md

+19-9
@@ -2,13 +2,13 @@
22

33
Similar to CUDA streams, ROCm has the concept of queues, which are
44
buffers used to instruct the GPU hardware which kernels to launch. ROCm queues
5-
are asynchronous, unlike CUDA streams. Each agent has a default queue
6-
associated, which is accessible with `get_default_queue(agent)` (or
7-
`get_default_queue()` for the default agent's default queue). You can specify
5+
are asynchronous, unlike CUDA streams. Each device has a default queue
6+
associated, which is accessible with `default_queue(device)` (or
7+
`default_queue()` for the default device's default queue). You can specify
88
which queue to launch a kernel on with the `queue` argument to `@roc`:
99

1010
```julia
11-
q = AMDGPU.ROCQueue(agent)
11+
q = AMDGPU.ROCQueue(device)
1212
@roc queue=q kernel(...)
1313
```
1414

@@ -18,19 +18,29 @@ which can be inspected to determine how many (and which) kernels are executing
1818
by comparing the signals returned from `@roc`. You can also omit the `queue`
1919
argument, which will then check the default queue.
2020

21-
If a kernel ever gets "stuck" and locks up the GPU (noticeable with 100% GPU
22-
usage in `rocm-smi`), you can kill the kernel and all other kernels in the
21+
Sometimes a kernel gets "stuck" and locks up the GPU (noticeable with 100%
22+
GPU usage in `rocm-smi`); you can kill the kernel and all other kernels in the
2323
queue with `kill_queue!(queue)`. This can be "safely" done to the default
2424
queue, since default queues are recreated as-needed.
2525

26+
Queues also have an inherent priority, which allows control of kernel
27+
submission latency and on-device scheduling preference with respect to kernels
28+
submitted on other queues. There are three priorities: normal (the default), low, and high priority. These can be easily set at queue creation time:
29+
30+
```julia
31+
low_prio_queue = ROCQueue(device; priority=:low)
32+
high_prio_queue = ROCQueue(device; priority=:high)
33+
normal_prio_queue = ROCQueue(device; priority=:normal) # or just omit "priority"
34+
```
35+
2636
# Signals
2737

2838
Unlike CUDA, ROCm kernels are tracked by an associated signal, which is
2939
created and returned by `@roc`, and is `wait`ed on to track kernel completion.
3040
Signals may also be used for manual synchronization (since they work for CPUs
3141
and GPUs equally well). CPU usage is done with the `HSA.signal_*` functions,
32-
and GPU usage is done with the `device_signal_*` functions. For most signalling
33-
needs, consider using a hostcall instead.
42+
and GPU usage is done with the `device_signal_*` and `hostcall_device_signal_*`
43+
functions. For most signalling needs, consider using a hostcall instead.
3444

3545
If custom signal handling is desired, signals can be manually constructed and
3646
passed to `@roc`:
@@ -39,7 +49,7 @@ passed to `@roc`:
3949
# A kernel which waits on all signals in `sigs`
4050
function multi_wait(sigs)
4151
for i in 1:length(sigs)
42-
AMDGPU.device_signal_wait(sigs[i], 0)
52+
AMDGPU.Device.hostcall_device_signal_wait(sigs[i], 0)
4353
end
4454
nothing
4555
end

src/queue.jl

+16-2
@@ -26,7 +26,7 @@ function queue_error_handler(status::HSA.Status, _queue::Ptr{HSA.Queue}, queue_o
2626
return nothing
2727
end
2828

29-
function ROCQueue(device::ROCDevice)
29+
function ROCQueue(device::ROCDevice; priority::Symbol=:normal)
3030
queue_size = Ref{UInt32}(0)
3131
getinfo(device.agent, HSA.AGENT_INFO_QUEUE_MAX_SIZE, queue_size) |> check
3232
@assert queue_size[] > 0
@@ -49,7 +49,6 @@ function ROCQueue(device::ROCDevice)
4949
end
5050

5151
# Monitor queue for async errors
52-
# TODO: errormonitor
5352
queue_ptr = queue.queue
5453
errormonitor(Threads.@spawn begin
5554
try
@@ -72,6 +71,21 @@ function ROCQueue(device::ROCDevice)
7271
end
7372
end)
7473

74+
# Set queue priority
75+
if !in(priority, (:normal, :low, :high))
76+
throw(ArgumentError("Invalid queue priority: $priority\nOptions are :low, :normal, :high"))
77+
end
78+
if priority != :normal
79+
hsa_prio = if priority == :normal
80+
HSA.AMD_QUEUE_PRIORITY_NORMAL
81+
elseif priority == :low
82+
HSA.AMD_QUEUE_PRIORITY_LOW
83+
elseif priority == :high
84+
HSA.AMD_QUEUE_PRIORITY_HIGH
85+
end
86+
HSA.amd_queue_set_priority(queue_ptr, hsa_prio) |> check
87+
end
88+
7589
AMDGPU.hsaref!()
7690
finalizer(queue) do queue
7791
kill_queue!(queue)
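
The priority handling added in this hunk amounts to validating a `Symbol` and mapping it to an HSA constant. A minimal standalone sketch of that mapping follows. It assumes nothing about the real HSA bindings: the `QueuePriority` enum and the `to_queue_priority` helper are hypothetical stand-ins for illustration only, not part of AMDGPU.jl.

```julia
# Hypothetical stand-in for the HSA priority constants used in the hunk above.
@enum QueuePriority PRIORITY_LOW PRIORITY_NORMAL PRIORITY_HIGH

# Hypothetical helper: map a user-facing Symbol to a priority constant,
# rejecting anything outside the three supported options.
function to_queue_priority(priority::Symbol)
    if priority === :low
        return PRIORITY_LOW
    elseif priority === :normal
        return PRIORITY_NORMAL
    elseif priority === :high
        return PRIORITY_HIGH
    end
    throw(ArgumentError("Invalid queue priority: $priority\nOptions are :low, :normal, :high"))
end
```

Folding validation and mapping into one lookup is only one possible arrangement. In the hunk as written, the `priority == :normal` branch sits inside `if priority != :normal`, so that branch is never taken and the default priority is simply left untouched.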

test/hsa/queue.jl

+10
@@ -0,0 +1,10 @@
1+
@testset "Queues" begin
2+
@testset "Priorities" begin
3+
device = AMDGPU.default_device()
4+
# Test that priorities can be set
5+
for priority in (:low, :normal, :high)
6+
ROCQueue(device; priority)
7+
end
8+
@test_throws ArgumentError ROCQueue(device; priority=:fake)
9+
end
10+
end

test/runtests.jl

+1
@@ -56,6 +56,7 @@ end
5656
@info "Testing using device $(AMDGPU.default_device())"
5757

5858
include("hsa/device.jl")
59+
include("hsa/queue.jl")
5960
include("hsa/memory.jl")
6061
end
6162
@testset "Codegen" begin
