Similar to CUDA streams, ROCm has the concept of queues, which are
buffers used to instruct the GPU hardware which kernels to launch. ROCm queues
are asynchronous, unlike CUDA streams. Each device has a default queue
associated, which is accessible with `default_queue(device)` (or
`default_queue()` for the default device's default queue). You can specify
which queue to launch a kernel on with the `queue` argument to `@roc`:

```julia
q = AMDGPU.ROCQueue(device)
@roc queue=q kernel(...)
```

[...] which can be inspected to determine how many (and which) kernels are
executing by comparing the signals returned from `@roc`. You can also omit the
`queue` argument, which will then check the default queue.

Sometimes a kernel gets "stuck" and locks up the GPU (noticeable with 100%
GPU usage in `rocm-smi`); you can kill the kernel and all other kernels in the
queue with `kill_queue!(queue)`. This can be "safely" done to the default
queue, since default queues are recreated as-needed.
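
For instance, a minimal sketch of recovering from a stuck kernel on the default
queue (assuming `kill_queue!` and `default_queue` are reachable from the
`AMDGPU` module):

```julia
# Kill the stuck kernel (and anything else queued) on the default queue;
# default queues are recreated as-needed, so this is relatively safe
AMDGPU.kill_queue!(AMDGPU.default_queue())
```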

Queues also have an inherent priority, which allows control of kernel
submission latency and on-device scheduling preference with respect to kernels
submitted on other queues. There are three priorities: normal (the default),
low, and high. These can be set at queue creation time:

```julia
low_prio_queue = ROCQueue(device; priority=:low)
high_prio_queue = ROCQueue(device; priority=:high)
normal_prio_queue = ROCQueue(device; priority=:normal) # or just omit "priority"
```

# Signals

Unlike CUDA, ROCm kernels are tracked by an associated signal, which is
created and returned by `@roc`, and is `wait`ed on to track kernel completion.
Signals may also be used for manual synchronization (since they work for CPUs
and GPUs equally well). CPU usage is done with the `HSA.signal_*` functions,
and GPU usage is done with the `device_signal_*` and `hostcall_device_signal_*`
functions. For most signalling needs, consider using a hostcall instead.
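
For instance, a rough sketch of host-side synchronization (`my_kernel` and
`args` here are placeholders, not part of the API):

```julia
# `@roc` launches asynchronously and returns an object carrying the kernel's signal
sig = @roc my_kernel(args...)

# Block the host until the kernel signals completion
wait(sig)
```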

If custom signal handling is desired, signals can be manually constructed and
passed to `@roc`:

```julia
# A kernel which waits on all signals in `sigs`
function multi_wait(sigs)
    for i in 1:length(sigs)
        AMDGPU.Device.hostcall_device_signal_wait(sigs[i], 0)
    end
    nothing
end