docs/markdown/hip_faq.md
+14-1
@@ -33,6 +33,7 @@
33
33
-[Why _OpenMP is undefined when compiling with -fopenmp?](#why-_openmp-is-undefined-when-compiling-with--fopenmp)
34
34
-[Does the HIP-Clang compiler support extern shared declarations?](#does-the-hip-clang-compiler-support-extern-shared-declarations)
35
35
-[I have multiple HIP enabled devices and I am getting an error message hipErrorNoBinaryForGpu: Unable to find code object for all current devices?](#i-have-multiple-hip-enabled-devices-and-i-am-getting-an-error-message-hipErrorNoBinaryForGpu-unable-to-find-code-object-for-all-current-devices)
36
+
-[How to use per-thread default stream in HIP?](#how-to-use-per-thread-default-stream-in-hip)
36
37
-[How can I know the version of HIP?](#how-can-I-know-the-version-of-hip)
37
38
<!-- tocstop -->
38
39
@@ -94,7 +95,7 @@ However, we can provide a rough summary of the features included in each CUDA SD
94
95
- CUDA 6.5 :
95
96
-__shfl intrinsics (supported)
96
97
- CUDA 7.0 :
97
-
- Per-thread-streams (under development)
98
+
- Per-thread default streams (supported)
98
99
- C++11 (Hip-Clang supports all of C++11, all of C++14 and some C++17 features)
99
100
- CUDA 7.5 :
100
101
- float16 (supported)
@@ -260,6 +261,18 @@ If you have a precompiled application/library (like rocblas, tensorflow etc) whi
260
261
- The application/library does not ship code object bundles for *all* of your device(s): in this case you need to recompile the application/library yourself with correct `--offload-arch`.
261
262
- The application/library does not ship code object bundles for *some* of your device(s), for example you have a system with an APU + GPU and the library does not ship code objects for your APU. For this you can set the environment variable `HIP_VISIBLE_DEVICES` to only enable GPUs for which code object is available. This will limit the GPUs visible to your application and allow it to run.
262
263
264
+
### How to use per-thread default stream in HIP?
265
+
266
+
The per-thread default stream is an implicit stream local to both the thread and the current device. It does not synchronize implicitly with other streams, such as explicitly created streams or the per-thread default streams of other threads.
267
+
268
+
The per-thread default stream is a blocking stream and will synchronize with the default null stream if both are used in a program.
269
+
270
+
In ROCm, compile the translation unit with the following option to enable the per-thread default stream:
271
+
`-fgpu-default-stream=per-thread`
272
+
Once a source file is compiled with the per-thread default stream enabled, API calls that use the default stream are executed on the per-thread default stream, so there is no implicit synchronization with other streams.
273
+
274
+
The per-thread default stream is enabled per translation unit, so users can compile some files with the feature enabled and others with it disabled. A translation unit compiled with the feature enabled uses the per-thread default stream and performs no implicit synchronization, while the other translation units keep the legacy default stream, which does synchronize implicitly.
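As a rough sketch of mixing the two modes in one program, the commands below compile one file with the option and one without, then link them. The file names `worker.cpp`, `legacy.cpp`, and `app` are hypothetical placeholders, and the use of `hipcc` as the compiler driver is an assumption about the build setup.

```shell
# Hypothetical file names; adjust to your project.
# worker.cpp: the default stream is the per-thread default stream.
hipcc -fgpu-default-stream=per-thread -c worker.cpp -o worker.o

# legacy.cpp: compiled without the option, so it keeps the legacy default stream.
hipcc -c legacy.cpp -o legacy.o

# Link both translation units into one executable.
hipcc worker.o legacy.o -o app
```

Keeping the option on a per-file basis lets threaded code paths avoid implicit synchronization while older code that relies on legacy default-stream ordering stays unchanged.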
275
+
263
276
### How can I know the version of HIP?
264
277
265
278
The HIP version definition has been updated since the ROCm 4.2 release as follows:
docs/markdown/hip_programming_guide.md
+9
@@ -139,6 +139,15 @@ This implementation does not require the use of `hipDeviceSetLimit(hipLimitMallo
139
139
140
140
The test code in the link (https://github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/src/deviceLib/hipDeviceMalloc.cpp) shows how to implement an application that uses the malloc and free functions in device kernels.
141
141
142
+
## Use of Per-thread Default Stream
143
+
144
+
The per-thread default stream is supported in HIP. It is an implicit stream local to both the thread and the current device. This means that commands issued to the per-thread default stream by a thread do not implicitly synchronize with other streams, such as explicitly created streams or the per-thread default streams of other threads.
145
+
The per-thread default stream is a blocking stream and will synchronize with the default null stream if both are used in a program.
146
+
The per-thread default stream can be enabled by adding the following compilation option:
147
+
`-fgpu-default-stream=per-thread`
148
+
149
+
Users can also pass `hipStreamPerThread` explicitly as the per-thread default stream handle in API calls. Example test code is available in the link (https://github.com/ROCm-Developer-Tools/HIP/tree/develop/tests/catch/unit/streamperthread).
150
+
142
151
## Use of Long Double Type
143
152
144
153
In HIP-Clang, long double type is 80-bit extended precision format for x86_64, which is not supported by AMDGPU. HIP-Clang treats long double type as IEEE double type for AMDGPU. Using long double type in HIP source code will not cause issue as long as data of long double type is not transferred between host and device. However, long double type should not be used as kernel argument type.