docs/markdown/hip_debugging.md (+1 -1)
@@ -262,7 +262,7 @@ The following is the summary of the most useful environment variables in HIP.
| AMD_SERIALIZE_COPY <br><sub> Serialize copies. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
| HIP_HOST_COHERENT <br><sub> Coherent memory in hipHostMalloc. </sub> | 0 | 0: memory is not coherent between host and GPU. <br> 1: memory is coherent with host. |
| GPU_MAX_HW_QUEUES <br><sub> The maximum number of hardware queues allocated per device. </sub> | 4 | Controls how many independent hardware queues the HIP runtime can create per process, per device. If an application allocates more HIP streams than this number, the HIP runtime reuses the same hardware queues for the new streams in a round-robin manner. Note that this maximum does not apply to hardware queues created for CU-masked HIP streams, or to the cooperative queue for HIP Cooperative Groups (there is only one such queue per device). |
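The round-robin reuse described for GPU_MAX_HW_QUEUES can be pictured with a small host-side sketch. This is not the HIP runtime's code: the helper name `hw_queue_for_stream` and the plain modulo mapping are assumptions made for illustration, and the exact assignment inside the runtime may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a HIP API): maps the n-th stream created by a
// process on a device to a hardware queue slot, assuming plain round-robin
// reuse once all slots are taken.
std::size_t hw_queue_for_stream(std::size_t stream_ordinal,
                                std::size_t max_hw_queues) {
    assert(max_hw_queues > 0);  // the documented default for GPU_MAX_HW_QUEUES is 4
    return stream_ordinal % max_hw_queues;
}
```

Under this assumed mapping and the default of 4 queues, the fifth stream (ordinal 4) lands on the same slot as the first, so work submitted to those two streams would share one hardware queue.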
## General Debugging Tips
- 'gdb --args' can be used to conveniently pass the executable and arguments to gdb.
docs/markdown/hip_kernel_language.md (+14 -2)
@@ -455,9 +455,9 @@ Following is the list of supported integer intrinsics. Note that intrinsics are
| unsigned int __popcll ( unsigned long long int x )<br><sub>Count the number of bits that are set to 1 in a 64-bit integer.</sub> |
| int __mul24 ( int x, int y )<br><sub>Multiply two 24-bit integers.</sub> |
| unsigned int __umul24 ( unsigned int x, unsigned int y )<br><sub>Multiply two 24-bit unsigned integers.</sub> |
-<sub><b id="f3"><sup>[1]</sup></b>
+<sub><b id="f3"><sup>[1]</sup></b>
The HIP-Clang implementation of __ffs() and __ffsll() contains code to add a constant +1 to produce the ffs result format.
-For the cases where this overhead is not acceptable and programmer is willing to specialize for the platform,
+For the cases where this overhead is not acceptable and programmer is willing to specialize for the platform,
HIP-Clang provides __lastbit_u32_u32(unsigned int input) and __lastbit_u32_u64(unsigned long long int input).
The index returned by __lastbit_ instructions starts at -1, while for ffs the index starts at 0.
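
The relation between the two result formats can be sketched on the host. The functions below are plain C++ stand-ins, not the device intrinsics; the names `lastbit_u32` and `ffs_u32` are hypothetical, and the behavior for a zero input (-1 versus 0) is taken from the description above.

```cpp
#include <cassert>

// Host-side emulation (assumed semantics, not the device intrinsic):
// index of the least significant set bit, or -1 when no bit is set.
int lastbit_u32(unsigned int input) {
    if (input == 0) return -1;
    int index = 0;
    while ((input & 1u) == 0) {
        input >>= 1;
        ++index;
    }
    return index;
}

// ffs-style result: the same index plus the constant +1,
// so 0 means "no bit set" and 1 means bit 0 is set.
int ffs_u32(unsigned int input) {
    int result = lastbit_u32(input) + 1;
    assert(result >= 0 && result <= 32);  // assumes a 32-bit unsigned int
    return result;
}
```

The extra addition in `ffs_u32` is the overhead the footnote refers to; code that can work with the -1-based index directly can skip it.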
@@ -496,6 +496,18 @@ long long int clock64()
```
Returns the value of a counter that is incremented every clock cycle on the device. The difference between two returned values gives the number of cycles elapsed.
+```
+long long int wall_clock64()
+```
+Returns the wall clock count, which advances at a constant frequency on the device. The frequency can be queried from HIP application code through the HIP API, using the device attribute hipDeviceAttributeWallClockRate.
+Note that the wall clock frequency is a per-device attribute.
+
+
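As a rough illustration of how two wall_clock64() readings might be turned into elapsed time, the host-side helper below divides the tick difference by the wall clock rate. The helper name `wall_clock_elapsed_ms` is hypothetical, and the rate is assumed to be reported in kilohertz (ticks per millisecond); in a HIP application the rate would come from a call such as `hipDeviceGetAttribute(&rate, hipDeviceAttributeWallClockRate, deviceId)`, which is not exercised here.

```cpp
#include <cassert>

// Converts the difference of two wall_clock64() readings into milliseconds.
// Assumption: wall_clock_rate_khz is the value reported for
// hipDeviceAttributeWallClockRate, taken here to be in kilohertz,
// i.e. ticks per millisecond.
double wall_clock_elapsed_ms(long long start_ticks, long long stop_ticks,
                             int wall_clock_rate_khz) {
    assert(wall_clock_rate_khz > 0);
    return static_cast<double>(stop_ticks - start_ticks) /
           static_cast<double>(wall_clock_rate_khz);
}
```

Because the frequency is a per-device attribute, the rate passed in should be the one queried for the device on which the readings were taken.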
## Atomic Functions
Atomic functions execute as read-modify-write operations residing in global or shared memory. No other device or thread can observe or modify the memory location during an atomic operation. If multiple instructions from different devices or threads target the same memory location, the instructions are serialized in an undefined order.
docs/markdown/hip_programming_guide.md (-3)
@@ -102,9 +102,6 @@ A stronger system-level fence can be specified when the event is created with hi
- hipEventReleaseToSystem : Perform a system-scope release operation when the event is recorded. This will make both Coherent and Non-Coherent host memory visible to other agents in the system, but may involve heavyweight operations such as cache flushing. Coherent memory will typically use lighter-weight in-kernel synchronization mechanisms such as an atomic operation and thus does not need to use hipEventReleaseToSystem.
- hipEventDisableTiming: Events created with this flag do not record profiling data and provide the best performance when used for synchronization.
-Note, for HIP Events used in kernel dispatch using hipExtLaunchKernelGGL/hipExtLaunchKernel, events passed in the API are not explicitly recorded and should only be used to get elapsed time for that specific launch.
-In case events are used across multiple dispatches, for example, start and stop events from different hipExtLaunchKernelGGL/hipExtLaunchKernel calls, they will be treated as invalid unrecorded events, HIP will throw error "hipErrorInvalidHandle" from hipEventElapsedTime.
-
### Summary and Recommendations:
- Coherent host memory is the default and is the easiest to use since the memory is visible to the CPU at typical synchronization points. This memory allows in-kernel synchronization commands such as threadfence_system to work transparently.