Starting to fix linting errors in markdown files.
dgaliffiAMD committed May 15, 2024
1 parent 3a8b0db commit e8f5ce3
Showing 20 changed files with 451 additions and 255 deletions.
10 changes: 10 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,10 @@
MD013: false
MD024:
siblings_only: true
MD026:
punctuation: ".,;:!"
MD029:
style: ordered
MD033: false
MD034: false
MD041: false
13 changes: 12 additions & 1 deletion Applications/README.md
@@ -1,43 +1,54 @@
# Applications Examples

## Summary

The examples in this subdirectory showcase several GPU implementations of models and algorithms from finance, computer science, physics, and other fields that additionally offer a command-line application. The examples are built on Linux for the ROCm (AMD GPU) backend. Some examples additionally support the CUDA (NVIDIA GPU) backend.

## Prerequisites

### Linux

- [CMake](https://cmake.org/download/) (at least version 3.21)
- OR GNU Make - available via the distribution's package manager
- [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x)

### Windows

- [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload
- ROCm toolchain for Windows (No public release yet)
  - The Visual Studio ROCm extension needs to be installed to build with the solution files.
- [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21)
- [Ninja](https://ninja-build.org/) (optional, to build with CMake)

## Building

### Linux

Make sure that the dependencies are installed, or use one of the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment.

#### Using CMake

All examples in the `Applications` subdirectory can either be built by a single CMake project or be built independently.

- `$ cd Libraries/Applications`
- `$ cmake -S . -B build` (on ROCm) or `$ cmake -S . -B build -D GPU_RUNTIME=CUDA` (on CUDA, when supported)
- `$ cmake --build build`

#### Using Make

All examples can be built by a single invocation to Make or be built independently.

- `$ cd Libraries/Applications`
- `$ make` (on ROCm) or `$ make GPU_RUNTIME=CUDA` (on CUDA, when supported)

### Windows

#### Visual Studio

Visual Studio solution files are available for the individual examples. To build all supported HIP runtime examples, open the top-level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for Applications.

For more detailed build instructions, refer to the top-level [README.md](../../README.md#visual-studio).

#### CMake

All examples in the `Applications` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2).
15 changes: 10 additions & 5 deletions Dockerfiles/README.md
@@ -4,22 +4,27 @@ This folder hosts Dockerfiles with ready-to-use environments for the various samples.
Each sample describes which environment it can be used with.

## Building

From this folder execute:

```bash
docker build . -f <dockerfile> -t <result image name>
```

## List of Dockerfiles

### HIP libraries on the ROCm platform based on Ubuntu

Dockerfile: [hip-libraries-rocm-ubuntu.Dockerfile](hip-libraries-rocm-ubuntu.Dockerfile)

This environment is based on Ubuntu targeting the ROCm platform. It has the HIP runtime and the ROCm libraries installed. CMake is also installed in the image.
It can be used with most of the samples when running on a ROCm target.

### HIP libraries on the CUDA platform based on Ubuntu

Dockerfile: [hip-libraries-cuda-ubuntu.Dockerfile](hip-libraries-cuda-ubuntu.Dockerfile)

This environment is based on Ubuntu targeting the CUDA platform. It has the HIP runtime and the ROCm libraries installed. CMake is also installed in the image.
It can be used with the samples that support the CUDA target.
54 changes: 41 additions & 13 deletions Docs/CONTRIBUTING.md
@@ -1,33 +1,61 @@
# Guidelines

To keep the style of the examples consistent, please follow these guidelines when implementing your example.

## Make/CMake

Each example has to support at least `CMake` as build system; the simpler examples should also support `Make`.<br/>
Every example has to be buildable separately from the others, but also has to be added to the top-level build scripts.

## Code Format

The formatting rules of the examples are enforced by `clang-format` using the `.clang-format` file in the top-level directory.

## Variable Naming Conventions

- Use `lower_snake_case` style to name variables and functions (e.g. `block_size`, `multiply_kernel` and `multiply_host`).
- Use `PascalCase` for `class`, `struct`, `enum` and template argument definitions.

## File and Directory Naming Conventions

- Top-level directories use `PascalCase`.
- The directories in `Libraries/` should use the exact name of the library they represent, including casing. If a directory does not represent a library, it should be named in `camelCase`.
- Directories for individual examples use `snake_case`.
- Files generally use `snake_case`, with the exception of files for which an existing convention already applies (`README.md`, `LICENSE.md`, `CMakeLists.txt`, etc.).
- Example binaries should be prefixed with the library name of the binary, so that there are no conflicts between libraries (e.g. `hipcub_device_sum` and `rocprim_device_sum`).

## Utilities

Utility functions (printing vectors, etc.) and common error-handling code that is used by all examples should be moved to the common utility header [example_utils.hpp](../Common/example_utils.hpp).

## Error Handling

Error checking and handling should be applied where appropriate, e.g. when handling user input. `HIP_CHECK` should be used whenever possible. Exceptions should only be used if the complexity of the program requires it.<br/>
In most cases, printing an explanation to stderr and terminating the program with an error code, as specified in the common header, is sufficient.

## Printing Intermediate Results

Results should be printed when they are helpful for understanding and showcasing the example. However, the output shouldn't be overwhelming; printing a vector with hundreds of entries is usually not useful.

## .gitignore

A `.gitignore` file is required in every example subdirectory to exclude the binary generated when using Make.
13 changes: 12 additions & 1 deletion HIP-Basic/README.md
@@ -1,45 +1,56 @@
# HIP-Basic Examples

## Summary

The examples in this subdirectory showcase the functionality of the HIP runtime. The examples are built on Linux for the ROCm (AMD GPU) backend. Some examples additionally support Windows, and some additionally support the CUDA (NVIDIA GPU) backend.

## Prerequisites

### Linux

- [CMake](https://cmake.org/download/) (at least version 3.21)
- OR GNU Make - available via the distribution's package manager
- [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x)

### Windows

- [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload
- ROCm toolchain for Windows (No public release yet)
  - The Visual Studio ROCm extension needs to be installed to build with the solution files.
- [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21)
- [Ninja](https://ninja-build.org/) (optional, to build with CMake)

## Building

### Linux

Make sure that the dependencies are installed, or use one of the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment.

#### Using CMake

All examples in the `HIP-Basic` subdirectory can either be built by a single CMake project or be built independently.

- `$ cd Libraries/HIP-Basic`
- `$ cmake -S . -B build` (on ROCm) or `$ cmake -S . -B build -D GPU_RUNTIME=CUDA` (on CUDA, when supported)
- `$ cmake --build build`

#### Using Make

All examples can be built by a single invocation to Make or be built independently.

- `$ cd Libraries/HIP-Basic`
- `$ make` (on ROCm) or `$ make GPU_RUNTIME=CUDA` (on CUDA, when supported)

### Windows

Not all HIP runtime examples support building on Windows. See the README file in the directory of the example for more details.

#### Visual Studio

Visual Studio solution files are available for the individual examples. To build all supported HIP runtime examples, open the top-level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for HIP-Basic.

For more detailed build instructions, refer to the top-level [README.md](../../README.md#visual-studio).

#### CMake

All examples in the `HIP-Basic` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2).
21 changes: 14 additions & 7 deletions HIP-Basic/saxpy/README.md
@@ -1,9 +1,11 @@
# HIP-Basic "SAXPY" Example

## Description

This program demonstrates a simple implementation of the "SAXPY" kernel. The "S" stands for single-precision (i.e. `float`) and "AXPY" stands for the operation performed: $Y_i=aX_i+Y_i$. The simple nature of this example makes it an ideal starting point for developers who are just getting introduced to HIP.
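
The operation $Y_i=aX_i+Y_i$ can be sketched on the host as a plain C++ loop. This is a hedged reference illustration of the math only (the function name is an assumption, not the example's code); the actual example performs the same computation in a HIP `__global__` kernel, one element per thread.

```cpp
#include <cstddef>
#include <vector>

// CPU reference of the SAXPY operation: y[i] = a * x[i] + y[i].
// The HIP example computes the same result on the GPU; this host loop
// only illustrates the arithmetic performed per element.
void saxpy_reference(float a, const std::vector<float>& x, std::vector<float>& y)
{
    for(std::size_t i = 0; i < y.size(); ++i)
    {
        y[i] = a * x[i] + y[i];
    }
}
```

For instance, with `a = 2`, `x = {1, 2, 3}` and `y = {1, 1, 1}`, the result is `y = {3, 5, 7}`.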

### Application flow

1. A number of constants are defined to control the problem details and the kernel launch parameters.
2. The two input vectors, $X$ and $Y$, are instantiated in host memory. $X$ is filled with an incrementing sequence starting from 1, whereas $Y$ is filled with ones.
3. The necessary amount of device (GPU) memory is allocated and the elements of the input vectors are copied to the device memory.
@@ -14,24 +16,29 @@
8. The first few elements of the result vector are printed to the standard output.

## Key APIs and Concepts

- `hipMalloc` is used to allocate memory in the global memory of the device (GPU). This is usually necessary, since the kernels running on the device cannot access host (CPU) memory (unless it is device-accessible pinned host memory, see `hipHostMalloc`). Beware that the memory returned is uninitialized.
- `hipFree` de-allocates device memory allocated by `hipMalloc`. Memory that is no longer used should be freed with this function to avoid resource leaks.
- `hipMemcpy` is used to transfer bytes between the host and the device memory in both directions. A call to it synchronizes the device with the host, meaning that all kernels queued before `hipMemcpy` will finish before the copying starts. The function returns once the copying has finished.
- `myKernelName<<<gridDim, blockDim, dynamicShared, stream>>>(kernelArguments)` queues the execution of the provided kernel on the device. It is asynchronous; the call may return before the execution of the kernel is finished. Its arguments are as follows:
  - The kernel (`__global__`) function to launch.
  - The number of blocks in the kernel grid, i.e. the grid size. It can have up to 3 dimensions.
  - The number of threads in each block, i.e. the block size. It can have up to 3 dimensions.
  - The amount of dynamic shared memory provided for the kernel, in bytes. Not used in this example.
  - The device stream on which the kernel is queued. In this example, the default stream is used.
  - All further arguments are passed to the kernel function. Note that built-in and simple (POD) types may be passed to the kernel, but complex ones (e.g. `std::vector`) usually cannot be.
- `hipGetLastError` returns the error code resulting from the previous operation.
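
When choosing the grid size mentioned above, the block count is typically derived from the element count with a ceiling division, so the last (possibly partial) block still covers the tail of the data. A minimal sketch under that assumption (the helper name is illustrative, not taken from the example):

```cpp
// Ceiling division: the smallest block count b such that
// b * block_size >= element_count. Commonly used to derive the grid
// size of a kernel launch from the total number of elements.
constexpr unsigned int ceiling_div(unsigned int element_count, unsigned int block_size)
{
    return (element_count + block_size - 1) / block_size;
}
```

For example, 1,000,000 elements with 256 threads per block yields 3907 blocks, the last of which is only partially filled.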

## Demonstrated API Calls

### HIP runtime

#### Device symbols

- `threadIdx`, `blockIdx`, `blockDim`

#### Host symbols

- `hipMalloc`
- `hipFree`
- `hipMemcpy`
16 changes: 11 additions & 5 deletions HIP-Basic/shared_memory/README.md
@@ -1,24 +1,27 @@
# HIP-Basic Shared Memory Example

## Description
Shared memory is an on-chip type of memory that is visible to all the threads within the same block, allowing them to communicate by writing and reading data from the same memory space. However, some synchronization among the threads of the block is needed to ensure that all of them have written before trying to access the data.

When using the appropriate access pattern, this memory can provide much lower latency than local or global memory (nearly as low as registers), making it a much better option in certain cases. If the size of the shared memory to be used is known at compile time, it can be explicitly specified, and it is then known as static shared memory.

This example implements a simple matrix transpose kernel to showcase how to use static shared memory.

### Application flow

1. A number of constants are defined for the kernel launch parameters.
2. The input and output matrices are allocated and initialized in host memory.
3. The necessary amount of device memory for the input and output matrices is allocated and the input data is copied to the device.
4. A trace message is printed to the standard output.
5. The GPU kernel is then launched with the previously defined arguments.
6. The transposed matrix is copied back to host memory.
7. All device memory is freed.
8. The expected transposed matrix is calculated with a CPU version of the transpose kernel and the transposed matrix obtained from the kernel execution is then compared with it. The result of the comparison is printed to the standard output.
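
The CPU reference from step 8 can be sketched as a plain loop over a row-major matrix. This is a hedged illustration (function name and element type are assumptions, not the example's actual host code):

```cpp
#include <cstddef>
#include <vector>

// CPU reference of a matrix transpose: out[j][i] = in[i][j] for a
// row-major `rows x cols` input. The example's GPU kernel produces the
// same result, staging tiles through static shared memory.
std::vector<int> transpose_reference(const std::vector<int>& in,
                                     std::size_t rows, std::size_t cols)
{
    std::vector<int> out(in.size());
    for(std::size_t i = 0; i < rows; ++i)
    {
        for(std::size_t j = 0; j < cols; ++j)
        {
            out[j * rows + i] = in[i * cols + j];
        }
    }
    return out;
}
```

For example, the 2x3 matrix `{1, 2, 3, 4, 5, 6}` transposes to the 3x2 matrix `{1, 4, 2, 5, 3, 6}`.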

## Key APIs and Concepts

- `__shared__` is a variable declaration specifier necessary to allocate shared memory from the device.
- `__syncthreads` synchronizes all the threads within the same block. This synchronization barrier is used to ensure that every thread in a block has finished writing to shared memory before other threads in the block try to access that data.
- `hipMalloc` allocates device memory in global memory, and with `hipMemcpy` data bytes can be transferred from host to device (using `hipMemcpyHostToDevice`) or from device to host (using `hipMemcpyDeviceToHost`), among others.
- `myKernelName<<<...>>>` queues the execution of a kernel on a device (GPU).
@@ -28,16 +31,19 @@ This example implements a simple matrix transpose kernel to showcase how to use static shared memory.
## Demonstrated API Calls

### HIP runtime

- `__global__`
- `__shared__`

#### Device symbols

- `blockDim`
- `blockIdx`
- `threadIdx`
- `__syncthreads`

#### Host symbols

- `hipFree`
- `hipGetLastError`
- `hipMalloc`
