From 5cababb77b9bfa3917b2668501a00d6a0d3fd4f0 Mon Sep 17 00:00:00 2001 From: David Galiffi Date: Thu, 16 May 2024 11:53:38 -0400 Subject: [PATCH] Fix linting errors in rocThrust --- Libraries/rocSPARSE/level_2/bsrxmv/README.md | 1 + Libraries/rocThrust/README.md | 15 +++++++++++++-- Libraries/rocThrust/device_ptr/README.md | 7 ++++++- Libraries/rocThrust/norm/README.md | 5 +++++ Libraries/rocThrust/reduce_sum/README.md | 5 +++++ Libraries/rocThrust/remove_points/README.md | 7 ++++++- Libraries/rocThrust/saxpy/README.md | 7 ++++++- Libraries/rocThrust/vectors/README.md | 7 ++++++- 8 files changed, 48 insertions(+), 6 deletions(-) diff --git a/Libraries/rocSPARSE/level_2/bsrxmv/README.md b/Libraries/rocSPARSE/level_2/bsrxmv/README.md index 2bc05937d..83a29bdd7 100644 --- a/Libraries/rocSPARSE/level_2/bsrxmv/README.md +++ b/Libraries/rocSPARSE/level_2/bsrxmv/README.md @@ -188,6 +188,7 @@ $$ \mathbf{\bar{m}} = \left( \right)$$ The BSRX format is the same as BSR, but the `bsr_row_ptr` is separated into starting and ending indices. + - `bsrx_row_ptr`: the first block of each row that is used for the calculation. This block is typically the first nonzero block. - `bsrx_end_ptr`: the position next to the last block (last + 1) that is used for the calculation. This block is typically the last nonzero block. diff --git a/Libraries/rocThrust/README.md b/Libraries/rocThrust/README.md index ba2ae1d4d..92f7c084f 100644 --- a/Libraries/rocThrust/README.md +++ b/Libraries/rocThrust/README.md @@ -1,28 +1,35 @@ # rocThrust Examples ## Summary + The examples in this subdirectory showcase the functionality of the [rocThrust](https://github.com/rocmSoftwarePlatform/rocThrust) library. The examples build on Linux using the ROCm platform and on Windows using the HIP on Windows platform. 
## Prerequisites + ### Linux + - [CMake](https://cmake.org/download/) (at least version 3.21) -- OR GNU Make - available via the distribution's package manager + - OR GNU Make - available via the distribution's package manager - [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.2/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x) - [rocThrust](https://github.com/rocmSoftwarePlatform/rocThrust): `rocthrust-dev` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.2/page/How_to_Install_ROCm.html). ### Windows + - [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload - ROCm toolchain for Windows (No public release yet) - - The Visual Studio ROCm extension needs to be installed to build with the solution files. + - The Visual Studio ROCm extension needs to be installed to build with the solution files. - [rocThrust](https://github.com/rocmSoftwarePlatform/rocThrust): installed as part of the ROCm SDK on Windows - [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21) - [Ninja](https://ninja-build.org/) (optional, to build with CMake) ## Building + ### Linux + Make sure that the dependencies are installed, or use the [provided Dockerfile](../../Dockerfiles/hip-libraries-rocm-ubuntu.Dockerfile) to build and run the examples in a containerized environment that has all prerequisites installed. #### Using CMake + All examples in the `rocThrust` subdirectory can either be built by a single CMake project or be built independently. - `$ cd Libraries/rocThrust` @@ -30,16 +37,20 @@ All examples in the `rocThrust` subdirectory can either be built by a single CMa - `$ cmake --build build` #### Using Make + All examples can be built by a single invocation to Make or be built independently. 
- `$ cd Libraries/rocThrust` - `$ make` ### Windows + #### Visual Studio + Visual Studio solution files are available for the individual examples. To build all examples for rocThrust open the top level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for rocThrust. For more detailed build instructions refer to the top level [README.md](../../README.md#visual-studio). #### CMake + All examples in the `rocThrust` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2). diff --git a/Libraries/rocThrust/device_ptr/README.md b/Libraries/rocThrust/device_ptr/README.md index 3d5bf1cb1..8d92fedeb 100644 --- a/Libraries/rocThrust/device_ptr/README.md +++ b/Libraries/rocThrust/device_ptr/README.md @@ -1,9 +1,11 @@ # rocThrust Device Pointer Example ## Description + This simple program showcases the usage of the `thrust::device_ptr` template. -### Application flow +### Application flow + 1. A `thrust::device_ptr` is instantiated, and memory for ten elements is allocated. 2. Two more `thrust::device_ptr` are instantiated and set to the start- and end-point of the allocated memory region. 3. Normal pointer arithmetic is used on the `thrust::device_ptr`s to calculate the number of elements allocated in step 1. @@ -15,6 +17,7 @@ This simple program showcases the usage of the `thrust::device_ptr` template. 9. The device memory is freed using `thrust::device_free`. ## Key APIs and Concepts + - Thrust's `device_ptr` is a simple and transparent way of handling device memory the same way one would handle host memory with normal pointers. - Unlike a normal pointer to device memory `device_ptr` adds type safety, and the underlying device memory is transparently accessible on the host. - The `device_ptr` can be used in Thrust algorithms like a normal pointer to device memory. 
@@ -22,7 +25,9 @@ This simple program showcases the usage of the `thrust::device_ptr` template. - `device_ptr` is not a smart pointer. Allocating and freeing memory lies in the responsibility of the programmer. ## Demonstrated API Calls + ### rocThrust + - `thrust::device_ptr::operator=` - `thrust::device_ptr::operator[]` - `thrust::device_malloc` diff --git a/Libraries/rocThrust/norm/README.md b/Libraries/rocThrust/norm/README.md index 937fb57ea..7e5ca96c2 100644 --- a/Libraries/rocThrust/norm/README.md +++ b/Libraries/rocThrust/norm/README.md @@ -1,9 +1,11 @@ # rocThrust Norm Example ## Description + An example is presented to compute the Euclidean norm of a `thrust::device_vector`. The result is written to the standard output. ### Application flow + 1. Instantiate a host vector. 2. Copy the vector to the device by constructing `thrust::device_vector` from the host vector. 3. Set the initial value for the transformed reduction to 0. @@ -11,13 +13,16 @@ An example is presented to compute the Euclidean norm of a `thrust::device_vecto 5. Print the norm to the standard output. ## Key APIs and Concepts + - `thrust::transform_reduce()` computes a generalized sum (AKA reduction or fold) after transforming each element with a unary function. Both the transformation and the reduction function can be specified. (e.g. with `thrust::plus` as the binary summation and `f` as the transform function `transform_reduce` would compute the value of `f(a[0]) + f(a[1]) + f(a[2]) + ...`). - In the example, the operator is the `thrust::plus` function object with doubles. It is a binary operator that returns the arithmetic sum. - An initial value is required for the summation. - A `thrust::device_vector` is used to simplify memory management and transfer. See the [vectors example](../vectors) for the usage of `thrust::vector`. 
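As a rough host-side illustration of the transformed reduction described above, the sketch below computes a Euclidean norm with the C++17 standard library's `std::transform_reduce` instead of rocThrust, so it runs without a GPU. It is not taken from the example's source, and the helper name `euclidean_norm` is hypothetical. Note the differing argument order: `std::transform_reduce` takes (init, reduction, transform) after the iterators, whereas `thrust::transform_reduce` takes (transform, init, reduction).

```cpp
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: Euclidean norm via a transformed reduction.
// Host-only stand-in using the C++17 standard library, not rocThrust.
double euclidean_norm(const std::vector<double>& v)
{
    // Transform: square each element before it enters the sum.
    auto square = [](double x) { return x * x; };

    // Initial value of the reduction, 0 as in the application flow.
    const double init = 0.0;

    // Sum the squared elements with std::plus, then take the square root.
    const double sum_of_squares
        = std::transform_reduce(v.begin(), v.end(), init, std::plus<double>(), square);
    return std::sqrt(sum_of_squares);
}
```

Calling `euclidean_norm({3.0, 4.0})` sums `9 + 16` and returns the square root of that sum.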
## Demonstrated API Calls + ### rocThrust + - `thrust::device_vector::device_vector` - `thrust::plus` - `thrust::reduce()` diff --git a/Libraries/rocThrust/reduce_sum/README.md b/Libraries/rocThrust/reduce_sum/README.md index 5d7c23692..db8f9d1d5 100644 --- a/Libraries/rocThrust/reduce_sum/README.md +++ b/Libraries/rocThrust/reduce_sum/README.md @@ -1,9 +1,11 @@ # rocThrust sum (reduce) example ## Description + An example is presented to compute the sum of a `thrust::device_vector` integer vector using the `thrust::reduce()` generalized summation and the `thrust::plus` operator. The result is written to the standard output. ### Application flow + 1. Instantiate a `thrust::host_vector` and fill the elements. The values of the elements are printed to the standard output. 2. Copy the vector to the device by `thrust::device_vector`. 3. Set the initial value of the reduction. @@ -11,12 +13,15 @@ An example is presented to compute the sum of a `thrust::device_vector` integer 5. Print the sum to the standard output. ## Key APIs and Concepts + - The `thrust::reduce()` function returns a generalized sum. The summation operator has to be provided by the caller. - In the example, the operator is the `thrust::plus` function object with integers. It is a binary operator that returns the arithmetic sum. - A `thrust::device_vector` and a `thrust::host_vector` are used to simplify memory management and transfer. For further details, please visit the [vectors example](../vectors/). 
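The generalized sum described above can be sketched on the host with the C++17 standard library, whose `std::reduce` accepts the same (first, last, init, binary operator) arguments as `thrust::reduce`. This is an assumption-laden stand-in rather than the example's code: it uses `std::vector` in place of the Thrust vectors, and the helper name `sum_elements` is hypothetical.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: integer sum as a generalized reduction.
// Host-only stand-in using the C++17 standard library, not rocThrust.
int sum_elements(const std::vector<int>& values)
{
    // Initial value of the reduction.
    const int init = 0;

    // The summation operator is supplied by the caller; here std::plus<int>.
    return std::reduce(values.begin(), values.end(), init, std::plus<int>());
}
```

Swapping `std::plus<int>()` for another associative and commutative binary operator (with a matching initial value) changes what the reduction computes without changing the call structure.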
## Demonstrated API Calls + ### rocThrust + - `thrust::host_vector::host_vector` - `thrust::host_vector::operator[]` - `thrust::device_vector::device_vector` diff --git a/Libraries/rocThrust/remove_points/README.md b/Libraries/rocThrust/remove_points/README.md index 9b14032fb..0d45f0b09 100644 --- a/Libraries/rocThrust/remove_points/README.md +++ b/Libraries/rocThrust/remove_points/README.md @@ -1,10 +1,12 @@ # rocThrust Remove Points Example ## Description + This short program demonstrates the usage of the `thrust` random number generation, host vector, generation, tuple, zip iterator, and conditional removal templates. It generates a number of random points $(x, y)$ in a unit square $x,y\in[0,1)$ and then removes all of them outside the unit circle, i.e. with $x^2 + y^2 > 1$. ## Key APIs and Concepts + - Thrust provides functionality for random number generation similar to [the STL `<random>` header](https://en.cppreference.com/w/cpp/header/random) (from C++11 and above), like `thrust::default_random_engine`, `thrust::uniform_real_distribution` and so on. - Thrust's vectors implement RAII-style ownership over device and host memory pointers (similarly to `std::vector`). The instances are aware of the requested element count, allocate the required amount of memory, and free it upon destruction. When resized, the memory is reallocated if needed. - It is suggested that developers use `host_vector` instead of explicit invocations to `malloc` and `free` functions. @@ -13,14 +15,17 @@ It generates a number of random points $(x, y)$ in a unit square $x,y\in[0,1)$ a - The zip iterator provides the ability to parallel-iterate over several controlled sequences simultaneously. A zip iterator is constructed from a tuple of iterators. Moving the zip iterator moves all the iterators in parallel. Dereferencing the zip iterator returns a tuple that contains the results of dereferencing the individual iterators.
- `remove_if` "removes" every element on which the predicate evaluates to `true` from the range specified by begin and end iterators. All kept elements are moved to the beginning of the range in the same order as in the original sequence, and the end iterator to the range of kept elements is returned. Idiomatic usage of conditional removal is the so-called _erase–remove idiom_ `S.erase(remove_if(S.begin(), S.end(), pred), S.end())`. This idiom cannot be used here because the `zip_iterator` refers to multiple containers. -### Application flow +### Application flow + 1. A `thrust::default_random_engine` is instantiated and values are sampled from a uniform distribution between 0 and 1 using `thrust::uniform_real_distribution`. 2. To hold the coordinates of the points, two `thrust::host_vector`s are constructed. Their elements are set one-by-one from a uniform distribution by `generate` and the points are printed to the standard output. 3. Zip iterators are constructed from `begin` and `end` iterators over the coordinate vectors and then passed to the `thrust::remove_if` operation. The operation uses a test `is_outside_circle` to remove all points outside the unit circle and moves all remaining points to the beginning of the range spanned by the zip iterators. `thrust::remove_if` returns an end iterator to the remaining points. The new size of the vectors is calculated as the distance between the returned iterator and the `begin` iterator, and the vectors are resized accordingly. 4. Finally, the remaining points are printed again.
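The removal step above can be approximated on the host with `std::remove_if`. This sketch makes a simplifying assumption that differs from the example: each point is stored as a `std::pair` in a single `std::vector` rather than as two coordinate vectors joined by a zip iterator, which is why the erase-remove idiom is usable here. The alias `Point` and the helper `remove_points_outside_circle` are hypothetical; `is_outside_circle` mirrors the predicate name mentioned in the application flow, but its body is written from the $x^2 + y^2 > 1$ condition, not copied from the source.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical point type: (x, y) stored together in one container.
using Point = std::pair<float, float>;

// Predicate: true when the point lies outside the unit circle (x^2 + y^2 > 1).
bool is_outside_circle(const Point& p)
{
    return p.first * p.first + p.second * p.second > 1.0f;
}

// Hypothetical helper: drop every point outside the unit circle.
void remove_points_outside_circle(std::vector<Point>& points)
{
    // remove_if moves the kept points to the front, preserving their order,
    // and returns the end iterator of the kept range.
    auto new_end = std::remove_if(points.begin(), points.end(), is_outside_circle);

    // Shrink the container to the kept range (erase-remove idiom).
    points.erase(new_end, points.end());
}
```

With two separate coordinate vectors, as in the example, the final step is instead a resize of each vector to the distance between the returned iterator and `begin`.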
## Demonstrated API Calls + ### rocThrust + - `thrust::default_random_engine::default_random_engine` - `thrust::uniform_real_distribution::uniform_real_distribution(RealType, RealType)` - `thrust::uniform_real_distribution::operator()(UniformRandomNumberGenerator)` diff --git a/Libraries/rocThrust/saxpy/README.md b/Libraries/rocThrust/saxpy/README.md index 1bcd3f52c..6dbe9df1a 100644 --- a/Libraries/rocThrust/saxpy/README.md +++ b/Libraries/rocThrust/saxpy/README.md @@ -1,9 +1,11 @@ # rocThrust Saxpy Example ## Description + This simple program implements the SAXPY operation (`Y[i] = a * X[i] + Y[i]`) using rocThrust and showcases the usage of the vector and functor templates and of `thrust::fill` and `thrust::transform` operations. -### Application flow +### Application flow + 1. Two host arrays of floats `x` and `y` are instantiated, and their contents are printed to the standard output. 2. Two `thrust::device_vector`s, `X` and `Y`, are instantiated with the corresponding arrays. The contents are copied to the device. 3. The `saxpy_slow` function is invoked next. It uses the most straightforward implementation using a temporary device vector `temp` and two separate transformations, one with multiplies and one with plus. First, the `temp` vector is filled with `a` values, using `thrust::fill`. Then, it is filled by transformed values of `a * X[i]` by `thrust::transform` using the `thrust::multiplies` functor. Last, the device vector `Y` is filled by `temp[i] + Y[i]` by `thrust::transform` using the `thrust::plus` functor. @@ -13,6 +15,7 @@ This simple program implements the SAXPY operation (`Y[i] = a * X[i] + Y[i]`) us 7. The values of device vector `Y` are printed to the standard output. The `X` and `Y` vectors are destroyed. ## Key APIs and Concepts + - rocThrust's device and host vectors implement RAII-style ownership over device and host memory pointers (similarly to `std::vector`).
The instances are aware of the requested element count, allocate the required amount of memory, and free it upon destruction. When resized, the memory is reallocated if needed. - Additionally, using `device_vector` and `host_vector` simplifies the transfers between device and host memory to a copy assignment. Note that iterators over device containers can be used everywhere just like host iterators. - It is suggested that developers use `device_vector` and `host_vector` instead of explicit invocations to `malloc` and `free` functions. @@ -22,7 +25,9 @@ This simple program implements the SAXPY operation (`Y[i] = a * X[i] + Y[i]`) us - [Fused Multiply-Add (FMA)](https://en.cppreference.com/w/cpp/numeric/math/fma) operation `fma` represents multiplication of the first two arguments followed by addition of the third one to the product. It has the advantage of being faster and more accurate compared to separate multiplication and addition on hardware that supports such an instruction, as it avoids cancellation error in the addition (the addition inside the `fma` operation proceeds with the full, unrounded result of the multiplication, which is twice as wide). ## Demonstrated API Calls + ### rocThrust + - `thrust::host_vector::host_vector` - `thrust::host_vector::operator[]` - `thrust::host_vector::begin()` diff --git a/Libraries/rocThrust/vectors/README.md b/Libraries/rocThrust/vectors/README.md index 292d2a07b..f2b1487e5 100644 --- a/Libraries/rocThrust/vectors/README.md +++ b/Libraries/rocThrust/vectors/README.md @@ -1,21 +1,26 @@ # rocThrust Vectors Example ## Description + This simple program showcases the usage of the `thrust::device_vector` and the `thrust::host_vector` templates. -### Application flow +### Application flow + 1. A `thrust::host_vector` is instantiated, its elements are set one-by-one, and the vector is printed to the standard output. 2. The `host_vector` is resized and it is printed again to the standard output. 3.
A `thrust::device_vector` is instantiated with the aforementioned `host_vector`. The contents are copied to the device. 4. The `device_vector`'s elements are modified from host code, and it is printed to the standard output. ## Key APIs and Concepts + - Thrust's device and host vectors implement RAII-style ownership over device and host memory pointers (similarly to `std::vector`). The instances are aware of the requested element count, allocate the required amount of memory, and free it upon destruction. When resized, the memory is reallocated if needed. - Additionally, using `device_vector` and `host_vector` simplifies the transfers between device and host memory to a copy assignment. - It is suggested that developers use `device_vector` and `host_vector` instead of explicit invocations to `malloc` and `free` functions. ## Demonstrated API Calls + ### rocThrust + - `thrust::host_vector::host_vector` - `thrust::host_vector::~host_vector` - `thrust::host_vector::operator[]`