Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(autoware_cuda_pointcloud_preprocessor): a cuda-accelerated pointcloud preprocessor #9454

Merged
Show file tree
Hide file tree
Changes from 50 commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
af2e884
feat: moved the cuda pointcloud preprocessor and organized from a per…
knzo25 Nov 25, 2024
774e099
chore: fixed incorrect links
knzo25 Nov 25, 2024
be04f76
chore: fixed dead links pt2
knzo25 Nov 25, 2024
db02ec7
chore: fixed spelling errors
knzo25 Nov 25, 2024
5218a4a
chore: json schema fixes
knzo25 Nov 25, 2024
4a9daaf
chore: removed comments and filled the fields
knzo25 Nov 26, 2024
84a4b9f
fix: fixed the adapter for the case when the number of points in the …
knzo25 Nov 26, 2024
1e27534
Merge remote-tracking branch 'awf/main' into feat/cuda_blackboard_poi…
knzo25 Dec 23, 2024
b3c1d72
feat: used the cuda host allocators for aster host to device copies
knzo25 Dec 23, 2024
887f162
Update sensing/autoware_cuda_pointcloud_preprocessor/docs/cuda-pointc…
knzo25 Jan 10, 2025
0b46fb6
Update sensing/autoware_cuda_pointcloud_preprocessor/src/cuda_pointcl…
knzo25 Jan 10, 2025
970055e
Update sensing/autoware_cuda_pointcloud_preprocessor/src/cuda_pointcl…
knzo25 Jan 10, 2025
71b8a6f
style(pre-commit): autofix
pre-commit-ci[bot] Jan 10, 2025
d6ee47f
Update sensing/autoware_cuda_pointcloud_preprocessor/docs/cuda-pointc…
knzo25 Jan 10, 2025
3f6953f
Update sensing/autoware_cuda_pointcloud_preprocessor/README.md
knzo25 Jan 10, 2025
db99ba6
Update sensing/autoware_cuda_pointcloud_preprocessor/README.md
knzo25 Jan 10, 2025
9ce2e06
Update sensing/autoware_cuda_pointcloud_preprocessor/src/cuda_pointcl…
knzo25 Jan 10, 2025
f474900
style(pre-commit): autofix
pre-commit-ci[bot] Jan 10, 2025
5941435
Update sensing/autoware_cuda_pointcloud_preprocessor/src/cuda_pointcl…
knzo25 Jan 21, 2025
90ad0b2
style(pre-commit): autofix
pre-commit-ci[bot] Jan 21, 2025
13aff8c
Update sensing/autoware_cuda_pointcloud_preprocessor/src/cuda_pointcl…
knzo25 Jan 21, 2025
f0e4c20
style(pre-commit): autofix
pre-commit-ci[bot] Jan 21, 2025
d05b988
chore: fixed code compilation to reflect Hirabayashi-san's memory po…
knzo25 Jan 21, 2025
5e0cc47
feat: generalized the number of crop boxes. For two at least, the new…
knzo25 Jan 23, 2025
6baaea2
chore: updated config, schema, and handled the null case in a special…
knzo25 Jan 23, 2025
59144d8
feat: moving the pointcloud organization into gpu
knzo25 Feb 4, 2025
5482f9c
feat: reimplemented the organized pointcloud adapter in cuda. the onl…
knzo25 Feb 5, 2025
6660210
Merge branch 'main' into feat/cuda_blackboard_pointcloud_preprocessing
amadeuszsz Feb 5, 2025
c554290
chore: removed redundant ternay operator
knzo25 Feb 5, 2025
d6ad9db
chore: added a temporary memory check. the check will be unified in a…
knzo25 Feb 5, 2025
303b9ed
chore: refactored the structure to avoid large files
knzo25 Feb 5, 2025
17799e5
chore: updated the copyright year
knzo25 Feb 5, 2025
5f9c4a6
Merge branch 'feat/cuda_blackboard_pointcloud_preprocessing' of githu…
knzo25 Feb 5, 2025
f100224
fix: fixed a bug in the undistortion kernel setup. validated it compa…
knzo25 Feb 5, 2025
6f0fe0c
chore: removed unused packages
knzo25 Feb 13, 2025
2330cd6
chore: removed mentions of the removed adapter
knzo25 Feb 13, 2025
48b39cd
chore: fixed missing autoware prefix
knzo25 Feb 14, 2025
fa36260
fix: missing assignment in else branch
knzo25 Feb 14, 2025
2b9f654
chore: added cuda/nvcc debug flags on debug builds
knzo25 Feb 14, 2025
186fff9
chore: refactored parameters for the undistortion settings
knzo25 Feb 14, 2025
d2505ee
chore: removed unused headers
knzo25 Feb 14, 2025
7da12ed
Merge branch 'main' into feat/cuda_blackboard_pointcloud_preprocessing
amadeuszsz Mar 3, 2025
6945277
Merge branch 'main' into feat/cuda_blackboard_pointcloud_preprocessing
knzo25 Mar 6, 2025
bc4d470
chore: changed default crop box to no filtering at all
knzo25 Mar 6, 2025
68d1e42
feat: added missing restrict keyword
knzo25 Mar 6, 2025
5dba7b3
chore: spells
knzo25 Mar 6, 2025
92b04a3
chore: removed default destructor
knzo25 Mar 6, 2025
17af8d8
chore: ocd activated (spelling)
knzo25 Mar 6, 2025
652d168
chore: fixed the schema
knzo25 Mar 6, 2025
2449e67
chore: improved readibility
knzo25 Mar 6, 2025
9ededc5
chore: added dummy crop box
knzo25 Mar 17, 2025
5b55567
chore: added new repositories to ansible
knzo25 Mar 18, 2025
405e5f8
chore: CI/CD
knzo25 Mar 18, 2025
df442d5
chore: more CI/CD
knzo25 Mar 18, 2025
f4201b6
chore: mode CI/CD. some linters are conflicting
knzo25 Mar 18, 2025
3fcbc42
style(pre-commit): autofix
pre-commit-ci[bot] Mar 18, 2025
47e33bf
chore: ignoring uncrustify
knzo25 Mar 18, 2025
8b4700b
chore: ignoring more uncrustify
knzo25 Mar 18, 2025
368009e
chore: missed one more uncrustify exception
knzo25 Mar 18, 2025
a2713a0
chore: added meta dep
knzo25 Mar 18, 2025
2a1ff8c
Merge branch 'main' into feat/cuda_blackboard_pointcloud_preprocessing
knzo25 Mar 18, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
97 changes: 97 additions & 0 deletions sensing/autoware_cuda_pointcloud_preprocessor/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
cmake_minimum_required(VERSION 3.14)
project(autoware_cuda_pointcloud_preprocessor)

find_package(ament_cmake_auto REQUIRED)
find_package(CUDA)

if(NOT ${CUDA_FOUND})
message(WARNING "cuda was not found, so the autoware_cuda_pointcloud_preprocessor package will not be built.")
return()
elseif(CMAKE_BUILD_TYPE STREQUAL "Debug")
set(CMAKE_CUDA_FLAGS ${CMAKE_CUDA_FLAGS} "-g -G")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} "-g -G")
endif()

ament_auto_find_build_dependencies()

# Default to C++17
if (NOT CMAKE_CXX_STANDARD)
set(CMAKE_CXX_STANDARD 17)
endif ()

if (CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang")
add_compile_options(-Wall -Wextra -Wpedantic -Wunused-function)
endif ()

if(BUILD_TESTING)
find_package(ament_lint_auto REQUIRED)
ament_lint_auto_find_test_dependencies()
endif()

include_directories(
include
SYSTEM
${CUDA_INCLUDE_DIRS}
)

list(APPEND CUDA_NVCC_FLAGS "--expt-relaxed-constexpr -diag-suppress 20012") # cSpell: ignore expt

cuda_add_library(cuda_pointcloud_preprocessor_lib SHARED
src/cuda_pointcloud_preprocessor/cuda_pointcloud_preprocessor.cu
src/cuda_pointcloud_preprocessor/common_kernels.cu
src/cuda_pointcloud_preprocessor/organize_kernels.cu
src/cuda_pointcloud_preprocessor/outlier_kernels.cu
src/cuda_pointcloud_preprocessor/undistort_kernels.cu
)

target_link_libraries(cuda_pointcloud_preprocessor_lib
${autoware_pointcloud_preprocessor_TARGETS}
)

target_include_directories(cuda_pointcloud_preprocessor_lib SYSTEM PRIVATE
${autoware_pointcloud_preprocessor_INCLUDE_DIRS}
${autoware_point_types_INCLUDE_DIRS}
${cuda_blackboard_INCLUDE_DIRS}
${diagnostic_msgs_INCLUDE_DIRS}
${geometry_msgs_INCLUDE_DIRS}
${rclcpp_INCLUDE_DIRS}
${rclcpp_components_INCLUDE_DIRS}
${rcl_interfaces_INCLUDE_DIRS}
${sensor_msgs_INCLUDE_DIRS}
${tf2_INCLUDE_DIRS}
${tf2_msgs_INCLUDE_DIRS}
)


# Targets
ament_auto_add_library(cuda_pointcloud_preprocessor SHARED
src/cuda_pointcloud_preprocessor/cuda_pointcloud_preprocessor_node.cpp
)

target_link_libraries(cuda_pointcloud_preprocessor
${CUDA_LIBRARIES}
cuda_pointcloud_preprocessor_lib
)

rclcpp_components_register_node(cuda_pointcloud_preprocessor
PLUGIN "autoware::cuda_pointcloud_preprocessor::CudaPointcloudPreprocessorNode"
EXECUTABLE cuda_pointcloud_preprocessor_node
)

install(DIRECTORY launch config
DESTINATION share/${PROJECT_NAME}
)

ament_auto_package()

# Set ROS_DISTRO macros
set(ROS_DISTRO $ENV{ROS_DISTRO})
if(${ROS_DISTRO} STREQUAL "rolling")
add_compile_definitions(ROS_DISTRO_ROLLING)
elseif(${ROS_DISTRO} STREQUAL "foxy")
add_compile_definitions(ROS_DISTRO_FOXY)
elseif(${ROS_DISTRO} STREQUAL "galactic")
add_compile_definitions(ROS_DISTRO_GALACTIC)
elseif(${ROS_DISTRO} STREQUAL "humble")
add_compile_definitions(ROS_DISTRO_HUMBLE)
endif()
19 changes: 19 additions & 0 deletions sensing/autoware_cuda_pointcloud_preprocessor/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# autoware_cuda_pointcloud_preprocessor

## Purpose

The pointcloud preprocessing implemented in `autoware_pointcloud_preprocessor` has been thoroughly tested in autoware. However, the latency it introduces does not scale well with modern LiDAR devices due to the high number of points they introduce.

To alleviate this issue, this package reimplements most of the pipeline presented in `autoware_pointcloud_preprocessor` leveraging the use of GPGPUs. In particular, this package makes use of CUDA to provide accelerated versions of the already established implementations, while also maintaining compatibility with normal ROS nodes/topics. <!-- cSpell: ignore GPGPUs >

## Inner-workings / Algorithms

A detailed description of each filter's algorithm is available in the following links.

| Filter Name | Description | Detail |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------- |
| cuda_pointcloud_preprocessor | Implements the cropping, distortion correction, and outlier filtering (ring-based) of the `autoware_pointcloud_preprocessor`'s CPU versions. | [link](docs/cuda-pointcloud-preprocessor.md) |

## (Optional) Future extensions / Unimplemented parts

The subsample filters implemented in `autoware_pointcloud_preprocessor` will have similar counterparts in this package.
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
/**:
ros__parameters:

base_frame: base_link
use_imu: true
use_3d_distortion_correction: false
distance_ratio: 1.03
object_length_threshold: 0.1
num_points_threshold: 4

crop_box.min_x: []
crop_box.min_y: []
crop_box.min_z: []
crop_box.max_x: []
crop_box.max_y: []
crop_box.max_z: []
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# cuda_pointcloud_preprocessor

## Purpose

This node implements all standard pointcloud preprocessing algorithms applied to a single LiDAR's pointcloud in CUDA.
In particular, this node implements:

- box cropping (ego-vehicle and ego-vehicle's mirrors)
- distortion correction
- ring-based outlier filtering

## Inner-workings / Algorithms

As this node reimplements the functionalities of the CPU-version algorithms, please have a look at the documentations of [crop-box](../../autoware_pointcloud_preprocessor/docs/crop-box-filter.md), [distortion correction](../../autoware_pointcloud_preprocessor/docs/distortion-corrector.md), and [ring-based outlier filter](../../autoware_pointcloud_preprocessor/docs/ring-outlier-filter.md) for more information about these algorithms.

In addition to the individual algorithms previously mentioned, this node uses the `cuda_blackboard`, a cuda transport layer that enables a zero-copy mechanism between GPU and GPU memory for both input and output.

## Inputs / Outputs

### Input

| Name | Type | Description |
| ------------------------- | ------------------------------------------------ | ----------------------------------------- |
| `~/input/pointcloud` | `sensor_msgs::msg::PointCloud2` | Input pointcloud's topic. |
| `~/input/pointcloud/cuda` | `negotiated_interfaces/msg/NegotiatedTopicsInfo` | Input pointcloud's type negotiation topic |
| `~/input/twist` | `geometry_msgs::msg::TwistWithCovarianceStamped` | Topic of the twist information. |
| `~/input/imu` | `sensor_msgs::msg::Imu` | Topic of the IMU data. |

### Output

| Name | Type | Description |
| -------------------------- | ------------------------------------------------ | ---------------------------------------- |
| `~/output/pointcloud` | `sensor_msgs::msg::PointCloud2` | Processed pointcloud's topic |
| `~/output/pointcloud/cuda` | `negotiated_interfaces/msg/NegotiatedTopicsInfo` | Processed pointcloud's negotiation topic |

## Parameters

### Core Parameters

{{ json_to_markdown("sensing/autoware_cuda_pointcloud_preprocessor/schema/cuda_pointcloud_preprocessor.schema.schema.json") }}

## Assumptions / Known limits

- The CUDA implementations, while following the original CPU ones, will not offer the same numerical results, and small approximations were needed to maximize GPU usage.
- This node expects that the input pointcloud follows the `autoware::point_types::PointXYZIRCAEDT` layout and the output pointcloud will use the `autoware::point_types::PointXYZIRC` layout defined in the `autoware_point_types` package.
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
// Copyright 2025 TIER IV, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef AUTOWARE__CUDA_POINTCLOUD_PREPROCESSOR__COMMON_KERNELS_HPP_
#define AUTOWARE__CUDA_POINTCLOUD_PREPROCESSOR__COMMON_KERNELS_HPP_

#include "autoware/cuda_pointcloud_preprocessor/point_types.hpp"
#include "autoware/cuda_pointcloud_preprocessor/types.hpp"

#include <cuda_runtime.h>

namespace autoware::cuda_pointcloud_preprocessor
{
void transformPointsLaunch(
const InputPointType * input_points, InputPointType * output_points, int num_points,
TransformStruct transform, int threads_per_block, int blocks_per_grid, cudaStream_t & stream);

void cropBoxLaunch(
InputPointType * d_points, std::uint32_t * output_mask, int num_points,
const CropBoxParameters * crop_box_parameters_ptr, int num_crop_boxes, int threads_per_block,
int blocks_per_grid, cudaStream_t & stream);

void combineMasksLaunch(
const std::uint32_t * mask1, const std::uint32_t * mask2, int num_points,
std::uint32_t * output_mask, int threads_per_block, int blocks_per_grid, cudaStream_t & stream);

void extractPointsLaunch(
InputPointType * input_points, std::uint32_t * masks, std::uint32_t * indices, int num_points,
OutputPointType * output_points, int threads_per_block, int blocks_per_grid,
cudaStream_t & stream);

} // namespace autoware::cuda_pointcloud_preprocessor

#endif // AUTOWARE__CUDA_POINTCLOUD_PREPROCESSOR__COMMON_KERNELS_HPP_
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
// Copyright 2025 TIER IV, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef AUTOWARE__CUDA_POINTCLOUD_PREPROCESSOR__CUDA_POINTCLOUD_PREPROCESSOR_HPP_
#define AUTOWARE__CUDA_POINTCLOUD_PREPROCESSOR__CUDA_POINTCLOUD_PREPROCESSOR_HPP_

#include "autoware/cuda_pointcloud_preprocessor/point_types.hpp"
#include "autoware/cuda_pointcloud_preprocessor/types.hpp"

#include <Eigen/Core>
#include <Eigen/Geometry>
#include <cuda_blackboard/cuda_pointcloud2.hpp>

#include <geometry_msgs/msg/transform_stamped.hpp>
#include <geometry_msgs/msg/twist_with_covariance_stamped.hpp>
#include <geometry_msgs/msg/vector3_stamped.hpp>
#include <sensor_msgs/msg/point_cloud2.hpp>

#include <thrust/device_vector.h>

#include <cstdint>
#include <deque>

namespace autoware::cuda_pointcloud_preprocessor
{

class CudaPointcloudPreprocessor
{
public:
enum class UndistortionType { Invalid, Undistortion2D, Undistortion3D };

CudaPointcloudPreprocessor();
~CudaPointcloudPreprocessor() = default;

void setCropBoxParameters(const std::vector<CropBoxParameters> & crop_box_parameters);
void setRingOutlierFilterParameters(const RingOutlierFilterParameters & ring_outlier_parameters);
void setUndistortionType(const UndistortionType & undistortion_type);

void preallocateOutput();

std::unique_ptr<cuda_blackboard::CudaPointCloud2> process(
const sensor_msgs::msg::PointCloud2::ConstSharedPtr input_pointcloud_msg_ptr,
const geometry_msgs::msg::TransformStamped & transform_msg,
const std::deque<geometry_msgs::msg::TwistWithCovarianceStamped> & twist_queue,
const std::deque<geometry_msgs::msg::Vector3Stamped> & angular_velocity_queue,
const std::uint32_t first_point_rel_stamp_nsec);

private:
void organizePointcloud();

CropBoxParameters self_crop_box_parameters_{};
CropBoxParameters mirror_crop_box_parameters_{};
RingOutlierFilterParameters ring_outlier_parameters_{};
UndistortionType undistortion_type_{UndistortionType::Invalid};

int num_rings_{};
int max_points_per_ring_{};
std::size_t num_raw_points_{};
std::size_t num_organized_points_{};

std::vector<sensor_msgs::msg::PointField> point_fields_{};
std::unique_ptr<cuda_blackboard::CudaPointCloud2> output_pointcloud_ptr_{};

cudaStream_t stream_;
int max_blocks_per_grid_{};
const int threads_per_block_{256};
cudaMemPool_t device_memory_pool_;

// Organizing buffers
thrust::device_vector<InputPointType> device_input_points_;
thrust::device_vector<InputPointType> device_organized_points_;
thrust::device_vector<std::int32_t> device_ring_index_;
thrust::device_vector<std::uint32_t> device_indexes_tensor_;
thrust::device_vector<std::uint32_t> device_sorted_indexes_tensor_;
thrust::device_vector<std::int32_t> device_segment_offsets_;
thrust::device_vector<std::int32_t> device_max_ring_;
thrust::device_vector<std::int32_t> device_max_points_per_ring_;

thrust::device_vector<std::uint8_t> device_sort_workspace_;
std::size_t sort_workspace_bytes_{0};

// Pointcloud preprocessing buffers
thrust::device_vector<InputPointType> device_transformed_points_{};
thrust::device_vector<OutputPointType> device_output_points_{};
thrust::device_vector<std::uint32_t> device_crop_mask_{};
thrust::device_vector<std::uint32_t> device_ring_outlier_mask_{};
thrust::device_vector<std::uint32_t> device_indices_{};
thrust::device_vector<TwistStruct2D> device_twist_2d_structs_{};
thrust::device_vector<TwistStruct3D> device_twist_3d_structs_{};
thrust::device_vector<CropBoxParameters> host_crop_box_structs_{};
thrust::device_vector<CropBoxParameters> device_crop_box_structs_{};
};

} // namespace autoware::cuda_pointcloud_preprocessor

#endif // AUTOWARE__CUDA_POINTCLOUD_PREPROCESSOR__CUDA_POINTCLOUD_PREPROCESSOR_HPP_
Loading
Loading