
Commit 6b2a339

nkogteva and ilyachur authored
[NVIDIA] API 2.0 (#656)
* [NVIDIA] Plugin API 2.0 Update README.md
* Import model with extensions
* Small updates
* Adds runtime info into the runtime graph
* Support config per device
* Fix bug with AUTO streams
* Rework extension handling
* Win fixes
* Remove explicit extension adding
* Add workaround for f16 inference precision. Enable inference precision in CoreConfiguration in tests
* Fix license note
* Small property update
* Disable RemoveDuplicatedResultsTransformation + small fixes
* Fix type
* Fix after rebase
* Refactor convert related handling
* Apply review comments. Replace some legacy includes
* Update RemoveRedundantConvertTransformation
* Fix transformer, add test for f16 convertion
* Small update
* Update modules/nvidia_plugin/src/cuda_compiled_model.cpp (Co-authored-by: Ilya Churaev <ilyachur@gmail.com>)
* Update modules/nvidia_plugin/src/cuda_compiled_model.cpp (Co-authored-by: Ilya Churaev <ilyachur@gmail.com>)
* Review comments. Remove some redundant headers

Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
1 parent 2aa2b0d commit 6b2a339

File tree

230 files changed, +2128 −2915 lines changed


modules/nvidia_plugin/README.md

+16 −33

@@ -34,8 +34,8 @@ sudo apt-get install clang-8 clang++8
 
 2. Install suitable **NVIDIA driver** from [NVIDIA download drivers](http://www.nvidia.com/Download/index.aspx?lang=en-us)
 3. Install **CUDA 11.8** from [How to install CUDA](https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html)
-
-   Do not forget to add `<path_to_cuda>/bin/` in **PATH** variable for example `export PATH="<path_to_cuda>/bin:$PATH"`
+
+   Do not forget to add `<path_to_cuda>/bin/` in **PATH** variable for example `export PATH="<path_to_cuda>/bin:$PATH"`
 
 4. Install **cuDNN 8.6.0** from [How to install cuDNN](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html)
 5. Install **cuTENSOR 1.6.1** from [How to install cuTENSOR](https://docs.nvidia.com/cuda/cutensor/getting_started.html#installation-and-compilation)

@@ -164,12 +164,21 @@ docker commit openvino/cudaplugin-2022.3 <name of new image>
 ```
 
 ## Supported Configuration Parameters
-The plugin supports the configuration parameters listed below. All parameters must be set before calling `ov::Core::compile_model()` in order to take effect. When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix.
+The plugin supports the configuration parameters listed below:
+* `ov::hint::performance_mode`
+* `ov::hint::execution_mode`
+* `ov::hint::inference_precision`
+* `ov::num_streams`
+* `ov::enable_profiling`
+
+Please refer to OpenVINO documentation for details.
 
-Parameter name | Parameter values | Default | Description
-------------- | ------------- | ------------- | -------------
-`NVIDIA_THROUGHPUT_STREAMS` | `NVIDIA_THROUGHPUT_AUTO`, or non negative integer values | 1 | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously.
-`NVIDIA_OPERATION_BENCHMARK` | `NVIDIA_YES`, `NVIDIA_NO` | `NVIDIA_NO` | Specifies if operation level benchmark should be run for increasing performance of network
+### Plugin specific parameters
+* `ov::nvidia_gpu::operation_benchmark` - specifies if operation level benchmark should be run for increasing performance of network (`false` by default)
+
+All parameters must be set before calling `ov::Core::compile_model()` in order to take effect.
+
+## Compile options
 
 During compilation of the openvino_nvidia_gpu_plugin, user could specify the following options:
 1) `-DCUDA_KERNEL_PRINT_LOG=ON` enables print logs from kernels (WARNING, be careful with this options, could print to many logs)

@@ -182,32 +191,6 @@ nvidia-smi --query-gpu=compute_cap --format=csv
 ## Supported Layers and Limitations
 The plugin supports IRv10 and higher. The list of supported layers and its limitations are defined in [cuda_opset.md](docs/cuda_opset.md).
 
-## Supported Model Formats
-* FP32 – Supported
-* FP16 – Supported and preferred
-* U8 - Not supported
-* U16 - Not supported
-* I8 - Not supported
-* I16 - Not supported
-
-## Supported Input Precision
-* FP32 - Supported
-* FP16 - Supported
-* U8 - Not supported
-* U16 - Not supported
-* I8 - Not supported
-* I16 - Not supported
-
-## Supported Output Precision
-* FP32 – Supported
-* FP16 - Not supported
-
-## Supported Input Layout
-* NCDHW – Not supported
-* NCHW - Supported
-* NHWC - Supported
-* NC - Supported
-
 ## License
 OpenVINO™ NVIDIA GPU plugin is licensed under [Apache License Version 2.0](LICENSE).
 By contributing to the project, you agree to the license and copyright terms therein
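As context for the property list in the updated README above, here is a minimal usage sketch (not part of the diff) of how these properties are passed with the OpenVINO 2.0 API; the model path and the chosen values are illustrative only:

```cpp
// Hypothetical usage sketch for the NVIDIA plugin with the OpenVINO 2.0 API.
// Property names come from the README above; the model path and values are examples only.
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // illustrative path

    // All properties must be passed at compile_model() time to take effect.
    // The plugin-specific ov::nvidia_gpu::operation_benchmark flag from the README
    // could be passed the same way.
    auto compiled = core.compile_model(model,
                                       "NVIDIA",
                                       ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
                                       ov::hint::inference_precision(ov::element::f16),
                                       ov::num_streams(2),
                                       ov::enable_profiling(true));

    auto request = compiled.create_infer_request();
    request.infer();
    return 0;
}
```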

modules/nvidia_plugin/src/CMakeLists.txt

+3 −3

@@ -14,7 +14,7 @@ file(GLOB_RECURSE
 CONFIGURE_DEPENDS
 ${SOURCE_MASKS}
 )
-list(REMOVE_ITEM SOURCES cuda_create_plugin.cpp)
+list(REMOVE_ITEM SOURCES cuda_create_plugin.cpp cuda_create_extensions.cpp)
 list(FILTER SOURCES EXCLUDE REGEX "^ops/examples/.*$")
 file(GLOB_RECURSE
 HEADERS

@@ -25,12 +25,12 @@ file(GLOB_RECURSE
 set_source_files_properties(*.cu *.cuh PROPERTIES LANGUAGE CUDA)
 
 add_library(${OBJ_NAME} STATIC ${SOURCES})
-target_compile_definitions(${OBJ_NAME} PRIVATE IMPLEMENT_INFERENCE_ENGINE_PLUGIN)
+target_compile_definitions(${OBJ_NAME} PRIVATE IMPLEMENT_INFERENCE_ENGINE_PLUGIN IMPLEMENT_OPENVINO_EXTENSION_API)
 
 # Adds a shared library with plugin
 ie_add_plugin(NAME ${TARGET_NAME}
 DEVICE_NAME "NVIDIA"
-SOURCES ${HEADERS} cuda_create_plugin.cpp
+SOURCES ${HEADERS} cuda_create_plugin.cpp cuda_create_extensions.cpp
 SKIP_INSTALL # ATTENTION: uncomment to install component
 VERSION_DEFINES_FOR cuda_create_plugin.cpp)
 

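The new cuda_create_extensions.cpp translation unit referenced above is not shown in this commit view. For orientation only, an OpenVINO extension entry point built with IMPLEMENT_OPENVINO_EXTENSION_API typically looks like the hypothetical sketch below; the registered extension here is a placeholder, not the plugin's actual set:

```cpp
// Hypothetical sketch of an OpenVINO extension entry point; NOT the actual
// contents of the cuda_create_extensions.cpp added by this commit.
#include <vector>

#include "openvino/core/extension.hpp"
#include "openvino/core/op_extension.hpp"
#include "openvino/op/relu.hpp"  // placeholder op just to make the example self-contained

// OPENVINO_CREATE_EXTENSIONS defines the symbol the core loads from the shared
// library when IMPLEMENT_OPENVINO_EXTENSION_API is defined for the target.
OPENVINO_CREATE_EXTENSIONS(std::vector<ov::Extension::Ptr>({
    std::make_shared<ov::OpExtension<ov::op::v0::Relu>>(),  // placeholder entry
}));
```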
modules/nvidia_plugin/src/cancellation_token.hpp

+4 −14

@@ -26,23 +26,13 @@ class CancellationToken {
     /**
      * Set token status as cancelled
      */
-    void Cancel() { is_cancelled_.store(true, std::memory_order_release); }
-
-    /**
-     * Throws exception THROW_IE_EXCEPTION_WITH_STATUS(INFER_CANCELLED) if detected cancel status
-     */
-    void Check() {
-        if (is_cancelled_.load(std::memory_order_acquire)) {
-            is_cancelled_.store(false, std::memory_order_release);
-            if (cancel_callback_) {
-                cancel_callback_();
-            }
-            throwInferCancelled();
-        }
+    void cancel() {
+        if (cancel_callback_) {
+            cancel_callback_();
+        };
     }
 
 private:
-    std::atomic<bool> is_cancelled_{false};
     std::function<void()> cancel_callback_;
 };
 

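For orientation (not part of the commit), a minimal standalone sketch of the reworked callback-only token: cancel() now simply fires the registered callback, and detecting and raising the "infer cancelled" condition is left to the request pipeline. Names here are illustrative:

```cpp
// Standalone illustration of a callback-only cancellation token, mirroring the
// reworked class above; names and wiring are illustrative, not the plugin's.
#include <functional>
#include <iostream>

class SimpleCancellationToken {
public:
    explicit SimpleCancellationToken(std::function<void()> cancel_callback = nullptr)
        : cancel_callback_{std::move(cancel_callback)} {}

    // Fires the callback; no atomic "cancelled" flag is kept any more.
    void cancel() {
        if (cancel_callback_) {
            cancel_callback_();
        }
    }

private:
    std::function<void()> cancel_callback_;
};

int main() {
    SimpleCancellationToken token{[] { std::cout << "pipeline notified of cancellation\n"; }};
    token.cancel();
}
```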
modules/nvidia_plugin/src/cuda/blas.hpp

+1 −1

@@ -39,7 +39,7 @@ inline std::string cublasGetErrorString(cublasStatus_t status) {
 inline void throwIfError(
     cublasStatus_t err,
     const std::experimental::source_location& location = std::experimental::source_location::current()) {
-    if (err != CUBLAS_STATUS_SUCCESS) ov::nvidia_gpu::throwIEException(cublasGetErrorString(err), location);
+    if (err != CUBLAS_STATUS_SUCCESS) ov::nvidia_gpu::throw_ov_exception(cublasGetErrorString(err), location);
 }
 
 inline void logIfError(

modules/nvidia_plugin/src/cuda/constant_factory.hpp

+1 −1

@@ -154,7 +154,7 @@ inline const constants::AnyNumeric& NumericConst(cudaDataType_t computeType) {
             return C<std::uint32_t>::value;
         }
         default:
-            ov::nvidia_gpu::throwIEException(
+            ov::nvidia_gpu::throw_ov_exception(
                 fmt::format("The ngraph element type {} is not supported by "
                             "the cuda library",
                             computeType));

modules/nvidia_plugin/src/cuda/cuda_type_traits.hpp

+2 −1

@@ -4,7 +4,8 @@
 
 #pragma once
 
-#include <ngraph/type/element_type_traits.hpp>
+#include "openvino/core/type/element_type.hpp"
+
 #ifdef __CUDACC__
 #include <cuda/float16.hpp>
 #endif

modules/nvidia_plugin/src/cuda/descriptor_utils.cpp

+1 −1

@@ -9,7 +9,7 @@
 
 namespace CUDA {
 
-DnnTensorDescriptor makeDnnTensorDescr(const ngraph::element::Type& type, const ngraph::Shape& shape) {
+DnnTensorDescriptor makeDnnTensorDescr(const ov::element::Type& type, const ov::Shape& shape) {
     OPENVINO_ASSERT(!shape.empty());
     OPENVINO_ASSERT(shape.size() <= CUDNN_DIM_MAX);
     std::vector<int> dims;

modules/nvidia_plugin/src/cuda/descriptor_utils.hpp

+3 −1

@@ -3,7 +3,9 @@
 //
 
 #include <cuda/dnn.hpp>
-#include <ngraph/node.hpp>
+
+#include "openvino/core/node.hpp"
+#include "openvino/core/type/element_type.hpp"
 
 namespace CUDA {
 

modules/nvidia_plugin/src/cuda/dnn.hpp

+1 −2

@@ -7,7 +7,6 @@
 #include <cudnn.h>
 
 #include <functional>
-#include <ngraph/type/element_type.hpp>
 #include <optional>
 
 #include "constant_factory.hpp"

@@ -39,7 +38,7 @@ inline std::string cudnnGetErrorString(cudnnConvolutionFwdAlgo_t algo) {
 inline void throwIfError(
     cudnnStatus_t err,
     const std::experimental::source_location& location = std::experimental::source_location::current()) {
-    if (err != CUDNN_STATUS_SUCCESS) ov::nvidia_gpu::throwIEException(cudnnGetErrorString(err), location);
+    if (err != CUDNN_STATUS_SUCCESS) ov::nvidia_gpu::throw_ov_exception(cudnnGetErrorString(err), location);
 }
 
 inline void logIfError(

modules/nvidia_plugin/src/cuda/dnn_be.hpp

+1 −1

@@ -517,7 +517,7 @@ class DnnBEEngineConfigDescriptor : public DnnBackendDescriptor {
 
     DnnBEEngine getEngine() const {
         auto engines = getBEDescAttributeValues<CUDNN_ATTR_ENGINECFG_ENGINE, DnnBEEngine>();
-        if (engines.size() != 1) ov::nvidia_gpu::throwIEException("Unexpected number of cuDNN Backend engines");
+        if (engines.size() != 1) ov::nvidia_gpu::throw_ov_exception("Unexpected number of cuDNN Backend engines");
         return std::move(*engines[0]);
     }

modules/nvidia_plugin/src/cuda/graph.cpp

+2 −4

@@ -3,7 +3,7 @@
 //
 
 #include "graph.hpp"
-#include <ie_common.h>
+#include "openvino/core/except.hpp"
 #include <fmt/format.h>
 
 namespace CUDA {

@@ -27,7 +27,6 @@ cudaGraph_t Graph::createNativeWithFlags(unsigned int flags) {
     return g;
 }
 
-// clang-format off
 GraphExec::GraphExec(const Graph &g)
 #if !defined(NDEBUG) || defined(_DEBUG)
 try

@@ -43,10 +42,9 @@ Handle(cudaGraphInstantiate, cudaGraphExecDestroy, g.get(), static_cast<cudaGrap
 }
 #if !defined(NDEBUG) || defined(_DEBUG)
 catch (std::exception &e) {
-    throw InferenceEngine::GeneralError { fmt::format("{}: {}", e.what(), errorMsg_) };
+    OPENVINO_THROW(e.what(), ": ", errorMsg_);
 }
 #endif
-// clang-format on
 
 cudaGraphExecUpdateResult GraphExec::update(const Graph &g) {
     cudaGraphExecUpdateResult res;

modules/nvidia_plugin/src/cuda/runtime.hpp

+2 −2

@@ -16,7 +16,7 @@
 inline void throwIfError(
     cudaError_t err,
     const std::experimental::source_location& location = std::experimental::source_location::current()) {
-    if (err != cudaSuccess) ov::nvidia_gpu::throwIEException(cudaGetErrorString(err), location);
+    if (err != cudaSuccess) ov::nvidia_gpu::throw_ov_exception(cudaGetErrorString(err), location);
 }
 
 inline void logIfError(

@@ -116,7 +116,7 @@ inline int residentGrids(const cudaDeviceProp& p) {
     return defaultResidentGrids;
 }
 
-inline int maxConcurrentStreams(CUDA::Device d) {
+inline int max_concurrent_streams(CUDA::Device d) {
     auto p = d.props();
     int r = p.asyncEngineCount;
     if (!p.concurrentKernels) return r + 1;

modules/nvidia_plugin/src/cuda/tensor.hpp

+1 −1

@@ -11,7 +11,7 @@
 inline void throwIfError(
     cutensorStatus_t err,
     const std::experimental::source_location& location = std::experimental::source_location::current()) {
-    if (err != CUTENSOR_STATUS_SUCCESS) ov::nvidia_gpu::throwIEException(cutensorGetErrorString(err), location);
+    if (err != CUTENSOR_STATUS_SUCCESS) ov::nvidia_gpu::throw_ov_exception(cutensorGetErrorString(err), location);
 }
 
 inline void logIfError(
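Across blas.hpp, dnn.hpp, runtime.hpp, and tensor.hpp the commit only renames the throwing helper from throwIEException to throw_ov_exception. As a rough, self-contained analogue of that pattern (not the plugin's actual helper, which also carries a source_location argument), reduced here to the CUDA runtime case with a plain OPENVINO_THROW:

```cpp
// Simplified analogue of the throwIfError() helpers touched by this commit:
// check a CUDA status code and raise an OpenVINO exception instead of a legacy
// InferenceEngine one. A sketch only, not the plugin's actual helper.
#include <cuda_runtime.h>

#include "openvino/core/except.hpp"

inline void throw_if_cuda_error(cudaError_t err) {
    if (err != cudaSuccess) {
        OPENVINO_THROW("CUDA error: ", cudaGetErrorString(err));
    }
}

int main() {
    void* ptr = nullptr;
    throw_if_cuda_error(cudaMalloc(&ptr, 1024));  // throws ov::Exception on failure
    throw_if_cuda_error(cudaFree(ptr));
    return 0;
}
```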
modules/nvidia_plugin/src/cuda_async_infer_request.cpp

@@ -1,62 +1,63 @@
-// Copyright (C) 2018-2021 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
-
 #include "cuda_async_infer_request.hpp"
-
-#include <threading/ie_cpu_streams_executor.hpp>
-
-#include "cuda_executable_network.hpp"
 #include "cuda_itt.hpp"
 #include "cuda_thread_pool.hpp"
 
 namespace ov {
 namespace nvidia_gpu {
 
-CudaAsyncInferRequest::CudaAsyncInferRequest(const CudaInferRequest::Ptr& inferRequest,
-                                             const InferenceEngine::ITaskExecutor::Ptr& cpuTaskExecutor,
-                                             const InferenceEngine::ITaskExecutor::Ptr& waitExecutor,
-                                             const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor)
-    : AsyncInferRequestThreadSafeDefault(inferRequest, cpuTaskExecutor, callbackExecutor), _inferRequest(inferRequest) {
+CudaAsyncInferRequest::CudaAsyncInferRequest(const CudaInferRequest::Ptr& request,
+                                             const std::shared_ptr<ov::threading::ITaskExecutor>& task_executor,
+                                             const std::shared_ptr<ov::threading::ITaskExecutor>& wait_executor,
+                                             const std::shared_ptr<ov::threading::ITaskExecutor>& callback_executor)
+    : ov::IAsyncInferRequest(request, task_executor, callback_executor),
+      request_(request) {
     // In current implementation we have CPU only tasks and no needs in 2 executors
     // So, by default single stage pipeline is created.
    // This stage executes InferRequest::Infer() using cpuTaskExecutor.
    // But if remote asynchronous device is used the pipeline can by splitted tasks that are executed by cpuTaskExecutor
    // and waiting tasks. Waiting tasks can lock execution thread so they use separate threads from other executor.
     constexpr const auto remoteDevice = true;
 
-    auto cudaThreadPool = std::dynamic_pointer_cast<CudaThreadPool>(waitExecutor);
+    auto cuda_thread_pool = std::dynamic_pointer_cast<CudaThreadPool>(wait_executor);
     if (remoteDevice) {
-        _pipeline = {{cpuTaskExecutor,
+        m_pipeline = {{task_executor,
                       [this] {
-                          OV_ITT_SCOPED_TASK(itt::domains::nvidia_gpu, "CudaAsyncInferRequest::Preprocessing");
-                          _inferRequest->inferPreprocess();
+                          OV_ITT_SCOPED_TASK(itt::domains::nvidia_gpu, "CudaAsyncInferRequest::infer_preprocess");
+                          request_->infer_preprocess();
                       }},
-                     {waitExecutor,
-                      [this, cudaThreadPool] {
-                          auto& threadContext = cudaThreadPool->GetThreadContext();
+                     {wait_executor,
+                      [this, cuda_thread_pool] {
+                          auto& threadContext = cuda_thread_pool->get_thread_context();
                           {
-                              OV_ITT_SCOPED_TASK(itt::domains::nvidia_gpu, "CudaAsyncInferRequest::StartPipeline");
-                              _inferRequest->startPipeline(threadContext);
+                              OV_ITT_SCOPED_TASK(itt::domains::nvidia_gpu, "CudaAsyncInferRequest::start_pipeline");
+                              request_->start_pipeline(threadContext);
                           }
                           {
-                              OV_ITT_SCOPED_TASK(itt::domains::nvidia_gpu, "CudaAsyncInferRequest::WaitPipeline");
-                              _inferRequest->waitPipeline(threadContext);
+                              OV_ITT_SCOPED_TASK(itt::domains::nvidia_gpu, "CudaAsyncInferRequest::wait_pipeline");
+                              request_->wait_pipeline(threadContext);
                           }
                       }},
-                     {cpuTaskExecutor, [this] {
-                          OV_ITT_SCOPED_TASK(itt::domains::nvidia_gpu, "CudaAsyncInferRequest::Postprocessing");
-                          _inferRequest->inferPostprocess();
+                     {task_executor, [this] {
+                          OV_ITT_SCOPED_TASK(itt::domains::nvidia_gpu, "CudaAsyncInferRequest::infer_postprocess");
+                          request_->infer_postprocess();
                       }}};
     }
 }
 
-void CudaAsyncInferRequest::Cancel() {
-    InferenceEngine::AsyncInferRequestThreadSafeDefault::Cancel();
-    _inferRequest->Cancel();
+CudaAsyncInferRequest::~CudaAsyncInferRequest() {
+    ov::IAsyncInferRequest::stop_and_wait();
 }
 
-void CudaAsyncInferRequest::Infer_ThreadUnsafe() { StartAsync_ThreadUnsafe(); }
+void CudaAsyncInferRequest::cancel() {
+    ov::IAsyncInferRequest::cancel();
+    request_->cancel();
+}
 
+void CudaAsyncInferRequest::infer_thread_unsafe() {
+    start_async_thread_unsafe();
+}
 } // namespace nvidia_gpu
 } // namespace ov

modules/nvidia_plugin/src/cuda_async_infer_request.hpp

+13 −17

@@ -1,34 +1,30 @@
-// Copyright (C) 2018-2021 Intel Corporation
+// Copyright (C) 2018-2023 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 
 #pragma once
 
-#include <cpp_interfaces/impl/ie_infer_async_request_thread_safe_default.hpp>
+#include "openvino/runtime/iasync_infer_request.hpp"
+#include "openvino/runtime/iinfer_request.hpp"
 
 #include "cuda_infer_request.hpp"
 
 namespace ov {
 namespace nvidia_gpu {
 
-class CudaAsyncInferRequest : public InferenceEngine::AsyncInferRequestThreadSafeDefault {
+class CudaAsyncInferRequest : public ov::IAsyncInferRequest {
 public:
-    CudaAsyncInferRequest(const CudaInferRequest::Ptr& inferRequest,
-                          const InferenceEngine::ITaskExecutor::Ptr& taskExecutor,
-                          const InferenceEngine::ITaskExecutor::Ptr& waitExecutor,
-                          const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor);
-
-    /**
-     * Cancel AsyncInferRequest
-     */
-    void Cancel() override;
-    /**
-     * Overrides default behaviour and run request asynchronous
-     */
-    void Infer_ThreadUnsafe() override;
+    CudaAsyncInferRequest(const CudaInferRequest::Ptr& request,
+                          const std::shared_ptr<ov::threading::ITaskExecutor>& task_executor,
+                          const std::shared_ptr<ov::threading::ITaskExecutor>& wait_executor,
+                          const std::shared_ptr<ov::threading::ITaskExecutor>& callback_executor);
+
+    ~CudaAsyncInferRequest();
+    void cancel() override;
+    void infer_thread_unsafe() override;
 
 private:
-    CudaInferRequest::Ptr _inferRequest;
+    CudaInferRequest::Ptr request_;
 };
 
 } // namespace nvidia_gpu

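To tie the two async-request diffs above together, a hedged usage sketch (not from the commit) of how an application drives this pipeline through the public 2.0 API; the device name "NVIDIA" comes from the CMake diff, the rest is standard OpenVINO runtime API, and the model path is illustrative:

```cpp
// Minimal sketch of asynchronous inference against the NVIDIA plugin through the
// public OpenVINO 2.0 API; internally this exercises the CudaAsyncInferRequest
// pipeline (infer_preprocess -> start_pipeline/wait_pipeline -> infer_postprocess).
#include <exception>
#include <iostream>

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled = core.compile_model("model.xml", "NVIDIA");  // illustrative path
    ov::InferRequest request = compiled.create_infer_request();

    // The callback runs when the async pipeline finishes (or fails).
    request.set_callback([](std::exception_ptr error) {
        if (error) {
            try { std::rethrow_exception(error); }
            catch (const std::exception& e) { std::cerr << "inference failed: " << e.what() << "\n"; }
            return;
        }
        std::cout << "inference finished\n";
    });

    request.start_async();
    request.wait();  // request.cancel() would route to CudaAsyncInferRequest::cancel()
    return 0;
}
```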