Commit 9110cb5

Merge branch 'master' into onednn_kv_cache_compression
2 parents ae61720 + d63a9df commit 9110cb5

151 files changed: +2288 -768 lines changed

src/common/snippets/docs/debug_capabilities/README.md

+1-1
@@ -9,4 +9,4 @@ Use the following cmake option to enable snippets debug capabilities:
 
 * [Performance counters](perf_count.md)
 * [Snippets segfault detector](snippets_segfault_detector.md)
-* [LIR passes serialization](LIR_passes_serialization.md)
+* [Linear IR passes serialization](linear_ir_passes_serialization.md)

src/common/snippets/docs/debug_capabilities/perf_count.md

+1-1
@@ -5,4 +5,4 @@ Subgraph in snippets could be very large. Sometimes developers are interested th
 There are two perf count modes.
 - `Chrono` : Perf count via chrono call. This is a universal method, and support multi-threads scenario to print perf count data for each thread.
 - `BackendSpecific` : Perf count provided by backend. This is for device specific requirement. For example, for sake of more light overhead and more accurate result, x86 or x86-64 CPU specific mode via reading RDTSC register is implemented. At current this x86 or x86-64 CPU BackendSpecific mode only support single thread.
-One can select prefered mode by setting `perf_count_mode` default value in [snippets Config](../../include/snippets/utils/debug_caps.hpp)
+One can select prefered mode by setting `perf_count_mode` default value in [snippets Config](../../include/snippets/utils/debug_caps_config.hpp)
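
Reviewer note: as an illustration of the `Chrono` mode described in this file, a wall-clock counter can be sketched with `std::chrono`. The class name `ChronoPerfCounter` and all of its methods are hypothetical; this is not the Snippets API, only a minimal sketch of the idea.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch, not the Snippets implementation: accumulates the
// wall-clock time spent between start() and stop() calls.
class ChronoPerfCounter {
public:
    void start() { m_start = std::chrono::steady_clock::now(); }

    void stop() {
        const auto elapsed = std::chrono::steady_clock::now() - m_start;
        m_total_ns += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        ++m_iterations;
    }

    std::uint64_t iterations() const { return m_iterations; }
    std::int64_t total_ns() const { return m_total_ns; }

    // Average time per measured region, or 0 if nothing was measured.
    double average_ns() const {
        return m_iterations == 0 ? 0.0
                                 : static_cast<double>(m_total_ns) / static_cast<double>(m_iterations);
    }

private:
    std::chrono::steady_clock::time_point m_start{};
    std::int64_t m_total_ns = 0;
    std::uint64_t m_iterations = 0;
};
```

Keeping one such counter per thread would be one way to obtain the per-thread reporting that the `Chrono` mode mentions.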

src/common/snippets/docs/mha_optimization_guide.md

+1-1
@@ -129,7 +129,7 @@ The heuristics for determining the optimal block sizes can be found in [BrgemmCP
 
 ### Blocking Order
 
-The lowered pass [BrgemmBlocking](../../../plugins/intel_cpu/src/transformations/snippets/x64/pass/lowered/brgemm_blocking.cpp) performs blocking loops creation on LinearIR.
+The lowered pass [BrgemmBlocking](../../../common/snippets/src/lowered/pass/brgemm_blocking.cpp) performs blocking loops creation on LinearIR.
 Currently, the order of blocking loops is following (from outer to inner): `M->N->K`.
 
 ## MHA Performance Tuning Recommendations
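
Reviewer note: the `M->N->K` blocking order mentioned in the hunk above can be pictured as a plain loop nest. The function `blocked_matmul` and its parameters are hypothetical and use scalar code; the sketch only shows the loop ordering, not what `BrgemmBlocking` generates.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes C[M x N] = A[M x K] * B[K x N] (row-major)
// with blocking loops ordered M -> N -> K from outer to inner.
// All block sizes must be greater than zero.
std::vector<float> blocked_matmul(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t M, std::size_t N, std::size_t K,
                                  std::size_t m_blk, std::size_t n_blk, std::size_t k_blk) {
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t m0 = 0; m0 < M; m0 += m_blk) {          // outermost: M blocks
        const std::size_t m_end = std::min(M, m0 + m_blk);
        for (std::size_t n0 = 0; n0 < N; n0 += n_blk) {      // middle: N blocks
            const std::size_t n_end = std::min(N, n0 + n_blk);
            for (std::size_t k0 = 0; k0 < K; k0 += k_blk) {  // innermost: K blocks
                const std::size_t k_end = std::min(K, k0 + k_blk);
                // Accumulate the partial product of the current block.
                for (std::size_t m = m0; m < m_end; ++m) {
                    for (std::size_t n = n0; n < n_end; ++n) {
                        float acc = 0.0f;
                        for (std::size_t k = k0; k < k_end; ++k) {
                            acc += a[m * K + k] * b[k * N + n];
                        }
                        c[m * N + n] += acc;
                    }
                }
            }
        }
    }
    return c;
}
```

With the `K` loop innermost, each output block is visited repeatedly and accumulated across the `K` blocks before the nest moves on to the next `N` block.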

src/common/snippets/docs/snippets_design_guide.md

+5-5
@@ -638,23 +638,23 @@ Consequently, all the ports connected to the same `PortConnector` will have the
 In other words, when all the `Expressions` that required input data in a certain register are evaluated, the register may be reused to hold another `Expression's` output.
 `AssignRegisters` also supports two types of registers: general-purpose and vector ones.
 Different types of registers are managed and assigned independently, and a particular register type required by an `Expression` is provided by the `ov::snippets::Generator` (or a derived generator for target-specific `Ops`).
-2. `InsertTailLoop` injects tail-processing section after a loop body if needed.
+2. `InsertSpecificIterations` injects initialization section before a loop body and tail-processing section after a loop body if needed.
 Note that every loop has two parameters that specify how its body is evaluated: `work_amount` and `increment` The `work_amount` indicates how much of the data needs to be processed, it often equals to the dimension's size the loop is working on.
 The `increment` defines how many data entries are processed on every loop iteration (it usually equals to vector size for the innermost loops of elementwise subgraph).
 So if a loop's `work_amount` is not evenly divisible by its `increment`, it means that a tail processing is required.
-`InsertTailLoop` duplicates the body of such a loop, rescales pointer increments and load/store masks appropriately, and injects these `Ops` immediately after the processed loop.
+`InsertSpecificIterations` duplicates the body of such a loop, rescales pointer increments and load/store masks appropriately, and injects these `Ops` immediately after the processed loop.
 3. `CleanupLoopOffsets` "fuses" the finalization offsets of loop with an outer loop's pointer increments and zeroes the offsets before `Result` operations.
 4. `OptimizeLoopSingleEvaluation` moves all pointer arithmetic to finalization offsets in `LoopEnd`, and marks the loops that will be executed only once.
 This information will be used during code emission to eliminate redundant instructions.
 
-Please see [assign_registers.cpp](../src/lowered/pass/assign_registers.cpp) and [insert_tail_loop.cpp](../src/lowered/pass/insert_tail_loop.cpp) for more info regarding the main passes in the `Preparation` stage.
+Please see [assign_registers.cpp](../src/lowered/pass/assign_registers.cpp) and [insert_specific_iterations.cpp](../src/lowered/pass/insert_specific_iterations.cpp) for more info regarding the main passes in the `Preparation` stage.
 When the `Preparation` is finished, the `Generator` constructs target-specific emitters by calling `init_emitter(target)` method for every `Expression` in the `LinearIR`, where the `target` is a `TargetMachine` instance.
 
 The `TargetMachine` is a class that provides generator with target-specific information, such as supported instruction sets, vector register size etc.
 `TargetMachine` also maps the OpenVINO's `DiscreteTypeInfo` (stored in the `Expression`) to the emitter that actually implements the operation.
 The mapping is done using the `jitters` map defined in [target_machine.hpp](../include/snippets/target_machine.hpp).
 In order for this mechanism to work, every `Snippets'` code generation backend should create emitter implementations derived from the `Emitter` base class defined in [emitter.hpp](../include/snippets/emitter.hpp).
-The backend then should create its own target machine class (derived from the common `TargetMachine`) and populate the `jitters` map, see the [cpu_generator.cpp](../../../plugins/intel_cpu/src/emitters/x64/cpu_generator.cpp) for an implementation example.
+The backend then should create its own target machine class (derived from the common `TargetMachine`) and populate the `jitters` map, see the [cpu_generator.cpp](../../../plugins/intel_cpu/src/emitters/snippets/x64/cpu_generator.cpp) for an implementation example.
 
 Note that `init_emitters(...)` only initializes the appropriate emitters, but do not actually emit any code.
 To perform code emission, a `snippets::op::Kernel` operation is constructed (see [generator.cpp](../src/generator.cpp)), its constructor takes the `IR` with all the initialized emitters as an only input argument.
@@ -663,7 +663,7 @@ Finally, the `kernel->emit_code({}, {})` command initiates the code emission.
 Note that the `emit_code(...)` is called only for the `KernelEmitter`, and the emitter is responsible for calling the same method for the rest of the expressions in the `IR` This encapsulation is needed because the `KernelEmitter` performs mapping of the assigned abstract registers to physical registers available on a particular platform.
 Another important function of the `KernelEmitter` is to calculate input/output data offsets based on dimension indices provided in runtime, and to shift corresponding data-handling registers accordingly.
 Keep in mind however, that the required functionality of the `KernelEmitter` depends on how the rest of the emitters are implemented (particularly for `Load`/`Store` `Ops`).
-We've discussed above how the emitters for the `intel_cpu` plugin are implemented (see [jit_snippets_emitters.cpp](../../../plugins/intel_cpu/src/emitters/x64/jit_snippets_emitters.cpp) for more details), but a different backend might require a different approach depending on hardware specifics.
+We've discussed above how the emitters for the `intel_cpu` plugin are implemented (see [jit_snippets_emitters.cpp](../../../plugins/intel_cpu/src/emitters/snippets/x64/jit_snippets_emitters.cpp) for more details), but a different backend might require a different approach depending on hardware specifics.
 
 ## See also
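
Reviewer note: the relation between `work_amount`, `increment` and tail processing described in the hunk above comes down to integer division with a remainder. The names `LoopSplit`, `split_loop` and `add_one` are hypothetical and scalar; the sketch is not how `InsertSpecificIterations` is implemented, it only shows the arithmetic.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of how a loop is divided into full iterations and a tail.
struct LoopSplit {
    std::size_t main_iterations;  // iterations that process a full `increment`
    std::size_t tail_work;        // leftover elements handled by the tail body
};

// `increment` must be greater than zero.
LoopSplit split_loop(std::size_t work_amount, std::size_t increment) {
    return {work_amount / increment, work_amount % increment};
}

// Adds 1.0f to every element: a main body that handles `increment` elements
// per iteration, followed by a duplicated body rescaled to the tail size.
void add_one(std::vector<float>& data, std::size_t increment) {
    const LoopSplit split = split_loop(data.size(), increment);
    std::size_t offset = 0;
    for (std::size_t it = 0; it < split.main_iterations; ++it, offset += increment) {
        for (std::size_t lane = 0; lane < increment; ++lane) {
            data[offset + lane] += 1.0f;
        }
    }
    for (std::size_t lane = 0; lane < split.tail_work; ++lane) {
        data[offset + lane] += 1.0f;
    }
}
```

For example, `work_amount = 19` with `increment = 8` gives two full iterations and a tail of three elements. A tail is needed exactly when `tail_work` is non-zero.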

@@ -0,0 +1,35 @@
+// Copyright (C) 2018-2024 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#pragma once
+/**
+ * @brief Define a separate value for every version of C++ standard upto currently supported by build setup.
+ */
+#if !(defined(_MSC_VER) && __cplusplus == 199711L)
+# if __cplusplus >= 201103L
+# define OPENVINO_CPP_VER_AT_LEAST_11
+# if __cplusplus >= 201402L
+# define OPENVINO_CPP_VER_AT_LEAST_14
+# if __cplusplus >= 201703L
+# define OPENVINO_CPP_VER_AT_LEAST_17
+# if __cplusplus >= 202002L
+# define OPENVINO_CPP_VER_AT_LEAST_20
+# endif
+# endif
+# endif
+# endif
+#elif defined(_MSC_VER) && __cplusplus == 199711L
+# if _MSVC_LANG >= 201103L
+# define OPENVINO_CPP_VER_AT_LEAST_11
+# if _MSVC_LANG >= 201402L
+# define OPENVINO_CPP_VER_AT_LEAST_14
+# if _MSVC_LANG >= 201703L
+# define OPENVINO_CPP_VER_AT_LEAST_17
+# if _MSVC_LANG >= 202002L
+# define OPENVINO_CPP_VER_AT_LEAST_20
+# endif
+# endif
+# endif
+# endif
+#endif
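
Reviewer note: because the `#if` blocks above are nested, a C++17 build defines the `_11`, `_14` and `_17` macros at the same time, which is what the `AT_LEAST` suffix expresses. The sketch below reproduces that behaviour with hypothetical `DEMO_`-prefixed macros so that it compiles without the OpenVINO header. The flat `#if` chain is a simplification of the nested form in the diff, and `highest_detected_cpp_version` is not part of the commit.

```cpp
// Hypothetical local equivalents of the macros in the hunk above.
// MSVC reports 199711L in __cplusplus unless /Zc:__cplusplus is set,
// so _MSVC_LANG is consulted in that case.
#if defined(_MSC_VER) && __cplusplus == 199711L
#    define DEMO_CPP_STD _MSVC_LANG
#else
#    define DEMO_CPP_STD __cplusplus
#endif

#if DEMO_CPP_STD >= 201103L
#    define DEMO_CPP_VER_AT_LEAST_11
#endif
#if DEMO_CPP_STD >= 201402L
#    define DEMO_CPP_VER_AT_LEAST_14
#endif
#if DEMO_CPP_STD >= 201703L
#    define DEMO_CPP_VER_AT_LEAST_17
#endif
#if DEMO_CPP_STD >= 202002L
#    define DEMO_CPP_VER_AT_LEAST_20
#endif

// Returns the newest standard for which an AT_LEAST macro is defined,
// or 0 when the compiler predates C++11.
inline int highest_detected_cpp_version() {
#if defined(DEMO_CPP_VER_AT_LEAST_20)
    return 20;
#elif defined(DEMO_CPP_VER_AT_LEAST_17)
    return 17;
#elif defined(DEMO_CPP_VER_AT_LEAST_14)
    return 14;
#elif defined(DEMO_CPP_VER_AT_LEAST_11)
    return 11;
#else
    return 0;
#endif
}
```

Code that only needs C++17 features can therefore test the `_17` macro alone and still compile under C++20, since the lower-version macros stay defined.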
@@ -0,0 +1,34 @@
+// Copyright (C) 2018-2024 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#pragma once
+
+#include <cstdio>
+
+#include "openvino/util/filesystem.hpp"
+namespace ov {
+namespace util {
+
+#if defined(OPENVINO_HAS_FILESYSTEM)
+using Path = std::filesystem::path;
+#elif defined(OPENVINO_HAS_EXP_FILESYSTEM)
+// Known issues:
+// * error C2280: 'std::u32string std::experimental::filesystem::v1::path::u32string(void) const': attempting to
+// * filesystem error: Cannot convert character sequence: Invalid in or incomplete multibyte or wide character
+
+///
+/// @typedef Path
+/// @brief Alias for std::experimental::filesystem::path.
+///
+/// This alias is used to simplify the usage of filesystem paths.
+///
+/// @note The experimental version of std::filesystem::path may not support all features correctly.
+/// It is recommended to use this alias with caution and consider upgrading to C++17 or higher
+/// for full support of std::filesystem::path.
+///
+using Path = std::experimental::filesystem::path;
+#endif
+
+} // namespace util
+} // namespace ov

src/common/util/include/openvino/util/filesystem.hpp

+9-9
@@ -4,28 +4,28 @@
 
 #pragma once
 
-#include "openvino/core/visibility.hpp"
+#include "openvino/util/cpp_version.hpp"
 
-#if defined(_MSC_VER) && defined(OPENVINO_CPP_VER_11)
+#if defined(_MSC_VER) && defined(OPENVINO_CPP_VER_AT_LEAST_17)
+# define OPENVINO_HAS_FILESYSTEM
+#elif defined(_MSC_VER) && defined(OPENVINO_CPP_VER_AT_LEAST_11)
 # define OPENVINO_HAS_EXP_FILESYSTEM
 # define _SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING
 # define _LIBCPP_NO_EXPERIMENTAL_DEPRECATION_WARNING_FILESYSTEM
 #elif defined(__has_include)
-# if defined(OPENVINO_CPP_VER_17) && (__has_include(<filesystem>)) && (!__has_include(<experimental/filesystem>))
+# if defined(OPENVINO_CPP_VER_AT_LEAST_17) && (__has_include(<filesystem>))
 # define OPENVINO_HAS_FILESYSTEM
-# elif defined(OPENVINO_CPP_VER_11) && (__has_include(<experimental/filesystem>))
+# elif defined(OPENVINO_CPP_VER_AT_LEAST_11) && (__has_include(<experimental/filesystem>))
 # define OPENVINO_HAS_EXP_FILESYSTEM
 # define _SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING
 # define _LIBCPP_NO_EXPERIMENTAL_DEPRECATION_WARNING_FILESYSTEM
 # endif
 #endif
 
-#if !defined(OPENVINO_HAS_FILESYSTEM) && !defined(OPENVINO_HAS_EXP_FILESYSTEM)
-# error "Neither #include <filesystem> nor #include <experimental/filesystem> is available."
-#elif defined(OPENVINO_HAS_FILESYSTEM)
+#if defined(OPENVINO_HAS_FILESYSTEM)
 # include <filesystem>
-namespace std_fs = std::filesystem;
 #elif defined(OPENVINO_HAS_EXP_FILESYSTEM)
 # include <experimental/filesystem>
-namespace std_fs = std::experimental::filesystem;
+#else
+# error "Neither #include <filesystem> nor #include <experimental/filesystem> is available."
 #endif

src/core/include/openvino/core/graph_util.hpp

+3-3
@@ -21,7 +21,7 @@
 #include "openvino/op/parameter.hpp"
 #include "openvino/pass/serialize.hpp"
 
-#ifdef OPENVINO_CPP_VER_17
+#ifdef OPENVINO_CPP_VER_AT_LEAST_17
 # include <filesystem>
 #endif
 
@@ -299,7 +299,7 @@ void serialize(const std::shared_ptr<const ov::Model>& m,
 const std::string& bin_path = "",
 ov::pass::Serialize::Version version = ov::pass::Serialize::Version::UNSPECIFIED);
 
-#ifdef OPENVINO_CPP_VER_17
+#ifdef OPENVINO_CPP_VER_AT_LEAST_17
 template <class Path, std::enable_if_t<std::is_same_v<Path, std::filesystem::path>>* = nullptr>
 void serialize(const std::shared_ptr<const ov::Model>& m,
 const Path& xml_path,
@@ -327,7 +327,7 @@ void save_model(const std::shared_ptr<const ov::Model>& model,
 bool compress_to_fp16 = true);
 #endif
 
-#ifdef OPENVINO_CPP_VER_AT_LEAST_17
+#ifdef OPENVINO_CPP_VER_AT_LEAST_17
 template <class Path, std::enable_if_t<std::is_same_v<Path, std::filesystem::path>>* = nullptr>
 void save_model(const std::shared_ptr<const ov::Model>& model, const Path& output_model, bool compress_to_fp16 = true) {
 save_model(model, output_model.string(), compress_to_fp16);
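
Reviewer note: the `std::enable_if_t<std::is_same_v<Path, std::filesystem::path>>` constraint in the hunks above limits the template overload to arguments whose type is exactly `std::filesystem::path`. A plain `const std::filesystem::path&` overload next to a `const std::string&` one would make a call with a string literal ambiguous, which is presumably the reason for the constraint. The function `save_model_demo` is hypothetical and only returns a tag showing which overload ran. The sketch assumes a C++17 compiler.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the string-based overload.
std::string save_model_demo(const std::string& output_model) {
    return "string:" + output_model;
}

// Participates in overload resolution only when Path is exactly
// std::filesystem::path, then forwards to the string-based overload.
template <class Path, std::enable_if_t<std::is_same_v<Path, std::filesystem::path>>* = nullptr>
std::string save_model_demo(const Path& output_model) {
    return "path->" + save_model_demo(output_model.string());
}
```

Calling `save_model_demo("m.xml")` selects the `std::string` overload, because deduction gives a character array type and the constraint removes the template. Calling it with a `std::filesystem::path` selects the template.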

src/core/include/openvino/core/visibility.hpp

+8-8
@@ -80,26 +80,26 @@
 
 #if !(defined(_MSC_VER) && __cplusplus == 199711L)
 # if __cplusplus >= 201103L
-# define OPENVINO_CPP_VER_11
+# define OPENVINO_CPP_VER_AT_LEAST_11
 # if __cplusplus >= 201402L
-# define OPENVINO_CPP_VER_14
+# define OPENVINO_CPP_VER_AT_LEAST_14
 # if __cplusplus >= 201703L
-# define OPENVINO_CPP_VER_17
+# define OPENVINO_CPP_VER_AT_LEAST_17
 # if __cplusplus >= 202002L
-# define OPENVINO_CPP_VER_20
+# define OPENVINO_CPP_VER_AT_LEAST_20
 # endif
 # endif
 # endif
 # endif
 #elif defined(_MSC_VER) && __cplusplus == 199711L
 # if _MSVC_LANG >= 201103L
-# define OPENVINO_CPP_VER_11
+# define OPENVINO_CPP_VER_AT_LEAST_11
 # if _MSVC_LANG >= 201402L
-# define OPENVINO_CPP_VER_14
+# define OPENVINO_CPP_VER_AT_LEAST_14
 # if _MSVC_LANG >= 201703L
-# define OPENVINO_CPP_VER_17
+# define OPENVINO_CPP_VER_AT_LEAST_17
 # if _MSVC_LANG >= 202002L
-# define OPENVINO_CPP_VER_20
+# define OPENVINO_CPP_VER_AT_LEAST_20
 # endif
 # endif
 # endif

src/core/include/openvino/pass/serialize.hpp

+2-2
@@ -11,7 +11,7 @@
 #include "openvino/opsets/opset.hpp"
 #include "openvino/pass/pass.hpp"
 
-#ifdef OPENVINO_CPP_VER_17
+#ifdef OPENVINO_CPP_VER_AT_LEAST_17
 # include <filesystem>
 #endif
 
@@ -39,7 +39,7 @@ class OPENVINO_API Serialize : public ov::pass::ModelPass {
 
 Serialize(const std::string& xmlPath, const std::string& binPath, Version version = Version::UNSPECIFIED);
 
-#ifdef OPENVINO_CPP_VER_17
+#ifdef OPENVINO_CPP_VER_AT_LEAST_17
 Serialize(const std::filesystem::path& xmlPath,
 const std::filesystem::path& binPath,
 Version version = Version::UNSPECIFIED)

src/core/src/op/paged_attention.cpp

+2-1
@@ -177,7 +177,8 @@ void PagedAttentionExtension::validate_and_infer_types() {
 get_input_partial_shape(15).rank().get_length(),
 ".");
 NODE_VALIDATION_CHECK(this,
-get_input_element_type(15).is_dynamic() || get_input_element_type(15) == element::f32,
+get_input_element_type(15).is_dynamic() || get_input_element_type(15) == element::f32 ||
+get_input_element_type(15) == element::f16,
 "Element type of `rotation_trig_lut` input should be f32, but it is ",
 get_input_element_type(15),
 ".");

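Reviewer note: the hunk above relaxes the check on input 15 to accept `f16` in addition to `f32`, while the message in the unchanged context line still names only `f32`. The sketch below restates the relaxed predicate and shows one possible wording for an updated message. `ElementType`, `is_supported_trig_lut_type` and `trig_lut_type_error` are hypothetical stand-ins and not OpenVINO API.

```cpp
#include <string>

// Hypothetical stand-in for ov::element::Type, reduced to the cases needed here.
enum class ElementType { dynamic, f16, f32, i32 };

// Mirrors the relaxed condition from the hunk above: dynamic, f32 or f16 pass.
bool is_supported_trig_lut_type(ElementType type) {
    return type == ElementType::dynamic || type == ElementType::f32 || type == ElementType::f16;
}

// Builds a message that lists both accepted precisions (a suggested wording only).
std::string trig_lut_type_error(const std::string& actual) {
    return "Element type of `rotation_trig_lut` input should be f32 or f16, but it is " + actual + ".";
}
```

Keeping the predicate and the message next to each other in this way makes it easier to notice when one is changed without the other.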