
Commit c10cb6d
Threading clarification
1 parent 83e9f07

File tree: 1 file changed (+7, −1 lines)


llm_bench/python/README.md

@@ -138,4 +138,10 @@ For example, --load_config config.json as following in OpenVINO 2024.0.0 will re
 > If you encounter any errors, please check **[NOTES.md](./doc/NOTES.md)** which provides solutions to the known errors.
 ### 2. Image generation
 > To configure more parameters for image generation models, reference to **[IMAGE_GEN.md](./doc/IMAGE_GEN.md)**
-### 3. Threading
+### 3. CPU Threading
+
+OpenVINO uses [oneTBB](https://github.com/oneapi-src/oneTBB/) as its default threading library, while Torch uses [OpenMP](https://www.openmp.org/). Both threading libraries use ['busy-wait' spinning](https://gcc.gnu.org/onlinedocs/libgomp/GOMP_005fSPINCOUNT.html) by default. As a result, in an LLM pipeline that runs inference on CPU with OpenVINO and postprocessing with Torch (for example, greedy search or beam search), there is threading overhead when switching between inference (OpenVINO with oneTBB) and postprocessing (Torch with OpenMP).
+
+**Alternative solutions**
+1. Use the --genai option, which uses the OpenVINO GenAI APIs instead of the optimum-intel APIs and executes postprocessing with the OpenVINO GenAI APIs as well.
+2. Without the --genai option (i.e. using the optimum-intel APIs), set the environment variable [OMP_WAIT_POLICY](https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fWAIT_005fPOLICY.html) to PASSIVE, which disables the OpenMP 'busy-wait'; benchmark.py will also limit the number of Torch threads to avoid using CPU cores that are busy-waiting in OpenVINO inference.
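The second workaround above can be sketched as a shell snippet (a minimal illustration of exporting the wait policy before launching the benchmark; the commented-out invocation is a placeholder, not an exact command from this diff):

```shell
# Disable OpenMP busy-wait spinning for the Torch postprocessing threads
# before starting the benchmark via the optimum-intel path (no --genai).
export OMP_WAIT_POLICY=PASSIVE

# Any child process started from this shell inherits the policy,
# e.g.:  python benchmark.py <args>
# Verify the variable is set as expected:
echo "$OMP_WAIT_POLICY"
```

Exporting the variable in the launching shell is enough, since Torch/OpenMP reads OMP_WAIT_POLICY once at thread-pool startup.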
