Commit 4209601

eaidova and AlexKoff88 authored

move llm_bench and wwb to tools (openvinotoolkit#967)

Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
Co-authored-by: Alexander Kozlov <kozzzloff@list.ru>
1 parent 80221da · commit 4209601

55 files changed (+203 −290 lines)

.github/label_config.yml

+4 −4

@@ -1,13 +1,13 @@
 # https://github.com/actions/labeler
 
-# Add label to the PRs changing files under llm_bench/
+# Add label to the PRs changing files under tools/llm_bench/
 llm_bench:
 - changed-files:
   - any-glob-to-any-file:
-    - 'llm_bench/**'
+    - 'tools/llm_bench/**'
     - '.github/workflows/llm_bench-python.yml'
-
+# Add label to the PRs changing files under tools/who_what_benchmark/
 WWB:
 - changed-files:
   - any-glob-to-any-file:
-    - 'llm_bench/python/who_what_benchmark/**'
+    - 'tools/who_what_benchmark/**'

.github/workflows/labeler.yml

+1 −1

@@ -5,7 +5,7 @@ on:
   pull_request_target:
     types: [opened, edited, synchronize]
     paths:
-      - llm_bench/python/**
+      - tools/llm_bench/**
       - .github/workflows/llm_bench-python.yml
 
 permissions: read-all # Required by https://github.com/ossf/scorecard/blob/e23b8ad91fd6a64a0a971ca4fc0a4d1650725615/docs/checks.md#token-permissions

.github/workflows/llm_bench-python.yml

+15 −13

@@ -4,17 +4,19 @@
 name: llm_bench Python Test
 
 env:
-  LLM_BENCH_PYPATH: llm_bench/python
-  WWB_PATH: llm_bench/python/who_what_benchmark
+  LLM_BENCH_PYPATH: tools/llm_bench
+  WWB_PATH: tools/who_what_benchmark
 
 on:
   push:
     branches: [ "master" ]
     paths:
-      - llm_bench/python/**
+      - tools/llm_bench/**
+      - tools/who_what_benchmark/**
   pull_request:
     paths:
-      - llm_bench/python/**
+      - tools/llm_bench/**
+      - tools/who_what_benchmark/**
       - .github/workflows/llm_bench-python.yml
 
 permissions: read-all # Required by https://github.com/ossf/scorecard/blob/e23b8ad91fd6a64a0a971ca4fc0a4d1650725615/docs/checks.md#token-permissions
@@ -59,22 +61,22 @@ jobs:
         run: |
           export GIT_LFS_SKIP_SMUDGE=0
           git clone --depth 1 https://huggingface.co/katuni4ka/tiny-random-qwen
-          python ./llm_bench/python/benchmark.py -m tiny-random-qwen -d cpu -n 1 -f pt
+          python ./tools/llm_bench/benchmark.py -m tiny-random-qwen -d cpu -n 1 -f pt
       - name: Test tiny-random-baichuan2 on Linux
         run: |
           optimum-cli export openvino --model katuni4ka/tiny-random-baichuan2 --trust-remote-code --weight-format fp16 ./ov_models/tiny-random-baichuan2/pytorch/dldt/FP16
-          python ./llm_bench/python/benchmark.py -m ./ov_models/tiny-random-baichuan2/pytorch/dldt/FP16/ -d cpu -n 1
+          python ./tools/llm_bench/benchmark.py -m ./ov_models/tiny-random-baichuan2/pytorch/dldt/FP16/ -d cpu -n 1
       - name: Test tiny-stable-diffusion on Linux
         run: |
           optimum-cli export openvino --model segmind/tiny-sd --trust-remote-code --weight-format fp16 ./ov_models/tiny-sd/pytorch/dldt/FP16/
-          python ./llm_bench/python/benchmark.py -m ./ov_models/tiny-sd/pytorch/dldt/FP16/ -pf ./llm_bench/python/prompts/stable-diffusion.jsonl -d cpu -n 1
+          python ./tools/llm_bench/benchmark.py -m ./ov_models/tiny-sd/pytorch/dldt/FP16/ -pf ./tools/llm_bench/prompts/stable-diffusion.jsonl -d cpu -n 1
       - name: WWB Tests
         run: |
           GIT_CLONE_PROTECTION_ACTIVE=false pip install -r ${{ env.WWB_PATH }}/requirements.txt
           pip install git+https://github.com/huggingface/optimum.git
           GIT_CLONE_PROTECTION_ACTIVE=false pip install ${{ env.WWB_PATH }}
           python -m pip install -U --pre openvino openvino-tokenizers openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly --force-reinstall
-          python -m pytest llm_bench/python/who_what_benchmark/tests
+          python -m pytest tools/who_what_benchmark/tests
   stateful:
     runs-on: ubuntu-20.04
     steps:
@@ -84,16 +86,16 @@ jobs:
           python-version: "3.10"
       - name: Test stateful
         run: |
-          GIT_CLONE_PROTECTION_ACTIVE=false python -m pip install -r llm_bench/python/requirements.txt
+          GIT_CLONE_PROTECTION_ACTIVE=false python -m pip install -r tools/llm_bench/requirements.txt
           python -m pip uninstall --yes openvino
           python -m pip install -U --pre openvino openvino-tokenizers openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
-          python llm_bench/python/convert.py --model_id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --output_dir . --stateful
+          python tools/llm_bench/convert.py --model_id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --output_dir . --stateful
           grep beam_idx pytorch/dldt/FP32/openvino_model.xml
       - name: WWB Tests
         run: |
-          GIT_CLONE_PROTECTION_ACTIVE=false pip install -r llm_bench/python/who_what_benchmark/requirements.txt
+          GIT_CLONE_PROTECTION_ACTIVE=false pip install -r tools/who_what_benchmark/requirements.txt
           pip install git+https://github.com/huggingface/optimum.git
-          GIT_CLONE_PROTECTION_ACTIVE=false pip install llm_bench/python/who_what_benchmark/
+          GIT_CLONE_PROTECTION_ACTIVE=false pip install tools/who_what_benchmark/
           pip install pytest
           python -m pip install -U --pre openvino openvino-tokenizers openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly --force-reinstall
-          python -m pytest llm_bench/python/who_what_benchmark/tests
+          python -m pytest tools/who_what_benchmark/tests

bandit.yml

+1 −1

@@ -131,7 +131,7 @@ any_other_function_with_shell_equals_true:
   - subprocess.check_output
   - subprocess.run
 assert_used:
-  skips: ["llm_bench/python/who_what_benchmark/tests/test_*.py"]
+  skips: ["tools/who_what_benchmark/tests/test_*.py"]
 hardcoded_tmp_directory:
   tmp_dirs:
   - /tmp

llm_bench/python/README.md

100755 → 100644

+2 −171

@@ -1,173 +1,4 @@
# Benchmarking Script for Large Language Models

Removed (old lines 3–173):

This script provides a unified approach to estimate performance for Large Language Models (LLMs). It leverages pipelines provided by Optimum-Intel and allows performance estimation for PyTorch and OpenVINO models using nearly identical code and pre-collected models.

### 1. Prepare Python Virtual Environment for LLM Benchmarking

```bash
python3 -m venv ov-llm-bench-env
source ov-llm-bench-env/bin/activate
pip install --upgrade pip

git clone https://github.com/openvinotoolkit/openvino.genai.git
cd openvino.genai/llm_bench/python/
pip install -r requirements.txt
```

> Note:
> For existing Python environments, run the following command to ensure that all dependencies are installed with the latest versions:
> `pip install -U --upgrade-strategy eager -r requirements.txt`

#### (Optional) Hugging Face Login

Log in to Hugging Face if you want to use non-public models:

```bash
huggingface-cli login
```

### 2. Convert Model to OpenVINO IR Format

The `optimum-cli` tool simplifies converting Hugging Face models to OpenVINO IR format.
- Detailed documentation can be found in the [Optimum-Intel documentation](https://huggingface.co/docs/optimum/main/en/intel/openvino/export).
- To learn more about weight compression, see the [NNCF Weight Compression Guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
- For additional guidance on running inference with OpenVINO for LLMs, see the [OpenVINO LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

**Usage:**

```bash
optimum-cli export openvino --model <MODEL_ID> --weight-format <PRECISION> <OUTPUT_DIR>

optimum-cli export openvino -h # For detailed information
```

* `--model <MODEL_ID>`: model_id for downloading from the [huggingface_hub](https://huggingface.co/models), or a path to a directory containing the PyTorch model.
* `--weight-format <PRECISION>`: precision for model conversion. Available options: `fp32, fp16, int8, int4, mxfp4`.
* `<OUTPUT_DIR>`: output directory for saving the generated OpenVINO model.

**NOTE:**
- Models larger than 1 billion parameters are exported to the OpenVINO format with 8-bit weights by default. You can disable this with `--weight-format fp32`.

**Example:**
```bash
optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format fp16 models/llama-2-7b-chat
```
**Resulting file structure:**

```console
models
└── llama-2-7b-chat
    ├── config.json
    ├── generation_config.json
    ├── openvino_detokenizer.bin
    ├── openvino_detokenizer.xml
    ├── openvino_model.bin
    ├── openvino_model.xml
    ├── openvino_tokenizer.bin
    ├── openvino_tokenizer.xml
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    └── tokenizer.model
```

### 3. Benchmark LLM Model

To benchmark the performance of the LLM, use the following command:

```bash
python benchmark.py -m <model> -d <device> -r <report_csv> -f <framework> -p <prompt text> -n <num_iters>
# e.g.
python benchmark.py -m models/llama-2-7b-chat/ -n 2
python benchmark.py -m models/llama-2-7b-chat/ -p "What is openvino?" -n 2
python benchmark.py -m models/llama-2-7b-chat/ -pf prompts/llama-2-7b-chat_l.jsonl -n 2
```

**Parameters:**
- `-m`: Path to the model.
- `-d`: Inference device (default: CPU).
- `-r`: Path to the CSV report.
- `-f`: Framework (default: ov).
- `-p`: Interactive prompt text.
- `-pf`: Path to a JSONL file containing prompts (see the example file just below).
- `-n`: Number of iterations (default: 0; the first iteration is excluded).
- `-ic`: Limit on the output token count (default: 512) for text-generation and code-generation models.
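
For reference, the prompt files under `prompts/` use the JSON Lines format, one JSON object per line. A minimal illustrative sketch (the file name and the `prompt` field name are assumptions here; check the shipped files for the exact schema):

```bash
# Write a hypothetical two-prompt JSONL file, then benchmark with it
cat > my_prompts.jsonl <<'EOF'
{"prompt": "What is OpenVINO?"}
{"prompt": "Describe the architecture of a transformer model."}
EOF
python benchmark.py -m models/llama-2-7b-chat/ -pf my_prompts.jsonl -n 2
```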

**Additional options:**

```bash
python ./benchmark.py -h # for more information
```

#### Benchmarking the Original PyTorch Model
To benchmark the original PyTorch model, first download the model locally, then run the benchmark with PyTorch specified as the framework via the `-f pt` parameter:

```bash
# Download the PyTorch model
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir models/llama-2-7b-chat/pytorch
# Benchmark with the PyTorch framework
python benchmark.py -m models/llama-2-7b-chat/pytorch -n 2 -f pt
```

> **Note:** If needed, you can install a specific OpenVINO version using pip:
> ```bash
> # e.g.
> pip install openvino==2024.4.0
> # Optionally, install the OpenVINO nightly package instead.
> # OpenVINO nightly is pre-release software and has not undergone full release validation or qualification.
> pip uninstall openvino
> pip install --upgrade --pre openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
> ```

## 4. Benchmark LLM with `torch.compile()`

The `--torch_compile_backend` option enables you to use `torch.compile()` to accelerate PyTorch models by compiling them into optimized kernels using a selected backend.

Before benchmarking, download the original PyTorch model locally:

```bash
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir models/llama-2-7b-chat/pytorch
```

To run the benchmarking script with `torch.compile()`, use the `--torch_compile_backend` option to specify the backend. You can choose between `pytorch` and `openvino` (default). Example:

```bash
python ./benchmark.py -m models/llama-2-7b-chat/pytorch -d CPU --torch_compile_backend openvino
```

> **Note:** To use `torch.compile()` with CUDA GPUs, you need to install the nightly version of PyTorch:
>
> ```bash
> pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
> ```

## 5. Running on 2-Socket Platforms
146-
147-
The benchmarking script sets `openvino.properties.streams.num(1)` by default. For multi-socket platforms, use `numactl` on Linux or the `--load_config` option to modify behavior.
148-
149-
| OpenVINO Version | Behaviors |
150-
|:--------------------|:------------------------------------------------|
151-
| Before 2024.0.0 | streams.num(1) <br>execute on 2 sockets. |
152-
| 2024.0.0 | streams.num(1) <br>execute on the same socket as the APP is running on. |
153-
154-
For example, `--load_config config.json` as following will result in streams.num(1) and execute on 2 sockets.
155-
```json
156-
{
157-
"INFERENCE_NUM_THREADS": <NUMBER>
158-
}
159-
```
160-
`<NUMBER>` is the number of total physical cores in 2 sockets.
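
As an illustrative sketch of the two approaches (the NUMA node index and model path are assumptions; adjust them for your machine):

```bash
# Pin the benchmark to socket 0 (NUMA node 0 and its local memory) with numactl
numactl --cpunodebind=0 --membind=0 python ./benchmark.py -m models/llama-2-7b-chat/ -n 2

# Or override OpenVINO properties with the JSON config shown above
python ./benchmark.py -m models/llama-2-7b-chat/ -n 2 --load_config config.json
```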

## 6. Execution on CPU Device

OpenVINO is built with the [oneTBB](https://github.com/oneapi-src/oneTBB/) threading library by default, while Torch uses [OpenMP](https://www.openmp.org/). Both threading libraries apply a ['busy-wait spin'](https://gcc.gnu.org/onlinedocs/libgomp/GOMP_005fSPINCOUNT.html) by default. When running an LLM pipeline on a CPU device, there is threading overhead in switching between inference on the CPU with OpenVINO (oneTBB) and postprocessing (for example, greedy search or beam search) with Torch (OpenMP).

**Alternative solutions:**
1. Use the `--genai` option, which uses the OpenVINO GenAI API instead of the Optimum-Intel API; postprocessing is then executed with the OpenVINO GenAI API as well.
2. Without the `--genai` option (that is, with the Optimum-Intel API), set the environment variable [OMP_WAIT_POLICY](https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fWAIT_005fPOLICY.html) to PASSIVE to disable the OpenMP 'busy-wait'; benchmark.py will also limit the number of Torch threads to avoid using CPU cores held in 'busy-wait' by OpenVINO inference. A sketch of this setup follows the list.
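
A minimal sketch of option 2, assuming the PyTorch model downloaded earlier:

```bash
# Disable OpenMP busy-wait spinning, then run the Optimum-Intel (non --genai) path
export OMP_WAIT_POLICY=PASSIVE
python ./benchmark.py -m models/llama-2-7b-chat/pytorch -n 2 -f pt
```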

## 7. Additional Resources

- **Error Troubleshooting:** Check [NOTES.md](./doc/NOTES.md) for solutions to known issues.
- **Image Generation Configuration:** Refer to [IMAGE_GEN.md](./doc/IMAGE_GEN.md) for setting parameters for image generation models.

Added in its place (new lines 3–4):

> [!IMPORTANT]
> LLM bench code was moved to the [tools](../../tools/llm_bench/) directory. Please navigate to the new directory to continue using the tool.

llm_bench/python/doc/IMAGE_GEN.md

-23
This file was deleted.

llm_bench/python/doc/NOTES.md

-74
This file was deleted.
llm_bench/python/who_what_benchmark/README.md

+4
@@ -0,0 +1,4 @@
+# Simple Accuracy Benchmark for Generative AI models
+
+> [!IMPORTANT]
+> Who What Benchmark code was moved to the [tools](../../../tools/who_what_benchmark/) directory. Please navigate to the new directory to continue using the tool.
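
For orientation, the relocated tool is driven from the command line; a hypothetical invocation (the `wwb` entry point and flag names are assumptions here; verify against the README in tools/who_what_benchmark/):

```bash
# Hypothetical sketch: score a compressed model against its full-precision base
wwb --base-model meta-llama/Llama-2-7b-chat-hf \
    --target-model ./models/llama-2-7b-chat-int8 \
    --model-type text
```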
