Commit e7bc52c

Merge pull request #1 from ravi9/convert_deprecate
Updated Readme for LLM_Bench
2 parents 6978732 + 530c08b commit e7bc52c

1 file changed: llm_bench/python/README.md (+114 -89 lines)

# Benchmarking Script for Large Language Models

This script provides a unified approach to estimate performance for Large Language Models (LLMs). It leverages pipelines provided by Optimum-Intel and allows performance estimation for PyTorch and OpenVINO models using nearly identical code and pre-collected models.

### 1. Prepare Python Virtual Environment for LLM Benchmarking

``` bash
python3 -m venv ov-llm-bench-env
source ov-llm-bench-env/bin/activate
pip install --upgrade pip

git clone https://github.com/openvinotoolkit/openvino.genai.git
cd openvino.genai/llm_bench/python/
pip install -r requirements.txt
```
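Optionally, you can sanity-check that the OpenVINO runtime installed from `requirements.txt` is importable. This is a minimal check, assuming the `openvino.runtime.get_version()` helper available in recent OpenVINO releases:

```bash
# Print the installed OpenVINO runtime version to confirm the environment is set up
python -c "from openvino.runtime import get_version; print(get_version())"
```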

> Note:
> For existing Python environments, run the following command to ensure that all dependencies are installed with the latest versions:
> `pip install -U --upgrade-strategy eager -r requirements.txt`

#### (Optional) Hugging Face Login

Log in to Hugging Face if you want to use non-public models:

```bash
huggingface-cli login
```
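For non-interactive setups (such as CI), a token-based login is a possible alternative. The flag and environment variable below assume a recent `huggingface_hub` release, so verify them with `huggingface-cli login -h`:

```bash
# Non-interactive login with an access token (assumes a recent huggingface_hub CLI)
huggingface-cli login --token <YOUR_HF_TOKEN>
# Or export the token so huggingface_hub picks it up automatically
export HF_TOKEN=<YOUR_HF_TOKEN>
```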

### 2. Convert Model to OpenVINO IR Format

The `optimum-cli` tool simplifies converting Hugging Face models to OpenVINO IR format.
- Detailed documentation can be found in the [Optimum-Intel documentation](https://huggingface.co/docs/optimum/main/en/intel/openvino/export).
- To learn more about weight compression, see the [NNCF Weight Compression Guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
- For additional guidance on running inference with OpenVINO for LLMs, see the [OpenVINO LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

**Usage:**

```bash
optimum-cli export openvino --model <MODEL_ID> --weight-format <PRECISION> <OUTPUT_DIR>

optimum-cli export openvino -h # For detailed information
```

* `--model <MODEL_ID>` : model_id for downloading from the [huggingface_hub](https://huggingface.co/models) or a path to a directory containing a PyTorch model.
* `--weight-format <PRECISION>` : precision for model conversion. Available options: `fp32, fp16, int8, int4, mxfp4`
* `<OUTPUT_DIR>` : output directory for saving the generated OpenVINO model.

**NOTE:**
- Models larger than 1 billion parameters are exported to the OpenVINO format with 8-bit weights by default. You can disable this with `--weight-format fp32`.

**Example:**
```bash
optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format fp16 models/llama-2-7b-chat
```
**Resulting file structure:**

```console
models
└── llama-2-7b-chat
    ├── config.json
    ├── generation_config.json
    ├── openvino_detokenizer.bin
    ├── openvino_detokenizer.xml
    ├── openvino_model.bin
    ├── openvino_model.xml
    ├── openvino_tokenizer.bin
    ├── openvino_tokenizer.xml
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    └── tokenizer.model
```
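If you want a smaller memory footprint than `fp16`, weight compression can be applied at export time. The sketch below assumes the `--group-size` and `--ratio` compression options exposed by recent Optimum-Intel releases; run `optimum-cli export openvino -h` to confirm the options available in your version:

```bash
# Illustrative int4 export with weight compression (verify flags for your optimum-intel version)
optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf \
  --weight-format int4 --group-size 128 --ratio 0.8 models/llama-2-7b-chat-int4
```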

### 3. Benchmark LLM Model

To benchmark the performance of the LLM, use the following command:

``` bash
python benchmark.py -m <model> -d <device> -r <report_csv> -f <framework> -p <prompt text> -n <num_iters>
# e.g.
python benchmark.py -m models/llama-2-7b-chat/ -n 2
python benchmark.py -m models/llama-2-7b-chat/ -p "What is openvino?" -n 2
python benchmark.py -m models/llama-2-7b-chat/ -pf prompts/llama-2-7b-chat_l.jsonl -n 2
```

**Parameters:**
- `-m`: Path to the model.
- `-d`: Inference device (default: CPU).
- `-r`: Path to the CSV report.
- `-f`: Framework (default: ov).
- `-p`: Interactive prompt text.
- `-pf`: Path to a JSONL file containing prompts (see the sketch below).
- `-n`: Number of benchmarking iterations (default: 0); if the value is greater than 0, the first iteration is excluded from the results.
- `-ic`: Limit the output token size (default: 512) for text generation and code generation models.
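As a sketch of the `-pf` option, the snippet below creates a small prompt file and benchmarks against it. It assumes the one-JSON-object-per-line layout with a `prompt` field used by the sample files under `prompts/`; check those files for the exact schema expected for your model type:

```bash
# Create a small JSONL prompt file (assumed schema: one {"prompt": ...} object per line)
cat > prompts/my_prompts.jsonl << 'EOF'
{"prompt": "What is OpenVINO?"}
{"prompt": "Explain the difference between latency and throughput."}
EOF
python benchmark.py -m models/llama-2-7b-chat/ -pf prompts/my_prompts.jsonl -n 2
```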

**Additional options:**
``` bash
python ./benchmark.py -h # for more information
```
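Combining the parameters above, a run on a GPU device that writes a CSV report and caps the generated output could look like this (illustrative values only):

```bash
# Two measured iterations on GPU, results written to report.csv, output capped at 256 tokens
python benchmark.py -m models/llama-2-7b-chat/ -d GPU -r report.csv -n 2 -ic 256
```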

#### Benchmarking the Original PyTorch Model

To benchmark the original PyTorch model, first download the model locally, then run the benchmark with PyTorch specified as the framework (`-f pt`):

```bash
# Download PyTorch Model
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir models/llama-2-7b-chat/pytorch
# Benchmark with PyTorch Framework
python benchmark.py -m models/llama-2-7b-chat/pytorch -n 2 -f pt
```

> **Note:** If needed, you can install a specific OpenVINO version using pip:
> ``` bash
> # e.g.
> pip install openvino==2024.4.0
> # Optional: install the OpenVINO nightly package if needed.
> # OpenVINO nightly is pre-release software and has not undergone full release validation or qualification.
> pip uninstall openvino
> pip install --upgrade --pre openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
> ```

### 4. Benchmark LLM with `torch.compile()`

The `--torch_compile_backend` option enables you to use `torch.compile()` to accelerate PyTorch models by compiling them into optimized kernels using a specified backend.

Before benchmarking, download the original PyTorch model locally:

```bash
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir models/llama-2-7b-chat/pytorch
```

To run the benchmarking script with `torch.compile()`, use the `--torch_compile_backend` option to specify the backend. You can choose between `pytorch` and `openvino` (default). Example:

```bash
python ./benchmark.py -m models/llama-2-7b-chat/pytorch -d CPU --torch_compile_backend openvino
```
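For comparison, the same model can be compiled with the native PyTorch backend instead of the default `openvino` backend (an illustrative variant of the command above):

```bash
# Same benchmark, but let torch.compile() use its native PyTorch backend
python ./benchmark.py -m models/llama-2-7b-chat/pytorch -d CPU --torch_compile_backend pytorch -n 2
```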

> **Note:** To use `torch.compile()` with CUDA GPUs, you need to install the nightly version of PyTorch:
>
> ```bash
> pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
> ```

### 5. Running on 2-Socket Platforms

The benchmarking script sets `openvino.properties.streams.num(1)` by default. For multi-socket platforms, use `numactl` on Linux or the `--load_config` option to change this behavior.

| OpenVINO Version | Behaviors |
|:--------------------|:------------------------------------------------|
| Before 2024.0.0 | streams.num(1) <br>execute on 2 sockets. |
| 2024.0.0 | streams.num(1) <br>execute on the same socket the application is running on. |

For example, in OpenVINO 2024.0.0, passing `--load_config config.json` with the following contents results in streams.num(1) while executing on 2 sockets:
```json
{
    "INFERENCE_NUM_THREADS": <NUMBER>
}
```
`<NUMBER>` is the total number of physical cores in 2 sockets.
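For illustration, the commands below show both approaches on a Linux host; the NUMA node IDs are assumptions for a typical two-socket machine, so adjust them for your system:

```bash
# Pin execution and memory to the first socket (NUMA node 0) with numactl
numactl --cpunodebind=0 --membind=0 python ./benchmark.py -m models/llama-2-7b-chat/ -n 2

# Or load the config above so the run spans both sockets
python ./benchmark.py -m models/llama-2-7b-chat/ -n 2 --load_config config.json
```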

### 6. Additional Resources

- **Error Troubleshooting:** Check the [NOTES.md](./doc/NOTES.md) for solutions to known issues.
- **Image Generation Configuration:** Refer to [IMAGE_GEN.md](./doc/IMAGE_GEN.md) for setting parameters for image generation models.
