Commit c121e4e (1 parent: b6a4c78)

Doc: Update fp8 accuracy test data and update docker image 1.20.0 (#2130)

Authored by fengding
Signed-off-by: fengding <feng1.ding@intel.com>

2 files changed: +52, -28 lines

README.md (+2, -2)

````diff
@@ -54,9 +54,9 @@ pip install neural-compressor[tf]
 After successfully installing these packages, try your first quantization program. **Following example code demonstrates FP8 Quantization**, it is supported by Intel Gaudi2 AI Accelerator.
 To try on Intel Gaudi2, docker image with Gaudi Software Stack is recommended, please refer to following script for environment setup. More details can be found in [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).

-Run a container with an interactive shell,
+Run a container with an interactive shell, [more info](https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Docker_Installation.html#docker-installation)
 ```
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu24.04/habanalabs/pytorch-installer-2.5.1:latest
+docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.20.0/ubuntu24.04/habanalabs/pytorch-installer-2.6.0:latest
 ```
 Run the example,
 ```python
````
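The hunk ends just as the README's FP8 example opens (the Python code fence on the last context line above). For orientation, here is a minimal sketch of that FP8 quantization flow with the Intel Neural Compressor 3.x PyTorch API; the toy resnet18 model, the dummy calibration pass, and the habana_frameworks import are illustrative assumptions rather than the repository's verbatim snippet, and it is meant to run inside the Gaudi container started above.

```python
# Hedged sketch of the FP8 quantization flow the README demonstrates.
# Assumptions: Intel Neural Compressor 3.x PyTorch API (FP8Config/prepare/convert),
# a Gaudi ("hpu") device inside the container above, and a toy torchvision model
# standing in for the user's model.
import torch
import torchvision.models as models
import habana_frameworks.torch.core as htcore  # import registers the "hpu" device in PyTorch

from neural_compressor.torch.quantization import FP8Config, prepare, convert

model = models.resnet18().eval().to("hpu")

# E4M3 is the FP8 format commonly used for weights/activations on Gaudi2.
qconfig = FP8Config(fp8_config="E4M3")

# Insert observers, run a (dummy) calibration pass, then convert to FP8 modules.
model = prepare(model, qconfig)
model(torch.randn(1, 3, 224, 224).to("hpu"))  # user-defined calibration data goes here
model = convert(model)

# FP8 inference on the Gaudi device.
output = model(torch.randn(1, 3, 224, 224).to("hpu")).to("cpu")
print(output.shape)
```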

docs/source/3x/PT_FP8Quant.md (+50, -26)

```diff
@@ -129,47 +129,71 @@ mistralai/Mistral-Nemo-Instruct-2407

 ### Running with FP8
 Refer to [here](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation#running-with-fp8).
-Change "--model_name_or_path" to be your model like
-"meta-llama/Llama-3.1-8B-Instruct",
-"Qwen/Qwen2.5-7B-Instruct", or
-"mistralai/Mixtral-8x7B-Instruct-v0.1" and so on.
-"--use_kv_cache" is to enable FP8 KV cache.
+Change "--model_name_or_path" to be your model like someone in the above models list. "--use_kv_cache" is or not to enable FP8 KV cache.

 ### Profiling
-Add "--profiling_warmup_steps 5 --profiling_steps 2 --profiling_record_shapes" as args in the end of commandline of run_generation.py.
+Add "--profiling_warmup_steps 5 --profiling_steps 2 --profiling_record_shapes" as args in the end of commandline of `run_generation.py`.
 Refer to [torch.profiler.ProfilerActivity.HPU](https://github.com/huggingface/optimum-habana/blob/c9e1c23620618e2f260c92c46dfeb163545ec5ba/optimum/habana/utils.py#L305).

 ### FP8 Accuracy
 "lm_eval.tasks", "lm_eval.evaluator", "lm_eval" are installed from the above requirements_lm_eval.txt. The tasks can be set and the default is ["hellaswag", "lambada_openai", "piqa", "winogrande"], [more info](https://github.com/EleutherAI/lm-evaluation-harness/)

-| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ bf16 KVCache|
+| `Llama-3.1-8B-Instruct`| fp8 w/ fp8 KVCache| bf16 w/ bf16 KVCache|
 |---------------|---------|--------|
-| hellaswag | 0.5691097390957977 | 0.5704043019318861 |
-| lambada_openai| 0.7360760721909567 | 0.7372404424607025 |
-| piqa | 0.7850924918389554 | 0.7818280739934712 |
-| winogrande | 0.6929755327545383 | 0.6929755327545383 |
+| lambada_openai| 0.7299 | 0.7359 |
+| hellaswag | 0.5892 | 0.5911 |
+| piqa | 0.7965 | 0.7998 |
+| winogrande | 0.7474 | 0.7372 |
+| mmlu | 0.6599 | 0.6829 |

-| `Qwen2.5-7B-Instruct`| fp8 & fp8 KVCache| bf16 w/ bf16 KVCache|
+| `Phi-3-mini-4k-instruct`| fp8 w/ fp8 KVCache| bf16 w/ bf16 KVCache|
 |---------------|---------|--------|
-| hellaswag | 0.2539334793865764 | 0.2539334793865764 |
-| lambada_openai| 0.0 | 0.0 |
-| piqa | 0.5391730141458106 | 0.5391730141458106 |
-| winogrande | 0.4956590370955012 | 0.4956590370955012 |
+| lambada_openai| 0.6420 | 0.6552 |
+| hellaswag | 0.5866 | 0.5902 |
+| piqa | 0.8041 | 0.8014 |
+| winogrande | 0.7324 | 0.7348 |
+| mmlu | 0.7035 | 0.7055 |

-| `Llama-3.1-8B-Instruct`| fp8 & fp8 KVCache| bf16 w/ bf16 KVCache|
+| `Mistral-7B-Instruct-v0.2`| fp8 w/ fp8 KVCache| bf16 w/ bf16 KVCache|
 |---------------|---------|--------|
-| hellaswag | 0.5934076877116112 | 0.5975901214897431 |
-| lambada_openai| 0.7230739375121289 | 0.7255967397632447 |
-| piqa | 0.7932535364526659 | 0.8030467899891186 |
-| winogrande | 0.7434885556432518 | 0.7371744277821626 |
+| lambada_openai| 0.7126 | 0.7165 |
+| hellaswag | 0.6556 | 0.6609 |
+| piqa | 0.8014 | 0.8025 |
+| winogrande | 0.7253 | 0.7388 |
+| mmlu | 0.5833 | 0.5919 |

+| `Mistral-Nemo-Instruct-2407`| fp8 w/ fp8 KVCache| bf16 w/ bf16 KVCache|
+|---------------|---------|--------|
+| lambada_openai| 0.7568 | 0.7596 |
+| hellaswag | 0.6273 | 0.6325 |
+| piqa | 0.8150 | 0.8085 |
+| winogrande | 0.7419 | 0.7482 |
+| mmlu | 0.6684 | 0.6840 |
+
+| `bigscience/bloom-7b1`| fp8 w/ fp8 KVCache| bf16 w/ bf16 KVCache|
+|---------------|---------|--------|
+| lambada_openai| 0.5599 | 0.5731 |
+| hellaswag | 0.4632 | 0.4639 |
+| piqa | 0.7301 | 0.7242 |
+| winogrande | 0.6314 | 0.6393 |
+| mmlu | 0.2563 | 0.2572 |
+
+| `Mixtral-8x7B-Instruct-v0.1`| fp8 w/ fp8 KVCache| bf16 w/ bf16 KVCache|
+|---------------|---------|--------|
+| lambada_openai| 0.7805 | 0.7778 |
+| hellaswag | 0.6733 | 0.6764 |
+| piqa | 0.8324 | 0.8351 |
+| winogrande | 0.7680 | 0.7672 |
+| mmlu | 0.7031 | 0.7026 |

-| `Mixtral-8x7B-Instruct-v0.1`| fp8 & fp8 KVCache| bf16 w/ bf16 KVCache|
+| `EleutherAI/gpt-j-6b`| fp8 w/ fp8 KVCache| bf16 w/ bf16 KVCache|
 |---------------|---------|--------|
-| hellaswag | 0.25323640709022105 | 0.25323640709022105 |
-| lambada_openai| 0.0 | 0.0 |
-| piqa | 0.528835690968444 | 0.528835690968444 |
-| winogrande | 0.4956590370955012 | 0.4956590370955012 |
+| lambada_openai| 0.6769 | 0.6781 |
+| hellaswag | 0.4928 | 0.4958 |
+| piqa | 0.7557 | 0.7541 |
+| winogrande | 0.6409 | 0.6425 |
+| mmlu | 0.2524 | 0.2606 |
+> Notes: For gpt-j model, if `--use_kv_cache` is set to enable KVCache quantization, `--reuse_cache` should also be set.

 ## VLLM example
 ### Overview
```
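The new accuracy tables report lm-evaluation-harness scores on the doc's default task list plus mmlu. As a rough standalone illustration of how those tasks are scored through the harness's Python API (the model id, bf16 dtype, and this direct call are assumptions for the sketch; the numbers in the tables come from the optimum-habana text-generation example referenced in the doc, not from this snippet):

```python
# Hedged sketch: score a model on the tasks shown in the updated tables with
# lm-evaluation-harness (installed from requirements_lm_eval.txt per the doc).
# The model id and dtype are illustrative assumptions, not the exact runs above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct,dtype=bfloat16",
    tasks=["hellaswag", "lambada_openai", "piqa", "winogrande", "mmlu"],
)

# results["results"] maps each task name to its metrics (e.g. acc / acc_norm).
for task, metrics in results["results"].items():
    print(task, metrics)
```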
