
Commit b99a79d

violetch24 and xin3he authored
modify 3.x ipex example structure (#1858)
* modify 3.x ipex example structure
* add json path
* fix for sq
* minor fix
* Update run_clm_no_trainer.py
* Update run_clm_no_trainer.py
* Update run_clm_no_trainer.py
* minor fix
* remove old files
* fix act_algo

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Co-authored-by: xinhe <xin3.he@intel.com>
1 parent 922b247

File tree

20 files changed: +1162 −16 lines

docs/3x/PT_SmoothQuant.md (+1 −1)

````diff
@@ -46,7 +46,7 @@ run_fn(prepared_model)
 q_model = convert(prepared_model)
 ```
 
-To get more information, please refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/llm).
+To get more information, please refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/smooth_quant).
 
 
 ## Validated Models
````
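For context, the updated link closes the usage snippet whose tail appears in the diff context above (`run_fn(prepared_model)` / `q_model = convert(prepared_model)`). Below is a minimal self-contained sketch of that 3.x SmoothQuant flow; the `SmoothQuantConfig`/`prepare`/`convert` names follow `neural_compressor.torch.quantization`, but treat exact signatures as illustrative, not authoritative.

```python
# Sketch of the 3.x SmoothQuant flow, under the assumptions stated above.
import torch
from neural_compressor.torch.quantization import SmoothQuantConfig, prepare, convert

class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(8, 16)
        self.fc2 = torch.nn.Linear(16, 8)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = ToyModel()
example_inputs = torch.randn(4, 8)

def run_fn(model):
    # Calibration: run a few representative batches through the model.
    model(example_inputs)

quant_config = SmoothQuantConfig(alpha=0.5)
prepared_model = prepare(model, quant_config=quant_config, example_inputs=example_inputs)
run_fn(prepared_model)
q_model = convert(prepared_model)
```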

docs/3x/PT_StaticQuant.md (+1 −1)

```diff
@@ -68,7 +68,7 @@ q_model = convert(prepared_model)
 
 #### Model Examples
 
-Users could refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/llm) on how to quantize a new model.
+Users could refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/static_quant/ipex) on how to quantize a new model.
 
 
 ### Static Quantization with PT2E Backend
```
New file (+46 lines): example run configuration

```json
{
    "pytorch": {
        "gpt_j_ipex": {
            "model_src_dir": "nlp/huggingface_models/language-modeling/quantization/static_quant",
            "dataset_location": "",
            "input_model": "",
            "main_script": "run_clm_no_trainer.py",
            "batch_size": 1
        },
        "gpt_j_ipex_sq": {
            "model_src_dir": "nlp/huggingface_models/language-modeling/quantization/smooth_quant",
            "dataset_location": "",
            "input_model": "",
            "main_script": "run_clm_no_trainer.py",
            "batch_size": 1
        },
        "llama2_7b_ipex": {
            "model_src_dir": "nlp/huggingface_models/language-modeling/quantization/static_quant",
            "dataset_location": "",
            "input_model": "",
            "main_script": "run_clm_no_trainer.py",
            "batch_size": 1
        },
        "llama2_7b_ipex_sq": {
            "model_src_dir": "nlp/huggingface_models/language-modeling/quantization/smooth_quant",
            "dataset_location": "",
            "input_model": "",
            "main_script": "run_clm_no_trainer.py",
            "batch_size": 1
        },
        "opt_125m_ipex": {
            "model_src_dir": "nlp/huggingface_models/language-modeling/quantization/static_quant",
            "dataset_location": "",
            "input_model": "",
            "main_script": "run_clm_no_trainer.py",
            "batch_size": 8
        },
        "opt_125m_ipex_sq": {
            "model_src_dir": "nlp/huggingface_models/language-modeling/quantization/smooth_quant",
            "dataset_location": "",
            "input_model": "",
            "main_script": "run_clm_no_trainer.py",
            "batch_size": 8
        }
    }
}
```
New file (+64 lines): Step-by-Step README

Step-by-Step
============

This document provides step-by-step instructions for running large language models (LLMs) with Smooth Quantization on 4th Gen Intel® Xeon® Scalable Processors (codenamed Sapphire Rapids) using PyTorch and Intel® Extension for PyTorch.

The script `run_clm_no_trainer.py` currently supports quantization of `GPTJ`, `OPT`, `LLaMA2`, `BLOOM`, and `Falcon`, and validates last-word prediction accuracy with [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness.git); more models are being added.

# Prerequisite
## 1. Create Environment
```
# Installation
pip install -r requirements.txt
```

# Run

Here is how to run the scripts:

**Causal Language Modeling (CLM)**

`run_clm_no_trainer.py` quantizes large language models using the [NeelNanda/pile-10k](https://huggingface.co/datasets/NeelNanda/pile-10k) dataset for calibration, and validates accuracy on `lambada_openai`, `piqa`, `winogrande`, `hellaswag`, and other datasets provided by lm_eval. Example commands are shown below.

### GPT-J-6b

#### Quantization
```bash
# "--sq" is used to enable smooth quant
python run_clm_no_trainer.py \
  --model EleutherAI/gpt-j-6B \
  --quantize \
  --sq \
  --alpha 1.0 \
  --ipex \
  --output_dir "saved_results"
```
**Note**: Smooth quantization here is based on torch.jit. Without past key values in `example_inputs`, the quantized model cannot be used for text generation.
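For illustration, this is roughly what trace-ready `example_inputs` with past key values might look like for a GPT-J-style model; the layer/head counts, tensor shapes, and input names below are assumptions for the sketch, not values taken from `run_clm_no_trainer.py`.

```python
import torch

# Hypothetical GPT-J-6B geometry: 28 layers, 16 heads, head_dim 256.
# Adapt these to the actual model config before tracing.
num_layers, num_heads, head_dim = 28, 16, 256

# Empty (length-0) past keys/values let the traced graph accept a growing
# KV cache, which is what makes iterative text generation possible.
past_key_values = tuple(
    (
        torch.zeros(1, num_heads, 0, head_dim),  # past keys
        torch.zeros(1, num_heads, 0, head_dim),  # past values
    )
    for _ in range(num_layers)
)

example_inputs = {
    "input_ids": torch.ones(1, 32, dtype=torch.long),
    "attention_mask": torch.ones(1, 32, dtype=torch.long),
    "past_key_values": past_key_values,
}
```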
### OPT-125m

#### Quantization

```bash
# "--sq" is used to enable smooth quant
python run_clm_no_trainer.py \
  --model facebook/opt-125m \
  --quantize \
  --sq \
  --alpha 0.5 \
  --ipex \
  --output_dir "saved_results"
```

### LLAMA2-7b/13b/70b
>**Note**: LLaMA requires IPEX >= 2.1 for better accuracy.
#### Quantization

```bash
# "--sq" is used to enable smooth quant
python run_clm_no_trainer.py \
  --model meta-llama/Llama-2-7b-hf \
  --quantize \
  --sq \
  --alpha 0.8 \
  --ipex \
  --output_dir "saved_results"
```
New file (+13 lines): requirements.txt

```
accelerate
protobuf
sentencepiece != 0.1.92
datasets >= 1.1.3
torch >= 1.10
transformers
pytest
wandb
einops
neural-compressor
intel-extension-for-transformers
lm_eval==0.4.2
peft
```
New file (+96 lines): benchmark runner script

```bash
#!/bin/bash
set -x

function main {

  init_params "$@"
  run_benchmark

}

# init params
function init_params {
  iters=100
  batch_size=16
  approach=static
  tuned_checkpoint=saved_results
  task=lambada_openai
  echo ${max_eval_samples}
  for var in "$@"
  do
    case $var in
      --topology=*)
          topology=$(echo $var |cut -f2 -d=)
      ;;
      --dataset_location=*)
          dataset_location=$(echo $var |cut -f2 -d=)
      ;;
      --input_model=*)
          input_model=$(echo $var |cut -f2 -d=)
      ;;
      --mode=*)
          mode=$(echo $var |cut -f2 -d=)
      ;;
      --batch_size=*)
          batch_size=$(echo $var |cut -f2 -d=)
      ;;
      --iters=*)
          iters=$(echo ${var} |cut -f2 -d=)
      ;;
      --int8=*)
          int8=$(echo ${var} |cut -f2 -d=)
      ;;
      --config=*)
          tuned_checkpoint=$(echo $var |cut -f2 -d=)
      ;;
      *)
          echo "Error: No such parameter: ${var}"
          exit 1
      ;;
    esac
  done

}


# run_benchmark
function run_benchmark {
    extra_cmd=''

    if [[ ${mode} == "accuracy" ]]; then
        mode_cmd=" --accuracy "
        extra_cmd=$extra_cmd" --load"
    elif [[ ${mode} == "performance" ]]; then
        mode_cmd=" --performance --iters "${iters}
        extra_cmd=$extra_cmd" --load"
    else
        echo "Error: No such mode: ${mode}"
        exit 1
    fi

    if [[ ${int8} == "true" ]]; then
        extra_cmd=$extra_cmd" --int8"
    fi
    echo $extra_cmd

    if [ "${topology}" = "opt_125m_ipex_sq" ]; then
        model_name_or_path="facebook/opt-125m"
        extra_cmd=$extra_cmd" --ipex --sq --alpha 0.5"
    elif [ "${topology}" = "llama2_7b_ipex_sq" ]; then
        model_name_or_path="meta-llama/Llama-2-7b-hf"
        extra_cmd=$extra_cmd" --ipex --sq --alpha 0.8"
    elif [ "${topology}" = "gpt_j_ipex_sq" ]; then
        model_name_or_path="EleutherAI/gpt-j-6b"
        extra_cmd=$extra_cmd" --ipex --sq --alpha 1.0"
    fi

    python -u run_clm_no_trainer.py \
        --model ${model_name_or_path} \
        --approach ${approach} \
        --output_dir ${tuned_checkpoint} \
        --task ${task} \
        --batch_size ${batch_size} \
        ${extra_cmd} ${mode_cmd}
}

main "$@"
```
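As a usage sketch, assuming the script above is saved as `run_benchmark.sh` next to `run_clm_no_trainer.py` (the file name is an assumption; the topology values and flags come from the script's own case branches):

```bash
# Measure int8 accuracy of the SmoothQuant OPT-125M model previously
# saved under ./saved_results by the quantization step.
bash run_benchmark.sh \
    --topology=opt_125m_ipex_sq \
    --mode=accuracy \
    --int8=true \
    --config=saved_results \
    --batch_size=16
```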
