examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/README.md
@@ -21,6 +21,18 @@ Here is how to run the scripts:

### GPT-J-6b

#### Quantization

```bash
# "--sq" is used to enable smooth quant
python run_clm_no_trainer.py \
    --model EleutherAI/gpt-j-6B \
    --quantize \
    --sq \
    --alpha 1.0 \
    --ipex \
    --output_dir "saved_results"
```

**Notes**: Smooth quantization here is based on torch.jit. Without past key values in `example_inputs`, the quantized model cannot be used for text generation.
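The `--alpha` flag controls how smoothing is split between activations and weights (1.0 here for GPT-J, 0.5 for the OPT example below). A minimal sketch of the SmoothQuant scaling idea, with made-up calibration statistics; this is illustrative only, not Intel Neural Compressor's implementation:

```python
# Hedged sketch of the idea behind "--sq"/"--alpha" (not the library's code).
# Per input channel j: s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).
# Activations are divided by s and weights multiplied by s, so the matmul
# result is unchanged while activation outliers are shifted into the weights.

def smooth_scales(act_absmax, wt_absmax, alpha):
    """Per-channel smoothing scales from abs-max calibration statistics."""
    return [a ** alpha / w ** (1 - alpha) for a, w in zip(act_absmax, wt_absmax)]

# alpha=1.0 pushes all of the smoothing into the weights;
# alpha=0.5 splits the quantization difficulty evenly.
print(smooth_scales([10.0, 0.5], [1.0, 2.0], 0.5))
```

A larger alpha helps when activation outliers dominate; 0.5 is a common balanced default.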
```bash
# "--approach weight_only" is used to enable weight-only quantization.
# "--woq_algo GPTQ" is used to enable the GPTQ algorithm
```
@@ -62,6 +74,15 @@ python run_clm_no_trainer.py \

#### Quantization

```bash
# "--sq" is used to enable smooth quant
python run_clm_no_trainer.py \
    --model facebook/opt-125m \
    --quantize \
    --sq \
    --alpha 0.5 \
    --ipex \
    --output_dir "saved_results"

# "--approach weight_only" is used to enable weight-only quantization.
# "--woq_algo GPTQ" is used to enable the GPTQ algorithm
# "--double_quant_type BNB_NF4" is used to enable double quant algorithms
```
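To illustrate what `--double_quant_type` refers to: in double quantization, the per-block scales produced by the first quantization pass are themselves quantized to cut metadata overhead. A toy sketch under assumed symmetric int4 quantization; the real BNB_NF4 format uses a NormalFloat codebook, not the scheme below:

```python
# Toy double-quantization sketch; all formats here are illustrative
# assumptions, not the actual BNB_NF4 implementation.

def quantize_block(block, levels=7):
    """Symmetric quantization of one block; returns int codes and a scale."""
    scale = max(abs(v) for v in block) / levels or 1.0
    return [round(v / scale) for v in block], scale

def double_quantize(weights, block_size=4):
    """Pass 1 quantizes weight blocks to ~int4; pass 2 quantizes the
    resulting per-block scales to ~int8 with one shared meta-scale."""
    blocks = [weights[i:i + block_size] for i in range(0, len(weights), block_size)]
    codes, scales = zip(*(quantize_block(b) for b in blocks))
    scale_codes, meta_scale = quantize_block(list(scales), levels=127)
    return codes, scale_codes, meta_scale

def dequantize(codes, scale_codes, meta_scale):
    out = []
    for block, sc in zip(codes, scale_codes):
        out.extend(c * sc * meta_scale for c in block)
    return out

w = [0.1, -0.4, 0.25, 0.05, 1.2, -0.9, 0.3, 0.0]
approx = dequantize(*double_quantize(w))
```

Storing int8 scale codes plus one float meta-scale instead of one float scale per block is where the memory saving comes from.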