Commit 87f9aa3

update documentation
1 parent 3b1f28e commit 87f9aa3

2 files changed (+12, -9 lines)


docs/source/inference.mdx

+11-7
@@ -116,17 +116,21 @@ model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
 
 There are also alternative compression options for a different performance-accuracy trade-off:
 
-| Option                                                              | Description     |
-|---------------------------------------------------------------------|-----------------|
-| `fp16`                                                              | Float16 weights |
-| `int8`                                                              | INT8 weights    |
-| `int4_sym_g128`, `int4_asym_g128`, `int4_sym_g64`, `int4_asym_g64`* | INT4 weights    |
+| Option  | Description     |
+|---------|-----------------|
+| `fp16`  | Float16 weights |
+| `int8`  | INT8 weights    |
+| `int4`  | INT4 weights    |
 
-*`sym` and `asym` stand for symmetric and asymmetric quantization, `g128` and `g64` means the group size `128` and `64` respectively.
+The `--sym` parameter enables symmetric quantization; asymmetric quantization is applied by default if it is not provided.
 
-`--ratio` CLI parameter controls the ratio between 4-bit and 8-bit quantized layers and can also change performance-accuracy trade-off for the optimized model. It is valid only for INT4 quantization options.
+For INT4 quantization:
+* The `--group-size` parameter defines the group size to use for quantization; the recommended value is 128, and -1 results in per-column quantization.
+* The `--ratio` parameter controls the ratio between 4-bit and 8-bit quantized layers and can also change the performance-accuracy trade-off for the optimized model. It is valid only for INT4 quantization options.
 
 To apply quantization on both weights and activations, you can use the `OVQuantizer`, more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#optimization).
 
 ### Static shape
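The options documented above are passed to the OpenVINO export CLI whose argument parser is touched below. As a rough sketch of how they combine, assuming the standard `optimum-cli export openvino` entry point (the model id, output directory, and the `--weight-format` flag are illustrative assumptions; only `--sym`, `--group-size`, and `--ratio` appear in this commit):

```shell
# Hypothetical invocation sketch. --sym, --group-size and --ratio come from
# this commit's argparse definitions; the model id, output directory and the
# --weight-format flag are illustrative assumptions.
optimum-cli export openvino \
  --model gpt2 \
  --weight-format int4 \
  --sym \
  --group-size 128 \
  --ratio 0.8 \
  ov_model_int4/
```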

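To make the `--group-size` and `--sym` semantics concrete, here is a hypothetical pure-Python sketch of group-wise symmetric INT4 quantization; the function name and exact clipping range are assumptions, and optimum-intel delegates the real work to NNCF:

```python
def quantize_int4_sym(values, group_size=128):
    """Illustrative sketch of group-wise symmetric INT4 quantization.

    Each group of `group_size` consecutive values shares a single scale,
    and quantized values are clipped to the signed 4-bit range [-8, 7].
    This is NOT the actual NNCF implementation used by optimum-intel.
    """
    quantized, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # Symmetric: the scale is derived from the largest magnitude only,
        # so zero always maps to the integer 0 (no zero-point needed).
        scale = max(abs(v) for v in group) / 7 or 1.0
        scales.append(scale)
        quantized.append([max(-8, min(7, round(v / scale))) for v in group])
    return quantized, scales

# One group of four values sharing a single scale (about 0.1):
print(quantize_int4_sym([0.1, -0.5, 0.2, 0.7], group_size=4))
```

A smaller group size gives each group a scale better matched to its values (higher accuracy, more scale overhead), while `group_size=-1` per-column quantization is the coarsest setting.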
optimum/commands/export/openvino.py

+1-2
@@ -94,11 +94,10 @@ def parse_args_openvino(parser: "ArgumentParser"):
     )
     optional_group.add_argument(
         "--sym",
-        type=bool,
+        action="store_true",
         default=None,
         help=("Whether to apply symmetric quantization"),
     )
-
     optional_group.add_argument(
         "--group-size",
         type=int,