Commit 87f9aa3

update documentation
1 parent 3b1f28e commit 87f9aa3

2 files changed (+12, -9 lines)


docs/source/inference.mdx

+11-7
@@ -116,17 +116,21 @@ model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
 
 There are also alternative compression options for a different performance-accuracy trade-off:
 
-| Option                                                              | Description     |
-|---------------------------------------------------------------------|-----------------|
-| `fp16`                                                              | Float16 weights |
-| `int8`                                                              | INT8 weights    |
-| `int4_sym_g128`, `int4_asym_g128`, `int4_sym_g64`, `int4_asym_g64`* | INT4 weights    |
+| Option  | Description     |
+|---------|-----------------|
+| `fp16`  | Float16 weights |
+| `int8`  | INT8 weights    |
+| `int4`  | INT4 weights    |
 
-*`sym` and `asym` stand for symmetric and asymmetric quantization, `g128` and `g64` means the group size `128` and `64` respectively.
+The `--sym` parameter enables symmetric quantization; asymmetric quantization is applied by default if it is not provided.
 
-`--ratio` CLI parameter controls the ratio between 4-bit and 8-bit quantized layers and can also change performance-accuracy trade-off for the optimized model. It is valid only for INT4 quantization options.
+For INT4 quantization:
+* The `--group-size` parameter defines the group size to use for quantization; the recommended value is 128, and -1 results in per-column quantization.
+* The `--ratio` parameter controls the ratio between 4-bit and 8-bit quantized layers and can also change the performance-accuracy trade-off for the optimized model. It is valid only for INT4 quantization options.
 
 To apply quantization on both weights and activations, you can use the `OVQuantizer`, more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#optimization).
 
 ### Static shape
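The options documented above are passed to the OpenVINO export CLI whose argument parser is touched below. As a rough sketch of how they combine, assuming the standard `optimum-cli export openvino` entry point (the model id, output directory, and the `--weight-format` flag are illustrative assumptions; only `--sym`, `--group-size`, and `--ratio` appear in this commit):

```shell
# Hypothetical invocation sketch. --sym, --group-size and --ratio come from
# this commit's argparse definitions; the model id, output directory and the
# --weight-format flag are illustrative assumptions.
optimum-cli export openvino \
  --model gpt2 \
  --weight-format int4 \
  --sym \
  --group-size 128 \
  --ratio 0.8 \
  ov_model_int4/
```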

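To make the `--group-size` and `--sym` semantics concrete, here is a hypothetical pure-Python sketch of group-wise symmetric INT4 quantization; the function name and exact clipping range are assumptions, and optimum-intel delegates the real work to NNCF:

```python
def quantize_int4_sym(values, group_size=128):
    """Illustrative sketch of group-wise symmetric INT4 quantization.

    Each group of `group_size` consecutive values shares a single scale,
    and quantized values are clipped to the signed 4-bit range [-8, 7].
    This is NOT the actual NNCF implementation used by optimum-intel.
    """
    quantized, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # Symmetric: the scale is derived from the largest magnitude only,
        # so zero always maps to the integer 0 (no zero-point needed).
        scale = max(abs(v) for v in group) / 7 or 1.0
        scales.append(scale)
        quantized.append([max(-8, min(7, round(v / scale))) for v in group])
    return quantized, scales

# One group of four values sharing a single scale (about 0.1):
print(quantize_int4_sym([0.1, -0.5, 0.2, 0.7], group_size=4))
```

A smaller group size gives each group a scale better matched to its values (higher accuracy, more scale overhead), while `group_size=-1` per-column quantization is the coarsest setting.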
optimum/commands/export/openvino.py

+1-2
@@ -94,11 +94,10 @@ def parse_args_openvino(parser: "ArgumentParser"):
     )
     optional_group.add_argument(
         "--sym",
-        type=bool,
+        action="store_true",
         default=None,
         help=("Whether to apply symmetric quantization"),
     )
-
     optional_group.add_argument(
         "--group-size",
         type=int,