*`sym` and `asym` stand for symmetric and asymmetric quantization; `g128` and `g64` denote group sizes of `128` and `64`, respectively.
The `--ratio` CLI parameter controls the proportion of layers quantized to 4-bit versus 8-bit, and can therefore be used to tune the performance-accuracy trade-off of the optimized model. It is valid only for the INT4 quantization options.
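These CLI knobs have Python API counterparts. Below is a minimal sketch, assuming the `OVWeightQuantizationConfig` and `OVModelForCausalLM` interface from recent `optimum-intel` releases; the model ID is only an example:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# 4-bit symmetric quantization with group size 128; ratio=0.8 keeps the
# most sensitive ~20% of layers in 8-bit backup precision.
q_config = OVWeightQuantizationConfig(bits=4, sym=True, group_size=128, ratio=0.8)

model = OVModelForCausalLM.from_pretrained(
    "gpt2",  # example model ID
    export=True,  # convert the Transformers model to OpenVINO IR on the fly
    quantization_config=q_config,
)
model.save_pretrained("ov_model_int4")
```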
To apply quantization to both weights and activations, you can use the `OVQuantizer`; see the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#optimization) for more information.
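As an illustration of that flow, here is a short sketch following the `OVQuantizer` usage described in the linked documentation; the dataset, preprocessing, and save path are placeholders:

```python
from functools import partial

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel import OVQuantizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example model
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantizer = OVQuantizer.from_pretrained(model)

# Static quantization needs a small calibration set to estimate activation ranges.
def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding=True, truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=300,
    dataset_split="train",
)

quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="ov_model_int8")
```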
"The weight format of the exporting model, e.g. f32 stands for float32 weights, f16 - for float16 weights, i8 - INT8 weights, int4_* - for INT4 compressed weights."
"Compression ratio between primary and backup precision. In the case of INT4, NNCF evaluates layer sensitivity and keeps the most impactful layers in INT8"
92
92
"precision (by default 20%% in INT8). This helps to achieve better accuracy after weight compression."
93
93
),
94
94
)
The diff then adds two new arguments to the same `optional_group`:

```python
optional_group.add_argument(
    "--sym",
    action="store_true",
    default=None,
    help=("Whether to apply symmetric quantization"),
)
optional_group.add_argument(
    "--group-size",
    type=int,
    default=None,
    help=("The group size to use for quantization. Recommended value is 128 and -1 uses per-column quantization."),
)
```
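As a quick sanity check of how the new flags behave, here is a self-contained sketch that reproduces just this parser fragment; the surrounding parser setup is assumed:

```python
import argparse

parser = argparse.ArgumentParser()
optional_group = parser.add_argument_group("Optional arguments")
optional_group.add_argument(
    "--sym",
    action="store_true",
    default=None,
    help="Whether to apply symmetric quantization",
)
optional_group.add_argument(
    "--group-size",
    type=int,
    default=None,
    help="The group size to use for quantization. Recommended value is 128 and -1 uses per-column quantization.",
)

# Explicit flags are parsed into the expected values.
args = parser.parse_args(["--sym", "--group-size", "128"])
assert args.sym is True and args.group_size == 128

# When neither flag is passed, both stay None.
args = parser.parse_args([])
assert args.sym is None and args.group_size is None
```

Note the `default=None` on a `store_true` flag: it lets downstream code distinguish "flag not provided" from an explicit user choice, so library-side defaults can still apply.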