"The weight format of the exporting model, e.g. f32 stands for float32 weights, f16 - for float16 weights, i8 - INT8 weights, int4_* - for INT4 compressed weights."
"Compression ratio between primary and backup precision. In the case of INT4, NNCF evaluates layer sensitivity and keeps the most impactful layers in INT8"
92
92
"precision (by default 20%% in INT8). This helps to achieve better accuracy after weight compression."
93
93
),
94
94
)
95
+
optional_group.add_argument(
96
+
"--sym",
97
+
type=bool,
98
+
default=None,
99
+
help=("Whether to apply symmetric quantization"),
100
+
)
101
+
102
+
optional_group.add_argument(
103
+
"--group-size",
104
+
type=int,
105
+
default=None,
106
+
help=("The group size to use for quantization. Recommended value is 128 and -1 uses per-column quantization."),
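The hunk adds two weight-compression options, `--sym` and `--group-size`, after the existing `--ratio` argument. One behavioral note on `--sym`: argparse's `type=bool` applies Python's built-in `bool()` to the raw argument string, so any non-empty value, including "False" and "0", parses as True; only an empty string yields False. A minimal standalone sketch of that gotcha (hypothetical parser, for illustration only, not the actual optimum-cli wiring):

```python
import argparse

# Re-declare the two new flags in isolation (hypothetical parser).
parser = argparse.ArgumentParser()
parser.add_argument("--sym", type=bool, default=None,
                    help="Whether to apply symmetric quantization")
parser.add_argument("--group-size", type=int, default=None,
                    help="The group size to use for quantization.")

# type=bool feeds the raw string to bool(), so any non-empty value is True.
print(parser.parse_args(["--sym", "False"]).sym)  # True (!)
print(parser.parse_args(["--sym", ""]).sym)       # False
print(parser.parse_args([]).sym)                  # None (the default)
```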
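The help strings reference NNCF's mixed-precision weight compression. For context, here is a hedged sketch of how these CLI values could plausibly map onto `nncf.compress_weights`; the mapping and the toy model are assumptions for illustration, since the actual wiring inside the exporter is not shown in this hunk:

```python
import numpy as np
import nncf
import openvino.runtime as ov
from openvino.runtime import opset13 as ops

# Toy single-MatMul model standing in for the exported network, just so
# compress_weights has a weight matrix to act on (128 divides group_size).
x = ops.parameter([1, 128], np.float32, name="x")
w = ops.constant(np.random.rand(128, 128).astype(np.float32))
model = ov.Model([ops.matmul(x, w, False, False)], [x], "toy")

# Assumed mapping of the new CLI options onto NNCF's data-free API:
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # --sym -> symmetric INT4 variant
    ratio=0.8,       # --ratio: ~20% most sensitive layers kept in INT8 backup
    group_size=128,  # --group-size: 128 recommended, -1 = per-column
)
```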