README.md (+10 -3)
@@ -78,12 +78,18 @@ It is possible to export your model to the [OpenVINO IR](https://docs.openvino.a
optimum-cli export openvino --model gpt2 ov_model
```

-You can also apply 8-bit weight-only quantization when exporting your model: the model linear and embedding weights will be quantized to INT8, the activations will be kept in floating point precision.
+You can also apply 8-bit weight-only quantization when exporting your model: the model linear, embedding and convolution weights will be quantized to INT8, the activations will be kept in floating point precision.

Quantization in hybrid mode can be applied to the Stable Diffusion pipeline during model export. This involves applying hybrid post-training quantization to the UNet model and weight-only quantization to the rest of the pipeline components. In hybrid mode, weights in MatMul and Embedding layers are quantized, as well as activations of other layers.
For more information on applying quantization to both weights and activations, see the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).
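For instance, a weight-only INT8 export can be requested straight from the CLI. The commands below are a sketch: the `--weight-format` and `--dataset` options are assumed to be supported by the installed `optimum-intel` version, and the model IDs and dataset name are only illustrative.

```
optimum-cli export openvino --model gpt2 --weight-format int8 ov_model

# Hybrid quantization of a Stable Diffusion pipeline: weight-only quantization
# for most components plus activation quantization of the UNet, calibrated on a dataset.
optimum-cli export openvino --model runwayml/stable-diffusion-v1-5 --weight-format int8 --dataset conceptual_captions sd_ov_model
```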
"**NOTE:** if you notice very low accuracy after post-training quantization, it is likely caused by an overflow issue which affects processors that do not contain VNNI (Vector Neural Network Instruction). NNCF has an `overflow_fix` option to address this. It will effectively use 7-bits for quantizing instead of 8-bits to prevent the overflow. To use this option, modify the code in the next cell to add an explicit quantization configuration, and set `overflow_fix` to `\"enable\"`:\n",
"For more information, see [Lower Numerical Precision Deep Learning Inference and Training](https://www.intel.com/content/www/us/en/developer/articles/technical/lower-numerical-precision-deep-learning-inference-and-training.html)"
"Library name is not specified. There are multiple possible variants: `sentence_transformers`, `transformers`."
221
+
"`transformers` will be selected. If you want to load your model with the `sentence-transformers` library instead, please set --library sentence_transformers"
222
+
)
223
+
library_name="transformers"
224
+
225
+
if (
226
+
library_name=="diffusers"
227
+
andov_config
228
+
andov_config.quantization_config
229
+
andov_config.quantization_config.datasetisnotNone
230
+
):
231
+
ifnotis_diffusers_available():
232
+
raiseValueError(DIFFUSERS_IMPORT_ERROR.format("Export of diffusers models"))
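As the warning suggests, the ambiguity can be avoided by pinning the library explicitly at export time. A hedged example, assuming the current `optimum-cli` interface and an illustrative model ID:

```
optimum-cli export openvino --model sentence-transformers/all-MiniLM-L6-v2 --library sentence_transformers ov_model
```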
        Model ID on huggingface.co or path on disk to the model repository to export.
    output (`Union[str, Path]`):
-        Path indicating the directory where to store the generated ONNX model.
+        Path indicating the directory where to store the generated OpenVINO model.

    > Optional parameters
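For reference, a minimal sketch of calling the exporter programmatically, assuming `main_export` is importable from `optimum.exporters.openvino` and that its first two arguments match the docstring above (model ID and output directory):

```python
from optimum.exporters.openvino import main_export

# Export gpt2 and write the resulting OpenVINO model to ./ov_model
main_export("gpt2", output="ov_model")
```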
@@ -187,12 +186,6 @@ def main_export(
f"The task could not be automatically inferred as this is available only for models hosted on the Hugging Face Hub. Please provide the argument --task with the relevant task from {', '.join(TasksManager.get_all_tasks())}. Detailed error: {e}"