Commit 3544c4b

Add doc

1 parent 24de966 commit 3544c4b

1 file changed: docs/source/optimization_ov.mdx (+17, -0)

@@ -69,6 +69,23 @@ from optimum.intel import OVModelForCausalLM
model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
```

## Hybrid quantization

Traditional optimization methods like post-training 8-bit quantization do not work well for Stable Diffusion models, as accuracy drops significantly. On the other hand, weight compression does not improve performance when applied to Stable Diffusion models, because the size of the activations is comparable to that of the weights.

The UNet takes up most of the pipeline's overall execution time. Optimizing just this one model therefore brings substantial inference-speed benefits while keeping acceptable accuracy, without fine-tuning. Quantizing the rest of the diffusion pipeline would not improve inference performance significantly but could substantially degrade accuracy.

Therefore, the proposal is to apply hybrid quantization to the UNet and weight-only quantization to the other pipeline components. Hybrid mode quantizes the weights of MatMul and Embedding layers together with the activations of the other layers, which preserves accuracy after optimization while reducing the model size.

To optimize the Stable Diffusion pipeline, use the `quantization_config` to define the optimization parameters. To enable hybrid quantization, specify a quantization dataset in the `quantization_config`; otherwise, weight-only quantization at the specified precision is applied to the UNet.
```python
from optimum.intel import OVStableDiffusionPipeline, OVWeightQuantizationConfig

model = OVStableDiffusionPipeline.from_pretrained(
    model_id,
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=8, dataset="conceptual_captions"),
)
```
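
Once exported and quantized, the pipeline is used like any other diffusion pipeline. A minimal usage sketch (not part of this commit), assuming `model_id` points to a Stable Diffusion checkpoint and that the output filename and save directory are chosen freely:

```python
# Usage sketch: generate an image with the hybrid-quantized pipeline.
prompt = "sailing ship in storm by Rembrandt"  # example prompt
image = model(prompt, num_inference_steps=50).images[0]
image.save("ship.png")  # hypothetical output path

# save_pretrained() writes the quantized OpenVINO model to disk, so the
# quantization step does not have to be repeated on the next load.
model.save_pretrained("sd-quantized-hybrid")  # hypothetical directory
```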
<Tip warning={true}>

`load_in_8bit` is enabled by default for models larger than 1 billion parameters.
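
A minimal sketch of opting out of that default, assuming a hypothetical causal LM above the 1-billion-parameter threshold:

```python
from optimum.intel import OVModelForCausalLM

# Hypothetical model ID; any causal LM over ~1B parameters behaves the same.
model_id = "gpt2-xl"

# load_in_8bit defaults to True for models above the 1B-parameter threshold;
# pass load_in_8bit=False explicitly to keep the original precision.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_8bit=False)
```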
