Commit 3544c4b

Add doc

1 parent 24de966 commit 3544c4b

1 file changed: docs/source/optimization_ov.mdx (+17, -0)

@@ -69,6 +69,23 @@ from optimum.intel import OVModelForCausalLM
model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
```

## Hybrid quantization

Traditional optimization methods like post-training 8-bit quantization do not work well for Stable Diffusion models, as accuracy drops significantly. On the other hand, weight compression does not improve performance when applied to Stable Diffusion models, because the size of the activations is comparable to that of the weights.

The UNet takes up most of the pipeline's overall execution time. Optimizing just this one model therefore brings substantial inference-speed benefits while keeping acceptable accuracy, without fine-tuning. Quantizing the rest of the diffusion pipeline would not improve inference performance significantly but could substantially degrade accuracy.

Therefore, the proposal is to apply hybrid quantization to the UNet and weight-only quantization to the other pipeline components. Hybrid mode quantizes the weights of MatMul and Embedding layers together with the activations of the other layers, which preserves accuracy after optimization while reducing the model size.

To optimize the Stable Diffusion pipeline, use the `quantization_config` to define the optimization parameters. To enable hybrid quantization, specify a quantization dataset in the `quantization_config`; otherwise, weight-only quantization at the specified precision is applied to the UNet.
```python
from optimum.intel import OVStableDiffusionPipeline, OVWeightQuantizationConfig

model = OVStableDiffusionPipeline.from_pretrained(
    model_id,
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=8, dataset="conceptual_captions"),
)
```
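
Once exported and quantized, the pipeline is used like any other diffusion pipeline. A minimal usage sketch (not part of this commit), assuming `model_id` points to a Stable Diffusion checkpoint and that the output filename and save directory are chosen freely:

```python
# Usage sketch: generate an image with the hybrid-quantized pipeline.
prompt = "sailing ship in storm by Rembrandt"  # example prompt
image = model(prompt, num_inference_steps=50).images[0]
image.save("ship.png")  # hypothetical output path

# save_pretrained() writes the quantized OpenVINO model to disk, so the
# quantization step does not have to be repeated on the next load.
model.save_pretrained("sd-quantized-hybrid")  # hypothetical directory
```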
<Tip warning={true}>

`load_in_8bit` is enabled by default for models larger than 1 billion parameters.
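
A minimal sketch of opting out of that default, assuming a hypothetical causal LM above the 1-billion-parameter threshold:

```python
from optimum.intel import OVModelForCausalLM

# Hypothetical model ID; any causal LM over ~1B parameters behaves the same.
model_id = "gpt2-xl"

# load_in_8bit defaults to True for models above the 1B-parameter threshold;
# pass load_in_8bit=False explicitly to keep the original precision.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_8bit=False)
```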
