
Commit be866d4

update paragraph

1 parent e54dcd2 · commit be866d4

2 files changed: +2 -2 lines changed


docs/source/inference.mdx (+1 -1)

````diff
@@ -99,7 +99,7 @@ tokenizer.save_pretrained(save_directory)
 
 ### Weight-only quantization
 
-You can also apply fp16, 8-bit or 4-bit weight compression on the linear and embedding layers when exporting your model with the CLI by setting `--weight-format` to respectively `fp16`, `int8` or `int4`:
+You can also apply fp16, 8-bit or 4-bit weight compression on the Linear, Convolutional and Embedding layers when exporting your model with the CLI by setting `--weight-format` to respectively `fp16`, `int8` or `int4`:
 
 ```bash
 optimum-cli export openvino --model gpt2 --weight-format int8 ov_model
````
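
For reference, the other two `--weight-format` values named in the changed paragraph work with the same command. A minimal sketch; the `ov_model_int4` and `ov_model_fp16` output directory names are illustrative, not from the commit:

```bash
# 4-bit weight compression of the Linear, Convolutional and Embedding layers
optimum-cli export openvino --model gpt2 --weight-format int4 ov_model_int4

# fp16 weight compression
optimum-cli export openvino --model gpt2 --weight-format fp16 ov_model_fp16
```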

docs/source/optimization_ov.mdx (+1 -1)

````diff
@@ -25,7 +25,7 @@ Quantization is a technique to reduce the computational and memory costs of runn
 
 ### Weight-only quantization
 
-Quantization can be applied on the model's linear and embedding layers, enabling the loading of large models on memory-limited devices. For example, when applying 8-bit quantization, the resulting model will be x4 smaller than its fp32 counterpart. For 4-bit quantization, the reduction in memory could theoretically reach x8, but is closer to x6 in practice.
+Quantization can be applied on the model's Linear, Convolutional and Embedding layers, enabling the loading of large models on memory-limited devices. For example, when applying 8-bit quantization, the resulting model will be x4 smaller than its fp32 counterpart. For 4-bit quantization, the reduction in memory could theoretically reach x8, but is closer to x6 in practice.
 
 
 #### 8-bit
````
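
The compression ratios in the changed paragraph follow from per-weight storage: fp32 uses 32 bits per weight, so 8-bit weights give 32/8 = x4 and 4-bit weights give 32/4 = x8 in theory; the quantization scales and zero-points, plus any layers typically kept in higher precision, bring the practical 4-bit figure closer to x6. One way to see the reduction on disk is to export the same model twice and compare sizes; a minimal sketch reusing the CLI command from the other changed file (the output directory names are illustrative):

```bash
# Export the same model with 8-bit and 4-bit weight compression,
# then compare the on-disk size of the two exported models.
optimum-cli export openvino --model gpt2 --weight-format int8 ov_model_int8
optimum-cli export openvino --model gpt2 --weight-format int4 ov_model_int4
du -sh ov_model_int8 ov_model_int4
```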
