
Commit 5e319aa

Authored Mar 4, 2024
Fix documentation (#583)
* Fix documentation
* fix
1 parent 77365f4 commit 5e319aa

File tree

2 files changed (+10 −3)


docs/source/inference.mdx (+5 −2)
````diff
@@ -110,7 +110,7 @@ By default the quantization scheme will be [assymmetric](https://github.com/open
 
 For INT4 quantization you can also specify the following arguments :
 * The `--group-size` parameter will define the group size to use for quantization, `-1` it will results in per-column quantization.
-* The `--ratio` CLI parameter controls the ratio between 4-bit and 8-bit quantization. If set to 0.9, it means that 90% of the layers will be quantized to `int4` while 10% will be quantized to `int8`.
+* The `--ratio` parameter controls the ratio between 4-bit and 8-bit quantization. If set to 0.9, it means that 90% of the layers will be quantized to `int4` while 10% will be quantized to `int8`.
 
 Smaller `group_size` and `ratio` of usually improve accuracy at the sacrifice of the model size and inference latency.
 
@@ -122,8 +122,11 @@ from optimum.intel import OVModelForCausalLM
 model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
 ```
 
-> **NOTE:** `load_in_8bit` is enabled by default for the models larger than 1 billion parameters.
+<Tip warning={true}>
 
+`load_in_8bit` is enabled by default for the models larger than 1 billion parameters.
+
+</Tip>
 
 To apply quantization on both weights and activations, you can use the `OVQuantizer`, more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#optimization).
 
````
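For context on the `--group-size` and `--ratio` options the first hunk documents, here is a minimal Python-side sketch. It assumes `optimum.intel` exposes an `OVWeightQuantizationConfig` with `group_size` and `ratio` arguments mirroring those CLI flags (not shown in this diff; check your installed version), and `model_id` is a placeholder:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Assumed API: OVWeightQuantizationConfig mirrors the CLI options above.
quantization_config = OVWeightQuantizationConfig(
    bits=4,
    group_size=128,  # smaller groups usually improve accuracy at some size/latency cost
    ratio=0.9,       # 90% of layers quantized to int4, the remaining 10% to int8
)

model = OVModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)
```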
docs/source/optimization_ov.mdx (+5 −1)
````diff
@@ -69,7 +69,11 @@ from optimum.intel import OVModelForCausalLM
 model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
 ```
 
-> **NOTE:** `load_in_8bit` is enabled by default for models larger than 1 billion parameters.
+<Tip warning={true}>
+
+`load_in_8bit` is enabled by default for the models larger than 1 billion parameters.
+
+</Tip>
 
 For the 4-bit weight quantization you can use the `quantization_config` to specify the optimization parameters, for example:
 
````
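Since the `<Tip>` added in both files notes that `load_in_8bit` defaults to `True` for models above 1 billion parameters, keeping the original precision requires an explicit opt-out. A minimal sketch, assuming the `load_in_8bit` keyword shown in the snippets above and using `model_id` as a placeholder:

```python
from optimum.intel import OVModelForCausalLM

# load_in_8bit defaults to True for models larger than 1 billion parameters,
# so pass False explicitly to keep the weights in their original precision.
model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=False)
```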