1 parent c0bc7ba commit 9cefecf
docs/source/inference.mdx
@@ -114,7 +114,7 @@ For INT4 quantization you can also specify the following arguments :
 
 Smaller `group_size` and `ratio` values usually improve accuracy at the sacrifice of the model size and inference latency.
 
-You can also apply apply 8-bit quantization on your model's weights when loading your model by setting the `load_in_8bit=True` argument when calling the `from_pretrained()` method.
+You can also apply 8-bit quantization on your model's weights when loading your model by setting the `load_in_8bit=True` argument when calling the `from_pretrained()` method.
 
 ```python
 from optimum.intel import OVModelForCausalLM
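
The hunk ends at the opening import, so the diff context cuts the example off. For reference, a minimal sketch of the `load_in_8bit` usage the changed sentence describes, assuming the `export=True` conversion path and using `gpt2` as an illustrative checkpoint (neither appears in the hunk):

```python
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "gpt2"  # illustrative checkpoint, not taken from the commit

# export=True converts the Transformers checkpoint to OpenVINO IR on load;
# load_in_8bit=True quantizes the model weights to INT8 during that conversion.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```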
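The hunk's leading context also refers to the INT4 arguments (`group_size`, `ratio`). In current optimum-intel releases these are passed through `OVWeightQuantizationConfig`; a sketch under that assumption, with illustrative values:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Illustrative values: smaller group_size and ratio values usually improve
# accuracy at the sacrifice of model size and inference latency.
quantization_config = OVWeightQuantizationConfig(bits=4, group_size=64, ratio=0.8)

model = OVModelForCausalLM.from_pretrained(
    "gpt2",  # illustrative checkpoint
    export=True,
    quantization_config=quantization_config,
)
```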