
Commit 5e319aa

Authored Mar 4, 2024
Fix documentation (#583)
* Fix documentation
* fix
1 parent 77365f4 commit 5e319aa

File tree

2 files changed (+10 −3)


docs/source/inference.mdx (+5 −2)
````diff
@@ -110,7 +110,7 @@ By default the quantization scheme will be [assymmetric](https://github.com/open
 
 For INT4 quantization you can also specify the following arguments :
 * The `--group-size` parameter will define the group size to use for quantization, `-1` it will results in per-column quantization.
-* The `--ratio` CLI parameter controls the ratio between 4-bit and 8-bit quantization. If set to 0.9, it means that 90% of the layers will be quantized to `int4` while 10% will be quantized to `int8`.
+* The `--ratio` parameter controls the ratio between 4-bit and 8-bit quantization. If set to 0.9, it means that 90% of the layers will be quantized to `int4` while 10% will be quantized to `int8`.
 
 Smaller `group_size` and `ratio` of usually improve accuracy at the sacrifice of the model size and inference latency.
 
@@ -122,8 +122,11 @@ from optimum.intel import OVModelForCausalLM
 model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
 ```
 
-> **NOTE:** `load_in_8bit` is enabled by default for the models larger than 1 billion parameters.
+<Tip warning={true}>
 
+`load_in_8bit` is enabled by default for the models larger than 1 billion parameters.
+
+</Tip>
 
 To apply quantization on both weights and activations, you can use the `OVQuantizer`, more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#optimization).
 
````
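For context on the `--group-size` and `--ratio` options the first hunk documents, here is a minimal Python-side sketch. It assumes `optimum.intel` exposes an `OVWeightQuantizationConfig` with `group_size` and `ratio` arguments mirroring those CLI flags (not shown in this diff; check your installed version), and `model_id` is a placeholder:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Assumed API: OVWeightQuantizationConfig mirrors the CLI options above.
quantization_config = OVWeightQuantizationConfig(
    bits=4,
    group_size=128,  # smaller groups usually improve accuracy at some size/latency cost
    ratio=0.9,       # 90% of layers quantized to int4, the remaining 10% to int8
)

model = OVModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)
```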
docs/source/optimization_ov.mdx (+5 −1)
````diff
@@ -69,7 +69,11 @@ from optimum.intel import OVModelForCausalLM
 model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
 ```
 
-> **NOTE:** `load_in_8bit` is enabled by default for models larger than 1 billion parameters.
+<Tip warning={true}>
+
+`load_in_8bit` is enabled by default for the models larger than 1 billion parameters.
+
+</Tip>
 
 For the 4-bit weight quantization you can use the `quantization_config` to specify the optimization parameters, for example:
 
````
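Since the `<Tip>` added in both files notes that `load_in_8bit` defaults to `True` for models above 1 billion parameters, keeping the original precision requires an explicit opt-out. A minimal sketch, assuming the `load_in_8bit` keyword shown in the snippets above and using `model_id` as a placeholder:

```python
from optimum.intel import OVModelForCausalLM

# load_in_8bit defaults to True for models larger than 1 billion parameters,
# so pass False explicitly to keep the weights in their original precision.
model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=False)
```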