Commit 52875b9

Clarify 8bit WOQ model loading in documentation (#745)
* Clarify load_in_8bit default value in documentation
* add warning
1 parent a45a3c4 commit 52875b9

2 files changed: +8 -2 lines changed

docs/source/inference.mdx

+7-1
@@ -54,6 +54,12 @@ optimum-cli export openvino --model local_path --task text-generation-with-past

 To export your model in fp16, you can add `--weight-format fp16` when exporting your model.

+<Tip warning={true}>
+
+Models larger than 1 billion parameters are exported to the OpenVINO format with 8-bit weights by default. You can disable it with `--weight-format fp32`.
+
+</Tip>
+
 Once the model is exported, you can load the OpenVINO model using :

 ```python
@@ -130,7 +136,7 @@ model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)

 <Tip warning={true}>

-`load_in_8bit` is enabled by default for the models larger than 1 billion parameters. You can disable it with `load_in_8bit=False`.
+If not specified, `load_in_8bit` will be set to `True` by default when models larger than 1 billion parameters are exported to the OpenVINO format (with `export=True`). You can disable it with `load_in_8bit=False`.

 </Tip>


docs/source/optimization_ov.mdx

+1-1
@@ -44,7 +44,7 @@ model.save_pretrained(saving_directory)

 <Tip warning={true}>

-`load_in_8bit` is enabled by default for the models larger than 1 billion parameters. You can disable it with `load_in_8bit=False`.
+If not specified, `load_in_8bit` will be set to `True` by default when models larger than 1 billion parameters are exported to the OpenVINO format (with `export=True`). You can disable it with `load_in_8bit=False`.

 </Tip>

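
The defaulting rule described by the reworded tip can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the library's implementation: `resolve_load_in_8bit`, `ONE_BILLION`, and the `num_params` argument are hypothetical names introduced here, and the strict "greater than" comparison is an assumption based on the phrase "larger than 1 billion parameters".

```python
from typing import Optional

# Assumed threshold, taken from the wording "larger than 1 billion parameters".
ONE_BILLION = 1_000_000_000


def resolve_load_in_8bit(
    load_in_8bit: Optional[bool], export: bool, num_params: int
) -> bool:
    """Hypothetical helper: effective 8-bit setting under the rule in the tip."""
    # An explicit value from the caller always takes precedence.
    if load_in_8bit is not None:
        return load_in_8bit
    # If not specified, 8-bit weights apply only when the model is being
    # exported (export=True) and is larger than 1 billion parameters.
    return export and num_params > ONE_BILLION


if __name__ == "__main__":
    print(resolve_load_in_8bit(None, export=True, num_params=7_000_000_000))
    print(resolve_load_in_8bit(False, export=True, num_params=7_000_000_000))
```

Under this reading, leaving `load_in_8bit` unset while loading an already exported model (no `export=True`) does not turn on 8-bit weights, which is the distinction the commit adds to both tips.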
