
Commit c7e228f

Updated docs with load_in_4bit (#558)
* Updated docs with load_in_4bit
* Update documentation
* Update documentation
* typo

Co-authored-by: Ella Charlaix <ella@huggingface.co>
1 parent 18ba0bd commit c7e228f


1 file changed: +9 -12 lines


docs/source/optimization_ov.mdx

+9 -12
````diff
@@ -74,19 +74,16 @@ model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
 
 > **NOTE:** `load_in_8bit` is enabled by default for models larger than 1 billion parameters.
 
-For the 4-bit weight quantization we recommend using the NNCF API like below:
+For the 4-bit weight quantization you can use the `quantization_config` to specify the optimization parameters, for example:
+
 ```python
-from optimum.intel import OVModelForCausalLM
-import nncf
-
-model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=False)
-model.model = nncf.compress_weights(
-    model.model,
-    mode=nncf.CompressWeightsMode.INT4_SYM,
-    ratio=0.8,
-    group_size=128,
-)
-model.save_pretrained("compressed_model")
+from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
+
+model = OVModelForCausalLM.from_pretrained(
+    model_id,
+    export=True,
+    quantization_config=OVWeightQuantizationConfig(bits=4, sym=False, ratio=0.8, dataset="ptb"),
+)
 ```
 
 For more details, please refer to the corresponding NNCF [documentation](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/CompressWeights.md).
````
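
Put end to end, the updated snippet can be used roughly as in the sketch below. The model id is a placeholder, and the trailing `save_pretrained` call mirrors the removed NNCF example rather than the new doc text; the quantization parameters are taken verbatim from the diff above.

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Placeholder model id for illustration; substitute any causal LM checkpoint.
model_id = "gpt2"

# 4-bit asymmetric weight quantization, compressing 80% of the weights and
# using the "ptb" dataset for data-aware compression (values from the diff above).
quantization_config = OVWeightQuantizationConfig(bits=4, sym=False, ratio=0.8, dataset="ptb")

# export=True converts the checkpoint to the OpenVINO format on the fly;
# the weight compression is applied as part of loading.
model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    quantization_config=quantization_config,
)

# Persist the quantized model, following the pattern of the removed NNCF snippet.
model.save_pretrained("quantized_model")
```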
