
Commit f4ec215

Updated docs with load_in_4bit

1 parent 0ece48b

File tree

1 file changed (+11 −10 lines)


docs/source/optimization_ov.mdx

@@ -74,21 +74,22 @@ model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
 
 > **NOTE:** `load_in_8bit` is enabled by default for models larger than 1 billion parameters.
 
-For the 4-bit weight quantization we recommend using the NNCF API like below:
+For the 4-bit weight quantization you can use the `load_in_4bit` option. The `quantization_config` can be used to control the optimization parameters, for example:
+
 ```python
-from optimum.intel import OVModelForCausalLM
+from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
 import nncf
 
-model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=False)
-model.model = nncf.compress_weights(
-    model.model,
-    mode=nncf.CompressWeightsMode.INT4_SYM,
-    ratio=0.8,
-    group_size=128,
-)
-model.save_pretrained("compressed_model")
+model = OVModelForCausalLM.from_pretrained(
+    model_id,
+    export=True,
+    load_in_4bit=True,
+    quantization_config=OVWeightQuantizationConfig(mode=nncf.CompressWeightsMode.INT4_ASYM, ratio=0.8, dataset="ptb"),
+)
 ```
 
+> **NOTE:** if `load_in_4bit` is used without `quantization_config`, a predefined `model_id`-specific configuration is used if one exists; otherwise a default 4-bit configuration is used.
+
 For more details, please refer to the corresponding NNCF [documentation](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/CompressWeights.md).
 
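Put together, a minimal sketch of the flow this change documents, relying only on the API shown in the diff; the model id `"gpt2"` and the output directory are placeholders for illustration, and no `quantization_config` is passed, so the predefined or default 4-bit configuration described in the NOTE would apply:

```python
from optimum.intel import OVModelForCausalLM

model_id = "gpt2"  # placeholder; any causal LM from the Hub works

# Export the model to OpenVINO and quantize its weights to 4 bit.
# Without a quantization_config, a model-specific predefined
# configuration (if one exists) or the default 4-bit configuration is used.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_4bit=True)

model.save_pretrained("ov_model_int4")  # placeholder output directory
```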

0 commit comments