Commit 1693821

[DOCS] nncf changes pass 3 recommend (openvinotoolkit#26873)
1 parent 005152a commit 1693821

File tree

2 files changed: +32 -11 lines changed

docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst

+3 -3

@@ -1,5 +1,5 @@
-Weight Compression
-==================
+LLM Weight Compression
+=========================
 
 .. toctree::
    :maxdepth: 1
@@ -187,7 +187,7 @@ trade-offs after optimization:
        ratio=0.9,
    )
 
-* ``scale_estimation`` - boolean parameter that enables more accurate estimation of
+* ``scale_estimation`` - boolean parameter that enables more accurate estimation of
  quantization scales. Especially helpful when the weights of all layers are quantized to
  4 bits. Requires dataset.
 
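The ``scale_estimation`` option above is about choosing better quantization scales for 4-bit weights. A minimal, self-contained toy sketch of why a search for a lower-error scale beats the naive abs-max scale (this is NOT NNCF's implementation, which estimates scales from a calibration dataset; all names here are illustrative):

```python
# Toy illustration of 4-bit symmetric weight quantization and why a better
# scale lowers reconstruction error. NOT NNCF's scale_estimation algorithm,
# just a sketch of the underlying idea.

def quantize_dequantize(weights, scale, num_bits=4):
    """Round-to-nearest symmetric quantization, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1  # 7 for 4-bit symmetric
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

weights = [0.11, -0.42, 0.35, 0.07, -0.28, 0.50]

# Naive scale: map the largest |weight| onto the top quantization level.
naive_scale = max(abs(w) for w in weights) / 7

# Crude stand-in for "scale estimation": search nearby scales for the one
# minimizing reconstruction error (NNCF uses a dataset for this instead).
candidates = [naive_scale] + [naive_scale * (0.70 + 0.05 * i) for i in range(13)]
best_scale = min(candidates,
                 key=lambda s: mse(weights, quantize_dequantize(weights, s)))

error_naive = mse(weights, quantize_dequantize(weights, naive_scale))
error_best = mse(weights, quantize_dequantize(weights, best_scale))
assert error_best <= error_naive  # the searched scale is never worse
```

In real LLM weight compression the scale is chosen per channel or per group and, with ``scale_estimation=True``, refined against activations from the dataset, which is why the option requires one.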

docs/articles_en/openvino-workflow/model-optimization.rst

+29 -8

@@ -22,7 +22,7 @@ It is a `set of compression algorithms <https://github.com/openvinotoolkit/nncf/
 organized as a Python package, that make your models smaller and faster. Note that NNCF
 is **not part of the OpenVINO package**, so it needs to be installed separately. It supports
 models in **PyTorch**, **TensorFlow** , **ONNX**, and **OpenVINO IR** formats, offering
-the following optimizations:
+the following main optimizations:
 
 .. image:: ../assets/images/WHAT_TO_USE.svg
 
@@ -42,20 +42,41 @@ the following optimizations:
 as Quantization-aware Training. This kind of optimization requires the use of the model's
 original framework, for NNCF, it is either PyTorch or TensorFlow.
 
-A common approach is to perform post-training quantization first, as it is the easiest option.
-If the result proves unsatisfactory, quantization-aware training will give you higher accuracy
-with the same level of performance boost. For the most performant product, adding filter pruning
-will further streamline the model.
 
-To learn about the full scope of the framework, its installation, and technical details, visit
-both `the NNCF repository <https://github.com/openvinotoolkit/nncf?tab=readme-ov-file>`__ and
-`NNCF API documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/>`__.
+
+Recommended workflows
+##########################
+
+* A common approach for most cases is to:
+
+  1. Perform post-training quantization first, as it is the easiest option.
+  2. For even better results, combine post-training quantization with filter pruning.
+  3. If the accuracy drop is unacceptable, use quantization-aware training instead. It will give
+     you the same level of performance boost, with a smaller impact on accuracy.
+
+* **Weight compression** works **only with LLMs**. Do not try to use it with other models.
+* For **visual-multimodal** use cases, the encoder / decoder split approach may be recommended.
+
+
+
+
 
 
 
 .. image:: ../assets/images/DEVELOPMENT_FLOW_V3_crunch.svg
 
 
+
+Installation and usage
+###########################
+
+To learn about the full scope of the framework, its installation, and technical details, visit
+both `the NNCF repository <https://github.com/openvinotoolkit/nncf?tab=readme-ov-file>`__ and
+`NNCF API documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/>`__.
+
+
 .. tab-set::
 
    .. tab-item:: Installation