Introduce OVQuantizationConfig for nncf.quantize() parameters #638
Conversation
```python
def _enable_standard_onnx_export_option(self):
    # This method depends on self.save_onnx_model.
    # save_onnx_model is defaulted to False so that the final model output is
    # in OpenVINO IR to realize performance benefit in OpenVINO runtime.
    # A True value of save_onnx_model will save a model in ONNX format.
    if (
        isinstance(self.compression, dict)
        and "algorithm" in self.compression
        and self.compression["algorithm"] == "quantization"
    ):
        self.compression["export_to_onnx_standard_ops"] = self.save_onnx_model
    elif isinstance(self.compression, list):
        for i, algo_config in enumerate(self.compression):
            if algo_config["algorithm"] == "quantization":
                self.compression[i]["export_to_onnx_standard_ops"] = self.save_onnx_model
```
Moved this logic directly to trainer.py
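For context, an illustrative sketch of the two `compression` shapes this method handles; the field values below are examples, not taken from the PR:

```python
# `compression` may be a single NNCF algorithm config (a dict) or a list of them;
# "quantization" is the algorithm name the method above matches on.
compression_dict = {"algorithm": "quantization", "export_to_onnx_standard_ops": False}
compression_list = [
    {"algorithm": "quantization"},
    {"algorithm": "magnitude_sparsity"},
]
```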
```diff
@@ -179,22 +217,24 @@ class OVWeightQuantizationConfig(QuantizationConfigMixin):
         using the [`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
     dataset (`str or List[str]`, *optional*):
         The dataset used for data-aware compression or quantization with NNCF. You can provide your own dataset
-        in a list of strings or just use the one from the list ['wikitext2','c4','c4-new','ptb','ptb-new'] for LLMs
+        in a list of strings or just use the one from the list ['wikitext','c4','c4-new','ptb','ptb-new'] for LLMs
```
There is no dataset with id "wikitext2"
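As a usage illustration, data-aware weight compression with one of the predefined dataset ids might look like the sketch below; the model id is a placeholder and the exact defaults are assumptions, not part of this diff:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# "wikitext" is one of the predefined ids from the docstring above;
# "wikitext2" is not a valid id.
quantization_config = OVWeightQuantizationConfig(bits=4, dataset="wikitext")

# Model id is illustrative; export=True converts the model to OpenVINO IR.
model = OVModelForCausalLM.from_pretrained(
    "gpt2", export=True, quantization_config=quantization_config
)
```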
```python
advanced_parameters=nncf.AdvancedQuantizationParameters(
    smooth_quant_alphas=AdvancedSmoothQuantParameters(matmul=-1)
),
```
There was a small bug here
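For context, a hedged sketch of where such a snippet typically sits in a full `nncf.quantize()` call; `model` and `calibration_dataset` are placeholders, and the import path for `AdvancedSmoothQuantParameters` is an assumption:

```python
import nncf
from nncf.quantization.advanced_parameters import AdvancedSmoothQuantParameters

# `model` is an NNCF-supported model (e.g. an openvino.Model) and
# `calibration_dataset` an nncf.Dataset, both prepared elsewhere.
quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    advanced_parameters=nncf.AdvancedQuantizationParameters(
        # an alpha of -1 disables SmoothQuant for MatMul nodes
        smooth_quant_alphas=AdvancedSmoothQuantParameters(matmul=-1)
    ),
)
```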
tests/openvino/test_quantization.py (Outdated)
```python
# Verify that the configuration is correctly saved and loaded
loaded_config = OVConfig.from_pretrained(tmp_dir)
self.assertEqual(ov_config.quantization_config.to_dict(), loaded_config.quantization_config)
```
Brought back after #630
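For context, a minimal sketch of the round trip this test exercises; the exact constructor arguments are assumptions:

```python
import tempfile

from optimum.intel import OVConfig, OVWeightQuantizationConfig

with tempfile.TemporaryDirectory() as tmp_dir:
    ov_config = OVConfig(quantization_config=OVWeightQuantizationConfig(bits=8))
    ov_config.save_pretrained(tmp_dir)
    loaded_config = OVConfig.from_pretrained(tmp_dir)
    # As in the test above, the loaded quantization config is compared as a dict.
    assert ov_config.quantization_config.to_dict() == loaded_config.quantization_config
```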
@echarlaix @AlexKoff88 PR is ready for review
I mostly agree with the changes. @echarlaix, please take a look and merge if it is ok.
looks great, thanks @nikita-savelyevv
Force-pushed from 99d87f9 to f7fa3a1 (compare)
The failed test is not caused by my changes.
@echarlaix please review now that the requested changes have been addressed
Looks great, thanks @nikita-savelyevv
```python
if weight_only is True:
    logger.warning(
        "Trying to create an instance of `OVQuantizationConfig` with `weight_only` being True. "
        "Please check your configuration."
    )
```
Why is this needed?
Suggested change (remove these lines):

```diff
-if weight_only is True:
-    logger.warning(
-        "Trying to create an instance of `OVQuantizationConfig` with `weight_only` being True. "
-        "Please check your configuration."
-    )
```
It's just to make sure that nobody creates the config with the wrong `weight_only` property, e.g. `OVWeightQuantizationConfig(weight_only=False)`, by accident. This is not strictly required. I'll remove this part.
I remembered the reason this is needed. In the scenario where someone provides the quantization config as a dictionary instead of a config object, there is some logic that tries to infer the type of the config based on the provided keys. However, this is not 100% reliable, so there's an additional flag to distinguish between weight-only and full quantization.

For example, if the config is given as `dict(ignored_scope={"names": ["op_name"]})`, then it's ambiguous which kind of quantization is intended, because both kinds accept an ignored scope parameter. By default, in such a case an `OVWeightQuantizationConfig` will be created and a warning will be given.

To run full quantization with only an ignored scope given, it's required to provide the config as `dict(weight_only=False, ignored_scope={"names": ["op_name"]})`. In that case an instance of `OVQuantizationConfig` will be built and full quantization is run.

The warning you mention here happens when, for example, the config is given as `dict(bits=8, sym=True, weight_only=False)`, which is confusing since parameters for weight-only quantization are given, but `weight_only` hints that full quantization is intended. In such a case, weight-only quantization will be executed nevertheless, but the warning is given to avoid such confusion.
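A minimal sketch of the dispatch logic described above; the helper name `_infer_quantization_config` and the exact key handling are assumptions, not the PR's actual code (the two config classes are the ones introduced in this PR):

```python
import logging

from optimum.intel import OVQuantizationConfig, OVWeightQuantizationConfig

logger = logging.getLogger(__name__)


def _infer_quantization_config(config: dict):
    # Hypothetical helper: `weight_only` is only a dispatch hint popped from
    # the dict, not a real field of either config class.
    weight_only = config.pop("weight_only", None)
    if weight_only is False:
        return OVQuantizationConfig(**config)
    if weight_only is None:
        logger.warning(
            "Cannot tell whether weight-only or full quantization is intended; "
            "defaulting to weight-only compression (OVWeightQuantizationConfig)."
        )
    return OVWeightQuantizationConfig(**config)
```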
I see. I think it would make sense to always create an instance of `OVWeightQuantizationConfig` by default and remove this parameter.
```diff
@@ -572,7 +573,8 @@ def _from_pretrained(
     from_onnx: bool = False,
     local_files_only: bool = False,
     load_in_8bit: bool = False,
-    quantization_config: Union[OVWeightQuantizationConfig, Dict] = None,
+    quantization_config: Optional[Union[OVWeightQuantizationConfig, Dict]] = None,
+    calibration_dataset: Optional[nncf.Dataset] = None,
```
I would prefer to not include a `calibration_dataset` argument and only support a subset of calibration datasets via the `quantization_config` when loading a model with the `from_pretrained` method (and leave the possibility to give any `calibration_dataset` when applying quantization with the `OVQuantizer`). What do you think?
Hmm. There's currently a scenario where a custom dataset is provided to the `.from_pretrained()` method via the config: https://github.com/huggingface/optimum-intel/blob/main/tests/openvino/test_quantization.py#L453. Since we decided that the `.dataset` property of the config will now contain only string-typed values, it'll look kind of hacky to keep it this way.

How about I remove the explicit definition of the `calibration_dataset` argument from the `.from_pretrained` signature, but extract it from `**kwargs` there? This is still shady, but IMO it's better than passing it through the `.dataset` property of the config. What do you think?
I think for custom datasets we should provide support through the `OVQuantizer` and not when loading the model with `from_pretrained`; I will open a follow-up PR to add this.
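For illustration, a hedged sketch of the `OVQuantizer` path being suggested; the model id, the calibration samples, and the exact `quantize()` keyword set are assumptions:

```python
import nncf
from optimum.intel import (
    OVConfig,
    OVModelForSequenceClassification,
    OVQuantizationConfig,
    OVQuantizer,
)

# Model id is illustrative; export=True converts the model to OpenVINO IR.
model = OVModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)
quantizer = OVQuantizer.from_pretrained(model)

# `my_samples` is a placeholder: any iterable of model inputs prepared elsewhere.
calibration_dataset = nncf.Dataset(my_samples)

quantizer.quantize(
    ov_config=OVConfig(quantization_config=OVQuantizationConfig()),
    calibration_dataset=calibration_dataset,
    save_directory="quantized_model",
)
```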
Recently, there were some [changes](https://github.com/huggingface/optimum-intel/pull/638) to how NNCF quantization is called via optimum-intel. This PR updates the quantization use cases to how they are currently intended to be called.
What does this PR do?

- `OVWeightQuantizationConfig` is now intended only for weight compression, and the new `OVQuantizationConfig` is used for full quantization.
- Parameters for the `nncf.quantize()` call should from now on be given by providing an instance of `OVQuantizationConfig` to `OVQuantizer.quantize()` via `OVConfig`.
- Some quantization parameters may be of type `nncf.Dataset` or `transformers.PreTrainedTokenizer`, which can't be serialized, so they are omitted during serialization.
- `DEFAULT_QUANTIZATION_CONFIG` is moved to `trainer.py`, where its only usage currently is.

Before submitting