Introduce OVQuantizationConfig for nncf.quantize() parameters #638
Conversation
```python
def _enable_standard_onnx_export_option(self):
    # This method depends on self.save_onnx_model.
    # save_onnx_model is defaulted to False so that the final model output is
    # in OpenVINO IR to realize performance benefit in OpenVINO runtime.
    # A True value of save_onnx_model will save a model in ONNX format.
    if (
        isinstance(self.compression, dict)
        and "algorithm" in self.compression
        and self.compression["algorithm"] == "quantization"
    ):
        self.compression["export_to_onnx_standard_ops"] = self.save_onnx_model
    elif isinstance(self.compression, list):
        for i, algo_config in enumerate(self.compression):
            if algo_config["algorithm"] == "quantization":
                self.compression[i]["export_to_onnx_standard_ops"] = self.save_onnx_model
```
Moved this logic directly to trainer.py
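For context, an illustrative sketch of the two `compression` shapes this method handles; the field values below are examples, not taken from the PR:

```python
# `compression` may be a single NNCF algorithm config (a dict) or a list of them;
# "quantization" is the algorithm name the method above matches on.
compression_dict = {"algorithm": "quantization", "export_to_onnx_standard_ops": False}
compression_list = [
    {"algorithm": "quantization"},
    {"algorithm": "magnitude_sparsity"},
]
```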
```diff
@@ -179,22 +217,24 @@ class OVWeightQuantizationConfig(QuantizationConfigMixin):
         using the [`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
     dataset (`str or List[str]`, *optional*):
         The dataset used for data-aware compression or quantization with NNCF. You can provide your own dataset
-        in a list of strings or just use the one from the list ['wikitext2','c4','c4-new','ptb','ptb-new'] for LLMs
+        in a list of strings or just use the one from the list ['wikitext','c4','c4-new','ptb','ptb-new'] for LLMs
```
There is no dataset with id "wikitext2"
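As a usage illustration, data-aware weight compression with one of the predefined dataset ids might look like the sketch below; the model id is a placeholder and the exact defaults are assumptions, not part of this diff:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# "wikitext" is one of the predefined ids from the docstring above;
# "wikitext2" is not a valid id.
quantization_config = OVWeightQuantizationConfig(bits=4, dataset="wikitext")

# Model id is illustrative; export=True converts the model to OpenVINO IR.
model = OVModelForCausalLM.from_pretrained(
    "gpt2", export=True, quantization_config=quantization_config
)
```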
```python
advanced_parameters=nncf.AdvancedQuantizationParameters(
    smooth_quant_alphas=AdvancedSmoothQuantParameters(matmul=-1)
),
```
There was a small bug here
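For context, a hedged sketch of where such a snippet typically sits in a full `nncf.quantize()` call; `model` and `calibration_dataset` are placeholders, and the import path for `AdvancedSmoothQuantParameters` is an assumption:

```python
import nncf
from nncf.quantization.advanced_parameters import AdvancedSmoothQuantParameters

# `model` is an NNCF-supported model (e.g. an openvino.Model) and
# `calibration_dataset` an nncf.Dataset, both prepared elsewhere.
quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    advanced_parameters=nncf.AdvancedQuantizationParameters(
        # an alpha of -1 disables SmoothQuant for MatMul nodes
        smooth_quant_alphas=AdvancedSmoothQuantParameters(matmul=-1)
    ),
)
```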
tests/openvino/test_quantization.py (Outdated)
```python
# Verify that the configuration is correctly saved and loaded
loaded_config = OVConfig.from_pretrained(tmp_dir)
self.assertEqual(ov_config.quantization_config.to_dict(), loaded_config.quantization_config)
```
Brought back after #630
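For context, a minimal sketch of the round trip this test exercises; the exact constructor arguments are assumptions:

```python
import tempfile

from optimum.intel import OVConfig, OVWeightQuantizationConfig

with tempfile.TemporaryDirectory() as tmp_dir:
    ov_config = OVConfig(quantization_config=OVWeightQuantizationConfig(bits=8))
    ov_config.save_pretrained(tmp_dir)
    loaded_config = OVConfig.from_pretrained(tmp_dir)
    # As in the test above, the loaded quantization config is compared as a dict.
    assert ov_config.quantization_config.to_dict() == loaded_config.quantization_config
```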
@echarlaix @AlexKoff88 PR is ready for review
I mostly agree with the changes. @echarlaix, please take a look and merge if it is ok.
looks great, thanks @nikita-savelyevv
Force-pushed from 99d87f9 to f7fa3a1 (compare)
The failed test is not caused by my changes.
@echarlaix please review now that the requested changes have been addressed
Looks great, thanks @nikita-savelyevv
```python
if weight_only is True:
    logger.warning(
        "Trying to create an instance of `OVQuantizationConfig` with `weight_only` being True. "
        "Please check your configuration."
    )
```
Why is this needed?
Suggested change (remove these lines):

```diff
-if weight_only is True:
-    logger.warning(
-        "Trying to create an instance of `OVQuantizationConfig` with `weight_only` being True. "
-        "Please check your configuration."
-    )
```
It's just to make sure that nobody creates the config with the wrong `weight_only` property, e.g. `OVWeightQuantizationConfig(weight_only=False)`, by accident. This is not strictly required. I'll remove this part.
I remembered the reason this is needed. In the scenario where someone provides the quantization config as a dictionary instead of a config object, there is some logic that tries to infer the type of the config based on the provided keys. However, this is not 100% reliable, so there's an additional flag to distinguish between weight-only and full quantization.

For example, if the config is given as `dict(ignored_scope={"names": ["op_name"]})`, then it's ambiguous which kind of quantization is intended, because both kinds accept an ignored scope parameter. By default, in such a case an `OVWeightQuantizationConfig` will be created and a warning will be given.

To run full quantization with only an ignored scope given, it's required to provide the config as `dict(weight_only=False, ignored_scope={"names": ["op_name"]})`. In that case an instance of `OVQuantizationConfig` will be built and full quantization is run.

The warning you mention here happens when, for example, the config is given as `dict(bits=8, sym=True, weight_only=False)`, which is confusing since parameters for weight-only quantization are given, but `weight_only` hints that full quantization is intended. In such a case, weight-only quantization will be executed nevertheless, but the warning is given to avoid such confusion.
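A minimal sketch of the dispatch logic described above; the helper name `_infer_quantization_config` and the exact key handling are assumptions, not the PR's actual code (the two config classes are the ones introduced in this PR):

```python
import logging

from optimum.intel import OVQuantizationConfig, OVWeightQuantizationConfig

logger = logging.getLogger(__name__)


def _infer_quantization_config(config: dict):
    # Hypothetical helper: `weight_only` is only a dispatch hint popped from
    # the dict, not a real field of either config class.
    weight_only = config.pop("weight_only", None)
    if weight_only is False:
        return OVQuantizationConfig(**config)
    if weight_only is None:
        logger.warning(
            "Cannot tell whether weight-only or full quantization is intended; "
            "defaulting to weight-only compression (OVWeightQuantizationConfig)."
        )
    return OVWeightQuantizationConfig(**config)
```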
I see. I think it would make sense to always create an instance of `OVWeightQuantizationConfig` by default and remove this parameter.
```diff
@@ -572,7 +573,8 @@ def _from_pretrained(
     from_onnx: bool = False,
     local_files_only: bool = False,
     load_in_8bit: bool = False,
-    quantization_config: Union[OVWeightQuantizationConfig, Dict] = None,
+    quantization_config: Optional[Union[OVWeightQuantizationConfig, Dict]] = None,
+    calibration_dataset: Optional[nncf.Dataset] = None,
```
I would prefer to not include a `calibration_dataset` argument and only support a subset of calibration datasets via the `quantization_config` when loading a model with the `from_pretrained` method (and leave the possibility to give any `calibration_dataset` when applying quantization with the `OVQuantizer`). What do you think?
Hmm. There's currently a scenario where a custom dataset is provided to the `.from_pretrained()` method via the config: https://github.com/huggingface/optimum-intel/blob/main/tests/openvino/test_quantization.py#L453. Since we decided that the `.dataset` property of the config will now contain only string-typed values, it'll look kind of hacky to keep it this way.

How about I remove the explicit definition of the `calibration_dataset` argument from the `.from_pretrained` signature, but extract it from `**kwargs` there? This is still shady, but IMO it's better than passing it through the `.dataset` property of the config. What do you think?
I think for custom datasets we should provide support through the `OVQuantizer` and not when loading the model with `from_pretrained`; I will open a follow-up PR to add this.
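For illustration, a hedged sketch of the `OVQuantizer` path being suggested; the model id, the calibration samples, and the exact `quantize()` keyword set are assumptions:

```python
import nncf
from optimum.intel import (
    OVConfig,
    OVModelForSequenceClassification,
    OVQuantizationConfig,
    OVQuantizer,
)

# Model id is illustrative; export=True converts the model to OpenVINO IR.
model = OVModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)
quantizer = OVQuantizer.from_pretrained(model)

# `my_samples` is a placeholder: any iterable of model inputs prepared elsewhere.
calibration_dataset = nncf.Dataset(my_samples)

quantizer.quantize(
    ov_config=OVConfig(quantization_config=OVQuantizationConfig()),
    calibration_dataset=calibration_dataset,
    save_directory="quantized_model",
)
```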
Recently, there were some [changes](https://github.com/huggingface/optimum-intel/pull/638) to how NNCF quantization is called via optimum-intel. This PR updates the quantization use cases to how they are currently intended to be called.
What does this PR do?

- `OVWeightQuantizationConfig` is now intended only for weight compression, and the new `OVQuantizationConfig` is used for full quantization.
- Parameters for the `nncf.quantize()` call should from now on be given by providing an instance of `OVQuantizationConfig` to `OVQuantizer.quantize()` via `OVConfig`.
- Some quantization parameters may be of type `nncf.Dataset` or `transformers.PreTrainedTokenizer`, which can't be serialized, so they are omitted during serialization.
- `DEFAULT_QUANTIZATION_CONFIG` is moved to `trainer.py`, where its only usage currently is.

Before submitting