load_in_4bit option for OVModelForCausalLM #538
Conversation
AlexKoff88 commented on Jan 30, 2024 (edited)
- API extension: a `load_in_4bit` option for `OVModelForCausalLM` (see the usage sketch after this list)
- A set of default configs for 4-bit weights-only quantization of popular models
- Aligned tests with the latest version of NNCF (plus 8-bit Conv quantization)
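For context, a minimal usage sketch of the extended API, assuming the `load_in_4bit` keyword from the PR title and the default-config lookup described above; the model id is a placeholder:

```python
from optimum.intel import OVModelForCausalLM

# Load a causal LM and compress its weights to 4 bits in one call.
# If the model has an entry in the new set of default 4-bit configs,
# that config is applied; otherwise the NNCF defaults are used.
model = OVModelForCausalLM.from_pretrained(
    "model-id",         # placeholder: any supported causal LM checkpoint
    export=True,        # convert the PyTorch checkpoint to OpenVINO IR
    load_in_4bit=True,  # weights-only INT4 quantization via NNCF
)
```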
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
looks really great @AlexKoff88!
```python
quantization_config: QuantizationConfigMixin = None,
ov_config: OVConfig = None,
```
Not sure we need both an `ov_config` and a `quantization_config`; would prefer to keep only one if that's feasible.
@echarlaix, how about deprecating `ov_config` here and migrating to different types of `QuantizationConfigMixin`, so that there is only a `quantization_config` parameter?
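A hypothetical sketch of that single-parameter design, dispatching on the concrete `QuantizationConfigMixin` subclass; `WeightQuantizationConfig`, `load_model`, and the bodies of both branches are illustrative stand-ins, not the actual implementation:

```python
from typing import Optional

from transformers.utils.quantization_config import QuantizationConfigMixin


class WeightQuantizationConfig(QuantizationConfigMixin):
    """Stub for a weights-only config subtype; the name is hypothetical."""

    def __init__(self, bits: int = 4):
        self.bits = bits


def load_model(model_id: str, quantization_config: Optional[QuantizationConfigMixin] = None):
    # Dispatch on the concrete config type instead of a separate ov_config argument.
    if quantization_config is None:
        print(f"loading {model_id} without compression")
    elif isinstance(quantization_config, WeightQuantizationConfig):
        print(f"loading {model_id} with {quantization_config.bits}-bit weight compression")
    else:
        raise ValueError(f"unsupported config type: {type(quantization_config).__name__}")


load_model("model-id", WeightQuantizationConfig(bits=4))
```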
I was actually thinking about the opposite, as `ov_config` has a quantization section; need to think a bit about this.
@echarlaix, I've changed the code according to your suggestion.
CI is ready. The two failed quantization tests will be fixed by #535.
```diff
@@ -83,6 +84,7 @@ def __init__(
     compression: Union[List[Dict], Dict, None] = None,
     input_info: Optional[List] = None,
     save_onnx_model: bool = False,
+    weight_quantization_config: Optional[QuantizationConfigMixin] = None,
```
@echarlaix, FYI: if I name it `quantization_config`, I get an error during serialization due to logic in the transformers library that checks for the availability of a `quantization_config` field and fails if it is None.
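A minimal, paraphrased illustration of that failure mode (not the actual transformers source): the check tests only for the presence of the attribute and then calls a method on its value, so a stored `None` breaks serialization.

```python
class TinyConfig:
    """Toy stand-in for a transformers-style config; not the real class."""

    def __init__(self, quantization_config=None):
        # The attribute exists even when the value is None ...
        self.quantization_config = quantization_config

    def to_dict(self):
        output = dict(self.__dict__)
        # ... but the serialization logic only checks for its presence before
        # calling a method on the value (paraphrasing the transformers check):
        if hasattr(self, "quantization_config"):
            output["quantization_config"] = self.quantization_config.to_dict()
        return output


TinyConfig().to_dict()  # AttributeError: 'NoneType' object has no attribute 'to_dict'
```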
Could having only one argument make sense, instead of having both `compression` and `weight_quantization_config`? (We could create an instance of `WeightQuantizationConfig` directly in the init if needed.) Also, could you add a test to verify that serialization is working?
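A sketch of how that suggestion might look, with `OVConfig.__init__` normalizing a plain dict into a `WeightQuantizationConfig`; all names here and the round-trip check are illustrative assumptions:

```python
from typing import Optional, Union


class WeightQuantizationConfig:
    """Stub for the weights-only config discussed above."""

    def __init__(self, bits: int = 4):
        self.bits = bits

    @classmethod
    def from_dict(cls, config: dict) -> "WeightQuantizationConfig":
        return cls(**config)

    def to_dict(self) -> dict:
        return {"bits": self.bits}


class OVConfig:
    def __init__(self, quantization_config: Optional[Union[dict, WeightQuantizationConfig]] = None):
        # Accept either a dict or a config object and normalize in the init,
        # so callers only ever deal with one argument.
        if isinstance(quantization_config, dict):
            quantization_config = WeightQuantizationConfig.from_dict(quantization_config)
        self.quantization_config = quantization_config


# Round-trip check in the spirit of the requested serialization test:
config = OVConfig(quantization_config={"bits": 4})
restored = WeightQuantizationConfig.from_dict(config.quantization_config.to_dict())
assert restored.bits == 4
```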
Fixed
thanks for iterating, it looks great
```diff
@@ -45,7 +45,7 @@
         "transformers>=4.34.0",
     ],
     "openvino": ["openvino>=2023.2", "onnx", "onnxruntime", "transformers>=4.36.0", "optimum>=1.16.1"],
-    "nncf": ["nncf>=2.7.0"],
+    "nncf": ["nncf @ git+https://github.com/openvinotoolkit/nncf.git"],
```
When is the next nncf release planned?
The release should be in 2-3 weeks. I would like to migrate to the develop version for now, to have time for the integration.
OK, I would prefer to wait for the next nncf release, but 2-3 weeks sounds reasonable.
@echarlaix, the PR is ready and all the comments are resolved.