
load_in_4bit option for OVModelForCausalLM #538

Merged: 31 commits into main from ak/load_in_4bit_alt on Feb 8, 2024

Conversation

@AlexKoff88 (Collaborator) commented Jan 30, 2024:

  • API extension:
import nncf
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

MODEL_ID = "databricks/dolly-v2-3b" 
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

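# Export to OpenVINO IR and apply 4-bit weight-only compression:
# INT4_ASYM for 80% of the weights, calibrated on the "ptb" dataset.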
model = OVModelForCausalLM.from_pretrained(
    MODEL_ID,
    export=True,
    load_in_4bit=True,
    quantization_config={"dataset": "ptb", "mode": nncf.CompressWeightsMode.INT4_ASYM, "ratio": 0.8},
) 
  • A set of default configs for 4-bit weights-only quantization of popular models (see the sketch after this list)
  • Aligned tests with the latest version of NNCF (+Conv 8-bit quantization)
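
As a rough illustration of the default-config bullet above: the intent is that load_in_4bit=True alone is enough when a predefined config covers the model. This is a minimal sketch assuming dolly-v2-3b is among the covered models; it does not document the exact defaults added in this PR.

from optimum.intel.openvino import OVModelForCausalLM

# Minimal sketch: rely on the predefined 4-bit weight-quantization defaults
# instead of passing an explicit quantization_config.
model = OVModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    export=True,
    load_in_4bit=True,
)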

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@AlexKoff88 changed the title from "[WIP]: load_in_4bit option for OVModelForCausalLM" to "load_in_4bit option for OVModelForCausalLM" on Jan 31, 2024
@echarlaix (Collaborator) left a comment:

looks really great @AlexKoff88!

Comment on lines 124 to 125
quantization_config: QuantizationConfigMixin = None,
ov_config: OVConfig = None,
@echarlaix (Collaborator):

Not sure we need both an ov_config and a quantization_config; I would prefer to keep only one if that's feasible.

@AlexKoff88 (Collaborator, Author):

@echarlaix, how about deprecating ov_config here and migrating to different types of QuantizationConfigMixin, so that only the quantization_config parameter remains?

@echarlaix (Collaborator):

I was actually thinking about the opposite, since ov_config has a quantization section. I need to think a bit about this.

@AlexKoff88 (Collaborator, Author):

@echarlaix, I've changed the code according to your suggestion.

@AlexKoff88 (Collaborator, Author) commented:

CI is ready. The two failed quantization tests will be fixed by #535.

@@ -83,6 +84,7 @@ def __init__(
compression: Union[List[Dict], Dict, None] = None,
input_info: Optional[List] = None,
save_onnx_model: bool = False,
weight_quantization_config: Optional[QuantizationConfigMixin] = None,
@AlexKoff88 (Collaborator, Author):

@echarlaix, FYI: if I name it quantization_config, I get an error during serialization because of logic in the transformers library that checks for the quantization_config field and fails if it is None.
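
For context, a self-contained paraphrase of the failing pattern described above; this is an illustration of the behaviour, not the exact transformers code:

class DummyConfig:
    # Minimal stand-in for a model config that declares a quantization_config field.
    quantization_config = None

def to_dict(config):
    output = {}
    # Paraphrased check: when the attribute exists, it is assumed to be a dict
    # or an object exposing .to_dict(), so a None value breaks serialization.
    if hasattr(config, "quantization_config"):
        qc = config.quantization_config
        output["quantization_config"] = qc if isinstance(qc, dict) else qc.to_dict()
    return output

to_dict(DummyConfig())  # AttributeError: 'NoneType' object has no attribute 'to_dict'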

@echarlaix (Collaborator):

Could it make sense to have only one argument instead of both compression and weight_quantization_config? We could create an instance of WeightQuantizationConfig directly in the init if needed. Also, could you add a test to verify that serialization works?
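
For reference, a minimal sketch of what such a serialization round-trip test could look like. It assumes OVConfig accepts a plain dict for weight_quantization_config and exposes the save_pretrained/from_pretrained helpers inherited from optimum's BaseConfig; names and settings are illustrative, not the test actually added in this PR.

import tempfile

from optimum.intel.openvino import OVConfig

def test_ov_config_serialization_roundtrip():
    # Illustrative compression settings mirroring the PR description.
    ov_config = OVConfig(
        weight_quantization_config={"mode": "int4_asym", "ratio": 0.8, "dataset": "ptb"},
    )
    with tempfile.TemporaryDirectory() as tmp_dir:
        ov_config.save_pretrained(tmp_dir)          # should not raise with the weight-quantization config set
        loaded = OVConfig.from_pretrained(tmp_dir)  # assumed helper from optimum's BaseConfig
    assert loaded.weight_quantization_config == ov_config.weight_quantization_config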

@AlexKoff88 (Collaborator, Author):

Fixed

@echarlaix (Collaborator) left a comment:

thanks for iterating, it looks great

@@ -45,7 +45,7 @@
"transformers>=4.34.0",
],
"openvino": ["openvino>=2023.2", "onnx", "onnxruntime", "transformers>=4.36.0", "optimum>=1.16.1"],
"nncf": ["nncf>=2.7.0"],
"nncf": ["nncf @ git+https://github.com/openvinotoolkit/nncf.git"],
@echarlaix (Collaborator):

When is the next nncf release planned?

@AlexKoff88 (Collaborator, Author):

The release should be in 2-3 weeks. I would like to migrate to the develop version for now, to have time for the integration.

@echarlaix (Collaborator):

OK, I would prefer to wait for the next nncf release, but 2-3 weeks sounds reasonable.

@AlexKoff88 (Collaborator, Author) commented:

@echarlaix, PR is ready and all the comments are resolved.

@echarlaix merged commit a7b766e into main on Feb 8, 2024 (10 of 12 checks passed).
@echarlaix deleted the ak/load_in_4bit_alt branch on February 8, 2024 at 14:14.