
load_in_4bit option for OVModelForCausalLM #538

Merged: 31 commits into main from ak/load_in_4bit_alt on Feb 8, 2024

Conversation

@AlexKoff88 (Collaborator) commented Jan 30, 2024:

  • API extension:
import nncf
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

MODEL_ID = "databricks/dolly-v2-3b" 
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

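# Export to OpenVINO IR and apply 4-bit weight-only compression:
# INT4_ASYM for 80% of the weights, calibrated on the "ptb" dataset.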
model = OVModelForCausalLM.from_pretrained(
    MODEL_ID,
    export=True,
    load_in_4bit=True,
    quantization_config={"dataset": "ptb", "mode": nncf.CompressWeightsMode.INT4_ASYM, "ratio": 0.8},
) 
  • A set of default configs for 4-bit weights-only quantization of popular models (see the sketch after this list)
  • Aligned tests with the latest version of NNCF (+Conv 8-bit quantization)
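
As a rough illustration of the default-config bullet above: the intent is that load_in_4bit=True alone is enough when a predefined config covers the model. This is a minimal sketch assuming dolly-v2-3b is among the covered models; it does not document the exact defaults added in this PR.

from optimum.intel.openvino import OVModelForCausalLM

# Minimal sketch: rely on the predefined 4-bit weight-quantization defaults
# instead of passing an explicit quantization_config.
model = OVModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    export=True,
    load_in_4bit=True,
)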

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@AlexKoff88 changed the title from "[WIP]: load_in_4bit option for OVModelForCausalLM" to "load_in_4bit option for OVModelForCausalLM" on Jan 31, 2024
@echarlaix (Collaborator) left a comment:

looks really great @AlexKoff88!

Comment on lines 124 to 125
quantization_config: QuantizationConfigMixin = None,
ov_config: OVConfig = None,
@echarlaix (Collaborator):

Not sure we need both an ov_config and a quantization_config; I would prefer to keep only one if that's feasible.

@AlexKoff88 (Collaborator, Author):

@echarlaix, how about deprecating ov_config here and migrating to different types of QuantizationConfigMixin, so that only the quantization_config parameter remains?

@echarlaix (Collaborator):

I was actually thinking about the opposite, since ov_config has a quantization section. I need to think a bit about this.

@AlexKoff88 (Collaborator, Author):

@echarlaix, I've changed the code according to your suggestion.

@AlexKoff88 (Collaborator, Author) commented:

CI is ready. The two failed quantization tests will be fixed by #535.

@@ -83,6 +84,7 @@ def __init__(
compression: Union[List[Dict], Dict, None] = None,
input_info: Optional[List] = None,
save_onnx_model: bool = False,
weight_quantization_config: Optional[QuantizationConfigMixin] = None,
@AlexKoff88 (Collaborator, Author):

@echarlaix, FYI: if I name it quantization_config, I get an error during serialization because of logic in the transformers library that checks for the quantization_config field and fails if it is None.
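
For context, a self-contained paraphrase of the failing pattern described above; this is an illustration of the behaviour, not the exact transformers code:

class DummyConfig:
    # Minimal stand-in for a model config that declares a quantization_config field.
    quantization_config = None

def to_dict(config):
    output = {}
    # Paraphrased check: when the attribute exists, it is assumed to be a dict
    # or an object exposing .to_dict(), so a None value breaks serialization.
    if hasattr(config, "quantization_config"):
        qc = config.quantization_config
        output["quantization_config"] = qc if isinstance(qc, dict) else qc.to_dict()
    return output

to_dict(DummyConfig())  # AttributeError: 'NoneType' object has no attribute 'to_dict'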

@echarlaix (Collaborator):

Could it make sense to have only one argument instead of both compression and weight_quantization_config? We could create an instance of WeightQuantizationConfig directly in the init if needed. Also, could you add a test to verify that serialization works?
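
For reference, a minimal sketch of what such a serialization round-trip test could look like. It assumes OVConfig accepts a plain dict for weight_quantization_config and exposes the save_pretrained/from_pretrained helpers inherited from optimum's BaseConfig; names and settings are illustrative, not the test actually added in this PR.

import tempfile

from optimum.intel.openvino import OVConfig

def test_ov_config_serialization_roundtrip():
    # Illustrative compression settings mirroring the PR description.
    ov_config = OVConfig(
        weight_quantization_config={"mode": "int4_asym", "ratio": 0.8, "dataset": "ptb"},
    )
    with tempfile.TemporaryDirectory() as tmp_dir:
        ov_config.save_pretrained(tmp_dir)          # should not raise with the weight-quantization config set
        loaded = OVConfig.from_pretrained(tmp_dir)  # assumed helper from optimum's BaseConfig
    assert loaded.weight_quantization_config == ov_config.weight_quantization_config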

@AlexKoff88 (Collaborator, Author):

Fixed

@echarlaix (Collaborator) left a comment:

thanks for iterating, it looks great

@@ -45,7 +45,7 @@
"transformers>=4.34.0",
],
"openvino": ["openvino>=2023.2", "onnx", "onnxruntime", "transformers>=4.36.0", "optimum>=1.16.1"],
"nncf": ["nncf>=2.7.0"],
"nncf": ["nncf @ git+https://github.com/openvinotoolkit/nncf.git"],
@echarlaix (Collaborator):

When is the next nncf release planned?

@AlexKoff88 (Collaborator, Author):

The release should be in 2-3 weeks. I would like to migrate to the develop version for now, to have time for the integration.

@echarlaix (Collaborator):

OK, I would prefer to wait for the next nncf release, but 2-3 weeks sounds reasonable.

@AlexKoff88 (Collaborator, Author) commented:

@echarlaix, PR is ready and all the comments are resolved.

@echarlaix merged commit a7b766e into main on Feb 8, 2024 (10 of 12 checks passed).
@echarlaix deleted the ak/load_in_4bit_alt branch on February 8, 2024 at 14:14.