[OV] Move data-driven quantization after model export for text-generation models #721
Conversation
optimum/commands/export/openvino.py
Outdated
# TODO: set save_directory=self.args.output once OV is updated to 2024.3
quantizer.quantize(ov_config=OVConfig(quantization_config=quantization_config))
with tempfile.TemporaryDirectory() as temp_dir:
    import shutil

    model.save_pretrained(temp_dir)
    ov_config.save_pretrained(self.args.output)
    shutil.copy(f"{temp_dir}/openvino_model.xml", f"{self.args.output}/openvino_model.xml")
    shutil.copy(f"{temp_dir}/openvino_model.bin", f"{self.args.output}/openvino_model.bin")
Had to add this workaround because OpenVINO does not currently support saving into the same location where the model is loaded from (ticket 110054). This is expected to be fixed in OV 2024.3.
@eaidova, please take a look
Maybe we can introduce a model_name parameter for the from_pretrained/save_pretrained methods? That would allow having both models in the same dir (at the same time it may be useful for loading a model if the IR was saved by different tools or renamed). Or we can try disabling mmap via ov_config for now (it should help with saving to the same location).
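For illustration, a rough sketch of the mmap idea (assuming "ENABLE_MMAP" is the relevant OpenVINO property and that optimum-intel forwards entries from ov_config when the model is read; the path is a placeholder):

from optimum.intel import OVModelForCausalLM

# Load with memory mapping disabled so the weights are fully read into RAM,
# which would then allow the IR files on disk to be overwritten in place.
model = OVModelForCausalLM.from_pretrained(
    "./exported_model",                    # placeholder path to the exported IR
    ov_config={"ENABLE_MMAP": "NO"},       # assumed to reach the model-reading step
)
model.save_pretrained("./exported_model")  # save back to the same location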
@eaidova thank you for your suggestion! I've replaced saving to a temporary directory with disabling mmap.
@eaidova thank you for your suggestion! I've replaced saving to a temporary directory with disabling mmap.
For some reason, when doing it this way, I observe that a significant amount of additional memory is allocated. The amount roughly equals the model size, which is rather significant. I guess I'll revert these changes for now.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@AlexKoff88 please take a look
optimum/commands/export/openvino.py
Outdated
import shutil

model.save_pretrained(temp_dir)
ov_config.save_pretrained(self.args.output)
@nikita-savelyevv, does this workaround with tmp folder mean that we cannot save the model at the same path but can copy files there? Looks a bit strange.
Yes, I had trouble saving to the same location, but copying to that location works fine.
…or OVModelForCausalLM
…ndling for OVModelForCausalLM" This reverts commit bcc4665.
…model if compression fails
optimum/commands/export/openvino.py
Outdated
if quantize_after_export:
    try:
        from optimum.intel import OVModelForCausalLM, OVQuantizer

        ov_config.dtype = original_dtype_value
        model = OVModelForCausalLM.from_pretrained(
            self.args.output, trust_remote_code=self.args.trust_remote_code
        )
        quantizer = OVQuantizer(model)
        quantization_config.tokenizer = quantization_config.tokenizer or str(self.args.output)
        # TODO: set save_directory=self.args.output once OV is updated to 2024.3
        quantizer.quantize(ov_config=OVConfig(quantization_config=quantization_config))
        with tempfile.TemporaryDirectory() as temp_dir:
            model.save_pretrained(temp_dir)
            ov_config.save_pretrained(self.args.output)
            shutil.copy(f"{temp_dir}/openvino_model.xml", f"{self.args.output}/openvino_model.xml")
            shutil.copy(f"{temp_dir}/openvino_model.bin", f"{self.args.output}/openvino_model.bin")
    except Exception as e:
        # Delete non-compressed model if compression failed for some reason
        shutil.rmtree(str(self.args.output))
        raise e
Why not export + apply quantization using OVModelForCausalLM directly (and not call main_export for this specific case)?
If possible then it could actually make sense to do this for all models (as it's already the case for SD models)
Why not export + apply quantization using OVModelForCausalLM directly (and not call main_export for this specific case)?

Thanks for the suggestion! This is indeed better.

If possible then it could actually make sense to do this for all models (as it's already the case for SD models)

It would be more convenient from the code-maintenance side. But compared to calling main_export() with a provided quantization_config, when we export and quantize through .from_pretrained() there is an additional model-saving step to a temporary directory, which leads to additional overhead, especially for large models. I think we should avoid this if possible.
Looks good, thanks @nikita-savelyevv
optimum/commands/export/openvino.py
Outdated
tokenizer = None
try:
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        self.args.model, trust_remote_code=self.args.trust_remote_code
    )
    tokenizer.save_pretrained(self.args.output)
except Exception:
    logger.warning("Could not save tokenizer")

if tokenizer and not self.args.disable_convert_tokenizer:
    from ...exporters.openvino.convert import export_tokenizer

    export_tokenizer(tokenizer, self.args.output / "tokenizer")
Had to add this logic because otherwise the tokenizer ends up being saved to a temporary directory.
To simplify the code a bit, what do you think about directly calling maybe_load_preprocessors
preprocessors = maybe_load_preprocessors(
and to save the tokenizer we can also use maybe_save_preprocessors
maybe_save_preprocessors(model_name_or_path, output, trust_remote_code=trust_remote_code)
Also, we could create a function out of the following block:
optimum-intel/optimum/exporters/openvino/__main__.py
Lines 364 to 390 in 8dd2a89
from optimum.exporters.openvino.convert import export_tokenizer

if convert_tokenizer and is_openvino_tokenizers_available():
    if library_name != "diffusers":
        tokenizer = next(
            (preprocessor for preprocessor in preprocessors if isinstance(preprocessor, PreTrainedTokenizerBase)),
            None,
        )
        if tokenizer is not None:
            try:
                export_tokenizer(tokenizer, output)
            except Exception as exception:
                logger.warning(
                    "Could not load tokenizer using specified model ID or path. OpenVINO tokenizer/detokenizer "
                    f"models won't be generated. Exception: {exception}"
                )
    else:
        tokenizer = getattr(model, "tokenizer", None)
        if tokenizer is not None:
            export_tokenizer(tokenizer, output / "tokenizer")
        tokenizer_2 = getattr(model, "tokenizer_2", None)
        if tokenizer_2 is not None:
            export_tokenizer(tokenizer_2, output / "tokenizer_2")
elif convert_tokenizer and not is_openvino_tokenizers_available():
    logger.warning("Tokenizer won't be converted.")
Thanks! Done
optimum/commands/export/openvino.py
Outdated
@@ -218,6 +249,10 @@ def run(self):
    "sym": self.args.sym or False,
    "group_size": -1 if is_int8 else self.args.group_size,
    "all_layers": None if is_int8 else self.args.all_layers,
    "dataset": self.args.dataset,
    "num_samples": self.args.num_samples,
    "quant_method": "awq" if self.args.awq else None,
Not sure if it should be QuantizationMethod.AWQ instead of "awq", or if the configuration takes care of this:
awq=config.quant_method == QuantizationMethod.AWQ or None,
I hesitated to do it this way because it would require introducing a dependency on transformers in this file in order to import QuantizationMethod. But now I see that transformers is a general requirement of optimum, so it should be fine I guess.
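A minimal sketch of the resulting change for illustration; the use_awq flag stands in for self.args.awq:

from transformers.utils.quantization_config import QuantizationMethod

use_awq = True
quant_method = QuantizationMethod.AWQ if use_awq else None

# QuantizationMethod is a string enum, so the enum member still compares equal to "awq".
assert quant_method == "awq"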
Looks great, thanks a lot @nikita-savelyevv
What does this PR do?
In order to apply data-driven weights compression, an instance of OVModelForCausalLM is required. However, it is not available during quantization applied at model export (here). That's why this PR adds logic so that this case is handled separately, after the model is exported. This results in some save/load overhead, but compared to the runtime of data-driven weights compression it should be negligible. Worth noting, data-free compression is still applied during export, resulting in no additional overhead.
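For illustration, a minimal sketch of the flow described above, assuming a dataset in the quantization config is what marks compression as data-driven; names mirror the diff shown earlier in this conversation, and the helper itself is illustrative rather than the exact implementation:

from optimum.exporters.openvino import main_export
from optimum.intel import OVConfig, OVModelForCausalLM, OVQuantizer


def export_with_optional_data_driven_compression(model_id, output_dir, quantization_config):
    if getattr(quantization_config, "dataset", None) is None:
        # Data-free compression is applied directly inside the export, no extra overhead.
        main_export(model_id, output_dir, task="text-generation-with-past",
                    ov_config=OVConfig(quantization_config=quantization_config))
        return
    # Data-driven compression: export an uncompressed model first...
    main_export(model_id, output_dir, task="text-generation-with-past")
    # ...then reload it as an OVModelForCausalLM and compress with the calibration dataset.
    model = OVModelForCausalLM.from_pretrained(output_dir)
    quantizer = OVQuantizer(model)
    # The PR itself works around an OpenVINO limitation here (saving to the load location)
    # by saving to a temporary directory and copying the IR files back; see the diff above.
    quantizer.quantize(
        ov_config=OVConfig(quantization_config=quantization_config),
        save_directory=output_dir,
    )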
Before submitting