Export hybrid StableDiffusion models via optimum-cli #618
Conversation
Force-pushed from de7c1a8 to 89b3487
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
    and ov_config.quantization_config
    and "dataset" in ov_config.quantization_config
):
    import huggingface_hub
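For readability, here is a reconstruction of the check this diff adds to main_export (the leading part of the condition is cut off in the snippet above; the library_name == "diffusers" guard is an assumption based on the later revision of this check shown further down in the conversation):

```python
# Reconstructed sketch of the added check; names outside the visible diff
# (library_name, model_name_or_path, revision) are assumed from context.
if (
    library_name == "diffusers"
    and ov_config
    and ov_config.quantization_config
    and "dataset" in ov_config.quantization_config
):
    import huggingface_hub

    # Query the Hub to identify which pipeline the model id points to.
    model_info = huggingface_hub.model_info(model_name_or_path, revision=revision)
```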
@echarlaix, @eaidova, what do you think about such an approach to provide optimization capabilities for Diffusers in CLI?
Great to make it available in the CLI, but I would prefer not to have it included in main_export (which should be reserved for the model loading + export steps) but directly in the quantizer, which could then be called after main_export in https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py.
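A rough sketch of what that ordering could look like in the CLI command file (hypothetical; the exact OVQuantizer entry point and arguments for diffusion pipelines are assumptions, not an existing API):

```python
# Hypothetical flow for the openvino export command: export first, quantize afterwards.
from typing import Optional

from optimum.exporters.openvino import main_export  # existing export entry point


def export_then_quantize(model_id: str, output_dir: str, task: str, quantization_config: Optional[dict]):
    # Step 1: model loading + export only, no optimization logic inside main_export.
    main_export(model_name_or_path=model_id, output=output_dir, task=task)

    # Step 2 (assumed interface): reload the exported pipeline and let the quantizer
    # handle dataset preparation and hybrid quantization.
    if quantization_config and quantization_config.get("dataset"):
        from optimum.intel import OVQuantizer, OVStableDiffusionPipeline

        pipeline = OVStableDiffusionPipeline.from_pretrained(output_dir)
        quantizer = OVQuantizer.from_pretrained(pipeline)  # assumed to accept a pipeline
        quantizer.quantize(save_directory=output_dir)  # dataset handling omitted / assumed
```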
In my opinion:

- optimum.exporters.openvino should have everything export related
- optimum.intel.openvino.quantization should have everything quantization related (potentially calling export functions from optimum.exporters.openvino when the model to quantize is a torch.nn.Module), and should also handle everything related to OVModel quantization. For example, I think optimum-intel/optimum/intel/openvino/modeling_decoder.py lines 616 to 646 in 086fae3:

```python
if load_in_4bit:
    if not is_nncf_available():
        raise ImportError(
            "Quantization of the weights requires nncf, please install it with `pip install nncf`"
        )
    import nncf
    from .quantization import _weight_only_quantization

    default_config = _check_default_4bit_configs(config)
    if default_config:
        logger.info(
            f"For the given model, we recommend the following `quantization_config` : {default_config}"
        )
    if isinstance(quantization_config.dataset, str):
        tokenizer = quantization_config.tokenizer or AutoTokenizer.from_pretrained(model_id)
        from optimum.gptq.data import get_dataset, prepare_dataset

        # from optimum.gptq.utils import get_seqlen
        # seqlen = get_seqlen(causal_model)
        nsamples = quantization_config.num_samples if quantization_config.num_samples else 128
        dataset = get_dataset(quantization_config.dataset, tokenizer, seqlen=32, nsamples=nsamples)
        dataset = prepare_dataset(dataset)
        quantization_config = copy.deepcopy(quantization_config)
        quantization_config.dataset = nncf.Dataset(dataset, lambda x: causal_model.prepare_inputs(**x))
    _weight_only_quantization(model, quantization_config)
```

should be moved to the OVQuantizer.

What do you think?
I agree with this point, but we need a way to call both export and optimization in the CLI.

> should be moved to the OVQuantizer

@echarlaix, this is what @nikita-savelyevv started working on, according to our recent agreements with you. We will create a separate PR for that.
Yes. Do you think we can first merge the refactoring PR? Adding more features (like support of hybrid quantization through the CLI) will make the refactoring even more complex.
In any case, I don't think we should create an instance of OVModel inside main_export (which will itself call main_export), as it also results in the model being loaded twice. It would be easier to have it in https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py in my opinion.
What about moving the conversion of SD models out of main_export into a separate function, and calling quantization after conversion, once the models are saved on disk, if a quantization config is provided?
A similar approach is used for weight compression, but weight compression is applied to a single model and we do not need the whole pipeline for data collection. That is why I think it is better to separate the SD model conversion, so that we know where the pipeline conversion finishes. Additionally, I think it may help with this refactoring, as the function could be reused inside quantization with more control.
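If I understand the proposal correctly, the control flow would roughly be the following (illustrative only; export_diffusion_pipeline and apply_hybrid_quantization are hypothetical names, not existing functions):

```python
# Illustrative sketch of the proposed split: convert the whole SD pipeline first,
# then run dataset-based (hybrid) quantization on the saved models only if requested.
from typing import Optional


def export_diffusion_pipeline(model_id: str, output_dir: str) -> None:
    """Hypothetical helper: convert unet / text encoder / vae and save them to disk."""
    ...


def apply_hybrid_quantization(output_dir: str, quantization_config: dict) -> None:
    """Hypothetical helper: reload the saved pipeline, collect calibration data from
    quantization_config["dataset"], and quantize the UNet."""
    ...


def export_sd(model_id: str, output_dir: str, quantization_config: Optional[dict] = None) -> None:
    export_diffusion_pipeline(model_id, output_dir)  # conversion always happens first
    if quantization_config and "dataset" in quantization_config:
        apply_hybrid_quantization(output_dir, quantization_config)  # only when a dataset is given
```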
I separated the model conversion from quantization when a dataset is provided for an SD model.
):
    import huggingface_hub

    model_info = huggingface_hub.model_info(model_name_or_path, revision=revision)
What if it is a local directory instead of a model published on the Hub? Will this work?
No, this will not work for a model hosted locally. Could we load the model's config instead, i.e. https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/model_index.json ?
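For a local diffusers directory the pipeline class can indeed be read from model_index.json without calling the Hub; a minimal sketch (assuming the standard diffusers layout, where the pipeline class is stored under the _class_name key):

```python
# Minimal sketch: detect the pipeline class of a local diffusers model from its
# model_index.json instead of calling huggingface_hub.model_info().
import json
import os
from typing import Optional


def get_local_pipeline_class(model_dir: str) -> Optional[str]:
    index_path = os.path.join(model_dir, "model_index.json")
    if not os.path.isfile(index_path):
        return None  # not a diffusers pipeline directory
    with open(index_path, "r", encoding="utf-8") as f:
        return json.load(f).get("_class_name")  # e.g. "StableDiffusionXLPipeline"
```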
I can use model.__class__.__name__, where model = TasksManager.get_model_from_task(..).
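That is, something along these lines (a sketch; the exact get_model_from_task arguments used at the call site may differ):

```python
# Sketch: load the model through TasksManager and branch on its class name,
# e.g. to tell apart SD and SDXL pipelines before deciding how to quantize.
from optimum.exporters.tasks import TasksManager

model = TasksManager.get_model_from_task(task, model_name_or_path)  # args assumed from context
pipeline_class = model.__class__.__name__  # e.g. "StableDiffusionXLPipeline"
is_sdxl = "StableDiffusionXL" in pipeline_class
```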
Something that could be used to infer the task, given the model id of a model hosted on the Hub, is TasksManager.infer_task_from_model (https://github.com/huggingface/optimum/blob/dedb852eaea3d3d899920f4017b9fd899eb2756b/optimum/exporters/tasks.py#L1601). Like this we can move it to https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py#L191. For a model hosted locally, the task needs to be provided (https://github.com/huggingface/optimum/blob/dedb852eaea3d3d899920f4017b9fd899eb2756b/optimum/exporters/tasks.py#L1552), which is already true for a model exported with the CLI without any quantization.
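A small sketch of that flow in the CLI (hedged; the fallback error message and exception types are placeholders):

```python
# Sketch: infer the task from a Hub model id, and ask for an explicit --task
# when inference is not possible (e.g. for a local directory), as described above.
from optimum.exporters.tasks import TasksManager


def resolve_task(task_arg: str, model_id: str) -> str:
    if task_arg != "auto":
        return task_arg  # user passed --task explicitly
    try:
        return TasksManager.infer_task_from_model(model_id)  # works for Hub-hosted models
    except (KeyError, RuntimeError) as err:  # placeholder exception types
        raise ValueError(
            "The task could not be inferred automatically; please pass --task explicitly."
        ) from err
```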
PR huggingface/optimum#1762 should be merged to fix the OpenVINO build.
Force-pushed from c98e092 to 4afbcfd
Opened l-bat#1 to infer the task by loading the diffusers configuration directly; let me know what you think @l-bat @AlexKoff88 @eaidova.
Also, to fix the code style test you can do the following: pip install .[quality] and then make style.
I'm OK with the proposed changes, but I want to highlight the following points:

1. Task inference fails with:
   KeyError: "The task could not be automatically inferred. Please provide the argument --task with the relevant task from object-detection, image-to-image, image-classification, audio-xvector, latent-consistency, text-generation, text2text-generation, audio-classification, stable-diffusion, stable-diffusion-xl-refiner, question-answering, stable-diffusion-xl, feature-extraction, automatic-speech-recognition, semantic-segmentation, depth-estimation, image-segmentation, masked-im, sentence-similarity, text-classification, token-classification, mask-generation, image-to-text, zero-shot-image-classification, zero-shot-object-detection, text-to-audio, conversational, multiple-choice, audio-frame-classification, fill-mask. Detailed error: 'class_name'"
2. optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 output_dir --task stable-diffusion-xl fails with:
   RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2304 and 2816x1280)
@l-bat do you observe both errors using optimum?
No, with …
Great to hear, I think you can close huggingface/optimum#1762 and apply the changes from l-bat#1 in this PR if that works for you!
Force-pushed from 4afbcfd to 27405ed
LGTM, we can merge once all tests are passing. To fix the code style test you can do:
pip install .[quality]
make style
optimum/commands/export/openvino.py (Outdated)
)
library_name = TasksManager.infer_library_from_model(self.args.model)

if library_name == "diffusers" and ov_config and ov_config.quantization_config.get("dataset"):
The failing tests can be fixed by checking for when ov_config.quantization_config is None here.
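That is, guard the lookup roughly like this (sketch of the suggested fix; quantization_config is None when no quantization options are passed on the CLI):

```python
# Sketch of the suggested guard around the dataset lookup.
if (
    library_name == "diffusers"
    and ov_config
    and ov_config.quantization_config
    and ov_config.quantization_config.get("dataset")
):
    ...  # dataset-based (hybrid) quantization path
```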
Force-pushed from 032e91d to a42de51
Thanks for your work @l-bat!
What does this PR do?
This PR enables exporting hybrid-quantized StableDiffusion models via optimum-cli when a dataset option is provided in the quantization config.
Example:
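A hypothetical invocation (the concrete command from the original description is not reproduced here; the model id, dataset name, and the exact spelling of the dataset option are placeholders/assumptions):

```
optimum-cli export openvino --model stabilityai/stable-diffusion-2-1 --weight-format int8 --dataset conceptual_captions sd_int8_hybrid/
```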
Before submitting