Export hybrid StableDiffusion models via optimum-cli #618
Conversation
Force-pushed from de7c1a8 to 89b3487
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
    and ov_config.quantization_config
    and "dataset" in ov_config.quantization_config
):
    import huggingface_hub
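For readability, here is a reconstruction of the check this diff adds to main_export (the leading part of the condition is cut off in the snippet above; the library_name == "diffusers" guard is an assumption based on the later revision of this check shown further down in the conversation):

```python
# Reconstructed sketch of the added check; names outside the visible diff
# (library_name, model_name_or_path, revision) are assumed from context.
if (
    library_name == "diffusers"
    and ov_config
    and ov_config.quantization_config
    and "dataset" in ov_config.quantization_config
):
    import huggingface_hub

    # Query the Hub to identify which pipeline the model id points to.
    model_info = huggingface_hub.model_info(model_name_or_path, revision=revision)
```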
@echarlaix, @eaidova, what do you think about such an approach to provide optimization capabilities for Diffusers in CLI?
Great to make it available in the CLI, but I would prefer not to have it included in main_export (which should be reserved for the model loading + export steps) but directly in the quantizer, which could then be called after main_export in https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py.
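A rough sketch of what that ordering could look like in the CLI command file (hypothetical; the exact OVQuantizer entry point and arguments for diffusion pipelines are assumptions, not an existing API):

```python
# Hypothetical flow for the openvino export command: export first, quantize afterwards.
from typing import Optional

from optimum.exporters.openvino import main_export  # existing export entry point


def export_then_quantize(model_id: str, output_dir: str, task: str, quantization_config: Optional[dict]):
    # Step 1: model loading + export only, no optimization logic inside main_export.
    main_export(model_name_or_path=model_id, output=output_dir, task=task)

    # Step 2 (assumed interface): reload the exported pipeline and let the quantizer
    # handle dataset preparation and hybrid quantization.
    if quantization_config and quantization_config.get("dataset"):
        from optimum.intel import OVQuantizer, OVStableDiffusionPipeline

        pipeline = OVStableDiffusionPipeline.from_pretrained(output_dir)
        quantizer = OVQuantizer.from_pretrained(pipeline)  # assumed to accept a pipeline
        quantizer.quantize(save_directory=output_dir)  # dataset handling omitted / assumed
```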
In my opinion:

- optimum.exporters.openvino should have everything export related
- optimum.intel.openvino.quantization should have everything quantization related (potentially calling export functions from optimum.exporters.openvino when the model to quantize is a torch.nn.Module), and should also handle everything related to OVModel quantization. For example, I think optimum-intel/optimum/intel/openvino/modeling_decoder.py lines 616 to 646 in 086fae3:

```python
if load_in_4bit:
    if not is_nncf_available():
        raise ImportError(
            "Quantization of the weights requires nncf, please install it with `pip install nncf`"
        )
    import nncf
    from .quantization import _weight_only_quantization

    default_config = _check_default_4bit_configs(config)
    if default_config:
        logger.info(
            f"For the given model, we recommend the following `quantization_config` : {default_config}"
        )
    if isinstance(quantization_config.dataset, str):
        tokenizer = quantization_config.tokenizer or AutoTokenizer.from_pretrained(model_id)
        from optimum.gptq.data import get_dataset, prepare_dataset

        # from optimum.gptq.utils import get_seqlen
        # seqlen = get_seqlen(causal_model)
        nsamples = quantization_config.num_samples if quantization_config.num_samples else 128
        dataset = get_dataset(quantization_config.dataset, tokenizer, seqlen=32, nsamples=nsamples)
        dataset = prepare_dataset(dataset)
        quantization_config = copy.deepcopy(quantization_config)
        quantization_config.dataset = nncf.Dataset(dataset, lambda x: causal_model.prepare_inputs(**x))
    _weight_only_quantization(model, quantization_config)
```

should be moved to the OVQuantizer.

What do you think?
I agree with this point, but we need a way to call both export and optimization in the CLI.

> should be moved to the OVQuantizer

@echarlaix, this is what @nikita-savelyevv started working on, according to our recent agreements with you. We will create a separate PR for that.
Yes. Do you think we can first merge the refactoring PR? Adding more features (like support of hybrid quantization through the CLI) will make the refactoring even more complex.
In any case, I don't think we should create an instance of OVModel inside main_export (which will itself call main_export), as it also results in the model being loaded twice. It would be easier to have it in https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py in my opinion.
What about moving the conversion of SD models out of main_export into a separate function, and calling quantization after conversion, once the models are saved on disk, if a quantization config is provided?
A similar approach is used for weight compression, but weight compression is applied to a single model and we do not need the whole pipeline for data collection. That is why I think it is better to separate the SD model conversion, so that we know where the pipeline conversion finishes. Additionally, I think it may help with this refactoring, as the function could be reused inside quantization with more control.
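If I understand the proposal correctly, the control flow would roughly be the following (illustrative only; export_diffusion_pipeline and apply_hybrid_quantization are hypothetical names, not existing functions):

```python
# Illustrative sketch of the proposed split: convert the whole SD pipeline first,
# then run dataset-based (hybrid) quantization on the saved models only if requested.
from typing import Optional


def export_diffusion_pipeline(model_id: str, output_dir: str) -> None:
    """Hypothetical helper: convert unet / text encoder / vae and save them to disk."""
    ...


def apply_hybrid_quantization(output_dir: str, quantization_config: dict) -> None:
    """Hypothetical helper: reload the saved pipeline, collect calibration data from
    quantization_config["dataset"], and quantize the UNet."""
    ...


def export_sd(model_id: str, output_dir: str, quantization_config: Optional[dict] = None) -> None:
    export_diffusion_pipeline(model_id, output_dir)  # conversion always happens first
    if quantization_config and "dataset" in quantization_config:
        apply_hybrid_quantization(output_dir, quantization_config)  # only when a dataset is given
```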
I separated the model conversion from quantization when a dataset is provided for an SD model.
):
    import huggingface_hub

    model_info = huggingface_hub.model_info(model_name_or_path, revision=revision)
What if it is a local directory instead of a model published on the Hub? Will this work?
No, this will not work for a model hosted locally. Could we load the model's config instead, i.e. https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/model_index.json ?
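For a local diffusers directory the pipeline class can indeed be read from model_index.json without calling the Hub; a minimal sketch (assuming the standard diffusers layout, where the pipeline class is stored under the _class_name key):

```python
# Minimal sketch: detect the pipeline class of a local diffusers model from its
# model_index.json instead of calling huggingface_hub.model_info().
import json
import os
from typing import Optional


def get_local_pipeline_class(model_dir: str) -> Optional[str]:
    index_path = os.path.join(model_dir, "model_index.json")
    if not os.path.isfile(index_path):
        return None  # not a diffusers pipeline directory
    with open(index_path, "r", encoding="utf-8") as f:
        return json.load(f).get("_class_name")  # e.g. "StableDiffusionXLPipeline"
```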
I can use model.__class__.__name__, where model = TasksManager.get_model_from_task(..).
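That is, something along these lines (a sketch; the exact get_model_from_task arguments used at the call site may differ):

```python
# Sketch: load the model through TasksManager and branch on its class name,
# e.g. to tell apart SD and SDXL pipelines before deciding how to quantize.
from optimum.exporters.tasks import TasksManager

model = TasksManager.get_model_from_task(task, model_name_or_path)  # args assumed from context
pipeline_class = model.__class__.__name__  # e.g. "StableDiffusionXLPipeline"
is_sdxl = "StableDiffusionXL" in pipeline_class
```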
Something that could be used to infer the task, given the model id of a model hosted on the Hub, is TasksManager.infer_task_from_model (https://github.com/huggingface/optimum/blob/dedb852eaea3d3d899920f4017b9fd899eb2756b/optimum/exporters/tasks.py#L1601). Like this we can move it to https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py#L191. For a model hosted locally, the task needs to be provided (https://github.com/huggingface/optimum/blob/dedb852eaea3d3d899920f4017b9fd899eb2756b/optimum/exporters/tasks.py#L1552), which is already true for a model exported with the CLI without any quantization.
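A small sketch of that flow in the CLI (hedged; the fallback error message and exception types are placeholders):

```python
# Sketch: infer the task from a Hub model id, and ask for an explicit --task
# when inference is not possible (e.g. for a local directory), as described above.
from optimum.exporters.tasks import TasksManager


def resolve_task(task_arg: str, model_id: str) -> str:
    if task_arg != "auto":
        return task_arg  # user passed --task explicitly
    try:
        return TasksManager.infer_task_from_model(model_id)  # works for Hub-hosted models
    except (KeyError, RuntimeError) as err:  # placeholder exception types
        raise ValueError(
            "The task could not be inferred automatically; please pass --task explicitly."
        ) from err
```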
PR huggingface/optimum#1762 should be merged to fix the OpenVINO build.
Force-pushed from c98e092 to 4afbcfd
Opened l-bat#1 to infer the task by loading the diffusers configuration directly; let me know what you think @l-bat @AlexKoff88 @eaidova.
Also, to fix the code style test you can do the following: pip install .[quality] and then make style.
I'm OK with the proposed changes, but I want to highlight the following points:

1. Task inference fails with:
   KeyError: "The task could not be automatically inferred. Please provide the argument --task with the relevant task from object-detection, image-to-image, image-classification, audio-xvector, latent-consistency, text-generation, text2text-generation, audio-classification, stable-diffusion, stable-diffusion-xl-refiner, question-answering, stable-diffusion-xl, feature-extraction, automatic-speech-recognition, semantic-segmentation, depth-estimation, image-segmentation, masked-im, sentence-similarity, text-classification, token-classification, mask-generation, image-to-text, zero-shot-image-classification, zero-shot-object-detection, text-to-audio, conversational, multiple-choice, audio-frame-classification, fill-mask. Detailed error: 'class_name'"
2. optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 output_dir --task stable-diffusion-xl fails with:
   RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2304 and 2816x1280)
@l-bat do you observe both errors using optimum?
No, with …
Great to hear, I think you can close huggingface/optimum#1762 and apply the changes from l-bat#1 in this PR if that works for you!
Force-pushed from 4afbcfd to 27405ed
LGTM, we can merge once all tests are passing. To fix the code style test you can do:
pip install .[quality]
make style
optimum/commands/export/openvino.py (Outdated)
)
library_name = TasksManager.infer_library_from_model(self.args.model)

if library_name == "diffusers" and ov_config and ov_config.quantization_config.get("dataset"):
The failing tests can be fixed by checking for when ov_config.quantization_config is None here.
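That is, guard the lookup roughly like this (sketch of the suggested fix; quantization_config is None when no quantization options are passed on the CLI):

```python
# Sketch of the suggested guard around the dataset lookup.
if (
    library_name == "diffusers"
    and ov_config
    and ov_config.quantization_config
    and ov_config.quantization_config.get("dataset")
):
    ...  # dataset-based (hybrid) quantization path
```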
Force-pushed from 032e91d to a42de51
Thanks for your work @l-bat!
What does this PR do?
This PR enables exporting hybrid-quantized StableDiffusion models via optimum-cli when a dataset option is provided in the quantization config.
Example:
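A hypothetical invocation (the concrete command from the original description is not reproduced here; the model id, dataset name, and the exact spelling of the dataset option are placeholders/assumptions):

```
optimum-cli export openvino --model stabilityai/stable-diffusion-2-1 --weight-format int8 --dataset conceptual_captions sd_int8_hybrid/
```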
Before submitting