
Export hybrid StableDiffusion models via optimum-cli #618

Merged
8 commits merged into huggingface:main from lt/hybrid_quant_cli on Apr 18, 2024

Conversation

@l-bat (Contributor) commented Mar 20, 2024

What does this PR do?

  • Extend optimum-cli with a dataset option
  • Support exporting StableDiffusion models with hybrid PTQ applied via optimum-cli

Example:

optimum-cli export openvino --model SimianLuo/LCM_Dreamshaper_v7 --task latent-consistency  --dataset conceptual_captions --weight-format int8 <output_dir>
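
For reference, a rough Python-API equivalent of the command above (a sketch only, assuming the new --dataset option maps to the dataset argument of OVWeightQuantizationConfig, which triggers hybrid quantization for diffusion pipelines):

# Sketch only: hybrid PTQ of the same pipeline through the Python API.
# Assumes OVWeightQuantizationConfig(dataset=...) enables hybrid quantization
# for diffusers pipelines, mirroring the --dataset CLI option added here.
from optimum.intel import OVLatentConsistencyModelPipeline, OVWeightQuantizationConfig

quantization_config = OVWeightQuantizationConfig(bits=8, dataset="conceptual_captions")
pipeline = OVLatentConsistencyModelPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    export=True,
    quantization_config=quantization_config,
)
pipeline.save_pretrained("lcm_dreamshaper_int8")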

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@l-bat force-pushed the lt/hybrid_quant_cli branch from de7c1a8 to 89b3487 on March 20, 2024 at 10:08
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

and ov_config.quantization_config
and "dataset" in ov_config.quantization_config
):
import huggingface_hub
Collaborator

@echarlaix, @eaidova, what do you think about such an approach to provide optimization capabilities for Diffusers in CLI?

Collaborator

Great to make it available in the CLI, but I would prefer not to have it included in main_export (which should be reserved for the model loading + export steps) but rather directly in the quantizer (which could be called after main_export in https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py).

Collaborator

In my opinion:

  • optimum.exporters.openvino should have everything export related
  • optimum.intel.openvino.quantization should have everything quantization related (potentially calling export functions from optimum.exporters.openvino when the model to quantize is a torch.nn.Module); it should also handle everything related to OVModel quantization (for example, I think the following
    if load_in_4bit:
        if not is_nncf_available():
            raise ImportError(
                "Quantization of the weights requires nncf, please install it with `pip install nncf`"
            )
        import nncf

        from .quantization import _weight_only_quantization

        default_config = _check_default_4bit_configs(config)
        if default_config:
            logger.info(
                f"For the given model, we recommend the following `quantization_config` : {default_config}"
            )
        if isinstance(quantization_config.dataset, str):
            tokenizer = quantization_config.tokenizer or AutoTokenizer.from_pretrained(model_id)

            from optimum.gptq.data import get_dataset, prepare_dataset

            # from optimum.gptq.utils import get_seqlen
            # seqlen = get_seqlen(causal_model)
            nsamples = quantization_config.num_samples if quantization_config.num_samples else 128
            dataset = get_dataset(quantization_config.dataset, tokenizer, seqlen=32, nsamples=nsamples)
            dataset = prepare_dataset(dataset)
            quantization_config = copy.deepcopy(quantization_config)
            quantization_config.dataset = nncf.Dataset(dataset, lambda x: causal_model.prepare_inputs(**x))
        _weight_only_quantization(model, quantization_config)
    should be moved to the OVQuantizer)
    what do you think?
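
To make this split concrete, a hypothetical sketch of how the CLI command in optimum/commands/export/openvino.py could sequence the two steps (illustrative only; the post-export quantization entry point used here is an assumption, not the code from this PR):

# Hypothetical sketch of the separation discussed above: main_export handles the
# model loading + conversion, and quantization runs afterwards from the CLI
# command. The post-export quantization call is an assumption, not this PR's code.
from optimum.exporters.openvino import main_export
from optimum.intel import OVStableDiffusionPipeline, OVWeightQuantizationConfig


def export_then_quantize(model_id: str, output_dir: str, dataset: str) -> None:
    # Step 1: export only (model loading + conversion to OpenVINO IR).
    main_export(model_name_or_path=model_id, output=output_dir, task="stable-diffusion")

    # Step 2: hybrid quantization on the exported pipeline, saved back in place.
    pipeline = OVStableDiffusionPipeline.from_pretrained(
        output_dir,
        quantization_config=OVWeightQuantizationConfig(bits=8, dataset=dataset),
    )
    pipeline.save_pretrained(output_dir)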

Collaborator

I agree with this point, but we need a way to call both export and optimization in the CLI.

should be moved to the OVQuantizer)
@echarlaix, this is what @nikita-savelyevv started working on according to our recent agreements with you. We will create a separate PR for that.

Collaborator

Yes, do you think we can first merge the refactoring PR? Adding more features (like support of hybrid quantization through the CLI) will make the refactoring even more complex.

Collaborator

In any case I don't think we should create an instance of OVModel inside main_export (which will itself call main_export); it also results in the model being loaded twice. It would be easier to have it in https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py in my opinion.

@eaidova (Collaborator) commented Mar 29, 2024

What about moving the SD conversion out of main_export into a separate function, and calling quantization after the conversion, once the models are saved on disk, if a config is provided?

A similar approach is used for weight compression, but weight compression is applied to a single model and we do not need the whole pipeline for data collection; that is why I think it is better to separate the SD model conversion, so we know where the pipeline conversion finishes. Additionally, I think it may help with this refactoring, as it can be used inside quantization with more control.

Contributor Author

I separated the model conversion from the quantization when a dataset is provided for an SD model.

):
import huggingface_hub

model_info = huggingface_hub.model_info(model_name_or_path, revision=revision)
Collaborator

What if it is a local directory instead of a model published on the Hub? Will this work?

Collaborator

No, this will not work for a model hosted locally; could we load the model's config instead (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/model_index.json)?
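
A minimal sketch of that idea (illustrative; the helper name is hypothetical, and hf_hub_download is used here as one way to fetch model_index.json for Hub-hosted models):

# Sketch: read the diffusers pipeline class name from model_index.json, whether
# the model lives in a local directory or on the Hub. Illustration only, not
# the code merged in this PR; get_pipeline_class_name is a hypothetical helper.
import json
import os
from typing import Optional

from huggingface_hub import hf_hub_download


def get_pipeline_class_name(model_name_or_path: str, revision: Optional[str] = None) -> Optional[str]:
    if os.path.isdir(model_name_or_path):
        config_path = os.path.join(model_name_or_path, "model_index.json")
        if not os.path.isfile(config_path):
            return None  # not a diffusers pipeline directory
    else:
        config_path = hf_hub_download(model_name_or_path, "model_index.json", revision=revision)
    with open(config_path) as f:
        return json.load(f).get("_class_name")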

Contributor Author

I can use model.__class__.__name__, where model = TasksManager.get_model_from_task(..)

Collaborator

Something that could be used to infer the task given the model id of a model hosted on the Hub is TasksManager.infer_task_from_model (https://github.com/huggingface/optimum/blob/dedb852eaea3d3d899920f4017b9fd899eb2756b/optimum/exporters/tasks.py#L1601). Like this we can move it here: https://github.com/huggingface/optimum-intel/blob/main/optimum/commands/export/openvino.py#L191. For a model hosted locally the task needs to be provided (https://github.com/huggingface/optimum/blob/dedb852eaea3d3d899920f4017b9fd899eb2756b/optimum/exporters/tasks.py#L1552), which is already true for a model exported with the CLI without any quantization.

@l-bat (Contributor Author) commented Mar 22, 2024

PR huggingface/optimum#1762 should be merged to fix OpenVINO build

@l-bat force-pushed the lt/hybrid_quant_cli branch from c98e092 to 4afbcfd on April 4, 2024 at 09:56
@l-bat requested a review from echarlaix on April 4, 2024 at 09:56
@echarlaix (Collaborator)

Opened l-bat#1 to infer the task by loading the diffusers configuration directly, let me know what you think @l-bat @AlexKoff88 @eaidova

@echarlaix (Collaborator)

Also, to fix the code style test you can do the following:

pip install .[quality]
make style

@l-bat (Contributor Author) commented Apr 15, 2024

Opened l-bat#1 to infer the task by loading the diffusers configuration directly, let me know what you think @l-bat @AlexKoff88 @eaidova

I'm OK with the proposed changes, but I want to highlight the following points:

  1. If a dataset is provided, then the task is an optional parameter. Otherwise, the user must specify the task to avoid this error:
KeyError: "The task could not be automatically inferred. Please provide the argument --task with the relevant task from object-detection, image-to-image, image-classification, audio-xvector, latent-consistency, text-generation, text2text-generation, audio-classification, stable-diffusion, stable-diffusion-xl-refiner, question-answering, stable-diffusion-xl, feature-extraction, automatic-speech-recognition, semantic-segmentation, depth-estimation, image-segmentation, masked-im, sentence-similarity, text-classification, token-classification, mask-generation, image-to-text, zero-shot-image-classification, zero-shot-object-detection, text-to-audio, conversational, multiple-choice, audio-frame-classification, fill-mask. Detailed error: 'class_name'"
  2. There is a problem exporting stabilityai/stable-diffusion-xl-base-1.0 via the CLI due to incorrect task mapping in optimum:
optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0   output_dir --task stable-diffusion-xl
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2304 and 2816x1280)

@echarlaix (Collaborator)

@l-bat do you observe both errors using optimum v1.18.1?

@l-bat (Contributor Author) commented Apr 16, 2024

@l-bat do you observe both errors using optimum v1.18.1?

No, with v1.18.0. It works with v1.18.1. Should I close this PR or apply your changes?

@echarlaix (Collaborator) commented Apr 16, 2024

@l-bat do you observe both errors using optimum v1.18.1?

No, with v1.18.0. It works with v1.18.1. Should I close this PR or apply your changes?

Great to hear, I think you can close huggingface/optimum#1762 and apply the changes from l-bat#1 in this PR if that works for you!

@l-bat force-pushed the lt/hybrid_quant_cli branch from 4afbcfd to 27405ed on April 16, 2024 at 15:46
@echarlaix (Collaborator) left a comment

LGTM, we can merge once all tests are passing. To fix the code style test you can do:

pip install .[quality]
make style

)
library_name = TasksManager.infer_library_from_model(self.args.model)

if library_name == "diffusers" and ov_config and ov_config.quantization_config.get("dataset"):
Collaborator

Failing tests can be fixed by checking for when ov_config.quantization_config is None here.
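
A minimal sketch of that guard (illustrative; the helper name is hypothetical, but the condition mirrors the diff above with the extra None check):

# Sketch of the suggested fix: only look up "dataset" once we know a
# quantization config is present, so a plain export does not crash on None.
def should_run_hybrid_quantization(library_name, ov_config) -> bool:
    return bool(
        library_name == "diffusers"
        and ov_config
        and ov_config.quantization_config
        and ov_config.quantization_config.get("dataset")
    )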

@l-bat force-pushed the lt/hybrid_quant_cli branch from 032e91d to a42de51 on April 18, 2024 at 09:21
@echarlaix (Collaborator) left a comment

Thanks for your work @l-bat!

@echarlaix merged commit 4651ac2 into huggingface:main on Apr 18, 2024
10 checks passed
@echarlaix mentioned this pull request on Apr 18, 2024