Refactor and Add Tests #2

Open

wants to merge 76 commits into base branch openvino_tokenizers

Changes from 1 commit (of 76)

Commits
8340e1d  Fix model dtype (#502) - jiqing-feng, Jan 8, 2024
77f9756  Add ipex inference llama test (#503) - jiqing-feng, Jan 8, 2024
03e1fa6  Disable marian test until openvino next release (#504) - echarlaix, Jan 8, 2024
c64025d  Add INC modeling position_ids generation (#456) - jiqing-feng, Jan 8, 2024
aa5b71b  Add f32 precision for compare-with-transformers tests (#508) - helena-intel, Jan 10, 2024
ba2487b  Fix typo inside InferRequestWrapper - nikita-savelyevv, Jan 11, 2024
23f4f5d  Merge pull request #511 from nikita-savelyevv/fix-infer-request-wrapp… - AlexKoff88, Jan 12, 2024
3f7551e  Add try for get_property (#510) - wgzintel, Jan 12, 2024
545ad5a  Fix error with optimum-cli export openvino --help - helena-intel, Jan 13, 2024
3c196c3  Merge pull request #514 from huggingface/helena/optimum-cli-fix - AlexKoff88, Jan 15, 2024
133aa7d  Bump min torch version (#515) - echarlaix, Jan 15, 2024
7f236c2  Add OpenVINO stateful model support (#493) - eaidova, Jan 16, 2024
2f2a764  Add openvino-nightly to automated tests (#506) - helena-intel, Jan 16, 2024
e22a2ac  Fix loading Timm models with ov_config (#517) - helena-intel, Jan 16, 2024
76ce9de  Use f32 inference for some OpenVINO stable diffusion/training tests (… - helena-intel, Jan 17, 2024
94bc226  Convert tokenizers with openvino_tokenizers - slyalin, Jan 5, 2024
6bb395f  Update optimum/exporters/openvino/__main__.py - slyalin, Jan 5, 2024
7d16ec7  Refactor and Add Tests - apaniukov, Jan 9, 2024
f0933ad  Fix t5 Test - apaniukov, Jan 10, 2024
24cc616  Add Warning - apaniukov, Jan 10, 2024
49337b0  Return Tests - apaniukov, Jan 10, 2024
7709043  Move export_tokenizer to convert.py - apaniukov, Jan 10, 2024
dbd609b  Avoid Double Tokenizer Save - apaniukov, Jan 12, 2024
7e24f10  Fix Style - apaniukov, Jan 12, 2024
2cf460d  Refactor After Review - apaniukov, Jan 18, 2024
57782d1  Skip Tokenizers Tests If No Package Installed - apaniukov, Jan 18, 2024
db668da  Merge branch 'main' into openvino_tokenizers - apaniukov, Jan 18, 2024
e00c5fc  Return the original forward after exporting the PyTorch model to Open… - alexsu52, Jan 18, 2024
cbb5fde  Fix Inference docs (#525) - ngaloppo, Jan 19, 2024
e7cd70f  Style Fix - apaniukov, Jan 19, 2024
40cf117  Fix OV Tokenizers Check - apaniukov, Jan 19, 2024
4652ae4  Add warning message when using transformers < 4.35 (#524) - helena-intel, Jan 19, 2024
901f48a  Fix Tests - apaniukov, Jan 19, 2024
ff1e382  Fix compatibility with transformers (#527) - echarlaix, Jan 19, 2024
69140bf  Fix quantization tests for openvino-nightly (#523) - eaidova, Jan 22, 2024
af2e986  split create pkv to a function (#521) - jiqing-feng, Jan 22, 2024
ac640ed  Add test for _print_compiled_model_properties (#528) - helena-intel, Jan 23, 2024
672c022  Enable automatic CACHE_DIR for GPU inference only (#520) - helena-intel, Jan 23, 2024
1a76bd4  Add Missing return - apaniukov, Jan 23, 2024
f2b2237  Turn off tokenizer message if not installed - apaniukov, Jan 23, 2024
5e9c1b7  Update OpenVINO documentation about weight compression (#529) - AlexKoff88, Jan 24, 2024
3066ade  Merge branch 'main' into openvino-tokenizers - apaniukov, Jan 24, 2024
d96ebfa  Fix ov device (#530) - echarlaix, Jan 25, 2024
e0c1143  Fix expected quantization matmul test (#531) - echarlaix, Jan 25, 2024
71610dd  Dev version - echarlaix, Jan 25, 2024
a622f4d  Fix OVCausalLM model inference without generate (#532) - eaidova, Jan 26, 2024
805e737  Add IPEX models (#516) - echarlaix, Jan 26, 2024
87b36db  Add IPEX model for question answering (#534) - echarlaix, Jan 26, 2024
6bf5fbc  Expose InferRequestWrapper class so it can be imported from elsewhere… - nikita-savelyevv, Jan 29, 2024
9a2e271  Merge branch 'main' into openvino-tokenizers - apaniukov, Jan 29, 2024
7ee347e  Move tokenizers to OV dependencies - apaniukov, Jan 29, 2024
6e79be1  Add IPEX models for audio and image classification tasks (#536) - echarlaix, Jan 29, 2024
20df723  relax requirements to have registered normalized config for usage con… - eaidova, Jan 30, 2024
8d2ec41  Check OV Compatibility - apaniukov, Jan 30, 2024
1b5c3cb  IPEX decoder model fix (#539) - echarlaix, Jan 30, 2024
3b627f4  Enable loading of torchscript model with INC and add warning (#540) - echarlaix, Jan 30, 2024
32a7274  Bump OV Version - apaniukov, Jan 30, 2024
a251422  Fix torch version for ipex tests (#545) - echarlaix, Jan 31, 2024
398450d  Refactor IPEX CausalLM for better model architecture scale (#544) - ofirzaf, Jan 31, 2024
8ee487d  Automatic `torch.autocast` for IPEXModel (#542) - ofirzaf, Jan 31, 2024
788e458  Add an initial warmup step to `IPEXModel`s (#543) - ofirzaf, Jan 31, 2024
0ca9447  Fix format (#546) - echarlaix, Jan 31, 2024
552de65  Dev version - echarlaix, Jan 31, 2024
8c029e0  Move OpenVINO Tokenizers To Optional Dependencies - apaniukov, Feb 1, 2024
7ea3656  Fix OV pre-commit test - daniil-lyakhov, Feb 1, 2024
24f40bf  CUSTOMIZED_QUANTIZATION_CONFIG is updated - daniil-lyakhov, Feb 2, 2024
5120f75  Merge pull request #548 from daniil-lyakhov/dl/fix_ov_precommit - AlexKoff88, Feb 5, 2024
0f45751  Update README (#549) - echarlaix, Feb 5, 2024
ad99b98  Add bloom ipex inference test (#551) - echarlaix, Feb 6, 2024
24a1e30  Remove pytorch v2.1.2 constraint for tests since ipex v2.2.0 release… - echarlaix, Feb 6, 2024
e40e627  Fix openvino export model from ONNX (#554) - echarlaix, Feb 7, 2024
09b067f  Add --convert-tokenizer Option to CLI - apaniukov, Feb 8, 2024
a7b766e  Add load_in_4bit option for OVModelForCausalLM (#538) - AlexKoff88, Feb 8, 2024
1c14957  Skip automodel compression weights tests for nncf==2.8.0 (#535) - alexsu52, Feb 8, 2024
f3b8ce8  Merge branch 'main' into openvino_tokenizers - apaniukov, Feb 8, 2024
3c27fbd  Fix SD Tokenizer - apaniukov, Feb 8, 2024
Skip automodel compression weights tests for nncf==2.8.0 (huggingface#535)

* skip compression weights tests for nncf==2.8.0 and rework the logic for optimizing stateful PyTorch models

* black happy

* ruff happy

* updated nncf version

* replied to comments

* replied to comments

* typo

* cherry-pick fixes for tests from PR 538

* replied to comments

---------

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
alexsu52 and echarlaix authored Feb 8, 2024

Verified: this commit was created on GitHub.com and signed with GitHub’s verified signature.
commit 1c1495719b549a5822a2935b7b6853658922a5c1
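Before the file-by-file diff, here is background on the gate named in the commit title. The sketch below shows one common way to write a version-gated skip in pytest; the decorator and test names are invented for illustration and are not this repository's actual test code:

# Illustrative version-gated skip; names are hypothetical, not the repo's tests.
import nncf
import pytest
from packaging import version

skip_on_nncf_2_8_0 = pytest.mark.skipif(
    version.parse(nncf.__version__) == version.parse("2.8.0"),
    reason="automodel weight compression tests are skipped for nncf==2.8.0",
)

@skip_on_nncf_2_8_0
def test_automodel_weight_compression():
    ...  # exercise weight compression on an automodel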
21 changes: 1 addition & 20 deletions optimum/exporters/openvino/convert.py
@@ -31,14 +31,7 @@
 from optimum.exporters.onnx.model_patcher import DecoderModelPatcher
 from optimum.utils import is_diffusers_available
 
-from ...intel.utils.import_utils import (
-    _torch_version,
-    _transformers_version,
-    is_nncf_available,
-    is_optimum_version,
-    is_torch_version,
-    is_transformers_version,
-)
+from ...intel.utils.import_utils import is_nncf_available, is_optimum_version
 from .model_patcher import patch_model_with_bettertransformer
 from .stateful import ensure_stateful_is_available, patch_stateful
 from .utils import (
@@ -331,18 +324,6 @@ def export_pytorch(
     output = Path(output)
 
     if stateful:
-        if is_transformers_version("<", "4.36") or is_torch_version("<", "2.1.1"):
-            COLOR_RED = "\033[1;31m"
-            COLOR_RESET = "\033[0m"
-            logger.warning(
-                COLOR_RED
-                + "[WARNING] For good performance with stateful models, transformers>=4.36.2 and PyTorch>=2.1.1 are required. "
-                f"This Python environment has Transformers {_transformers_version} and PyTorch {_torch_version}. "
-                "Consider upgrading PyTorch and Transformers, for example by running "
-                "`pip install --upgrade --upgrade-strategy eager optimum[openvino,nncf]`, and export the model again"
-                + COLOR_RESET
-            )
-
         # Trigger bettertransformer together with stateful model because OpenVINO HW-dependent transformations expect
         # both of them are applied to demonstrate the best performance.
         # TODO: Consider applying bettertransformer regardless of stateful flag -- requires additional validation.
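This hunk drops the transformers/PyTorch version warning from export_pytorch; the same message reappears inside patch_model_with_bettertransformer in the next file. A simplified sketch of the stateful path that remains, assuming the patch call follows the comments above:

# Simplified sketch; export_pytorch does much more around this point.
if stateful:
    # The version check and warning now live inside the patch function.
    model = patch_model_with_bettertransformer(model)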
25 changes: 20 additions & 5 deletions optimum/exporters/openvino/model_patcher.py
@@ -14,16 +14,31 @@
 
 import logging as log
 
-from optimum.intel.utils.import_utils import is_torch_version
+from optimum.intel.utils.import_utils import (
+    _torch_version,
+    _transformers_version,
+    is_torch_version,
+    is_transformers_version,
+)
 
 
 def patch_model_with_bettertransformer(model):
-    if is_torch_version("<", "2.0"):
+    # check that the model has not yet been pathced
+    if hasattr(model, "use_bettertransformer") and model.use_bettertransformer is True:
+        return model
+
+    if is_transformers_version("<", "4.36") or is_torch_version("<", "2.1.1"):
+        COLOR_RED = "\033[1;31m"
+        COLOR_RESET = "\033[0m"
         log.warn(
-            "integration Scaled Dot Product Attention optimization supported only with torch > 2.0."
-            "Usage model with stateful=True may be non-effective if model does not contain torch.functional.scaled_dot_product_attention"
-            "It is recommended to upgrade PyTorch version for using stateful model or use stateful=False"
+            COLOR_RED
+            + "[WARNING] For good performance with stateful models, transformers>=4.36.2 and PyTorch>=2.1.1 are required. "
+            f"This Python environment has Transformers {_transformers_version} and PyTorch {_torch_version}. "
+            "Consider upgrading PyTorch and Transformers, for example by running "
+            "`pip install --upgrade --upgrade-strategy eager optimum[openvino,nncf]`, and export the model again"
+            + COLOR_RESET
         )
 
+    # model already has required SDPA implementation
+    if getattr(model, "_supports_sdpa", False) and getattr(model.config, "_attn_implementation", "eager") == "sdpa":
+        return model
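The two new early returns make the patch idempotent and skip models whose attention is already SDPA-based. This matters because, with this PR, both the quantizer and the exporter can invoke the patch on the same model. A hypothetical driver illustrating the guard (only patch_model_with_bettertransformer is the real function here):

model = patch_model_with_bettertransformer(model)  # applies BetterTransformer
model = patch_model_with_bettertransformer(model)  # no-op: use_bettertransformer is already True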
72 changes: 46 additions & 26 deletions optimum/intel/openvino/quantization.py
@@ -24,22 +24,26 @@
 import transformers
 from accelerate.data_loader import DataLoaderStateMixin
 from datasets import Dataset, load_dataset
-from nncf import NNCFConfig, compress_weights
+from nncf import NNCFConfig
 from nncf.torch import create_compressed_model, register_default_init_args, register_module
 from nncf.torch.dynamic_graph.io_handling import wrap_nncf_model_inputs_with_objwalk
 from nncf.torch.initialization import PTInitializingDataLoader
 from openvino._offline_transformations import compress_quantize_weights_transformation
 from openvino.runtime import Core, Tensor
+from torch.utils._pytree import tree_map
 from torch.utils.data import DataLoader, RandomSampler
 from transformers import DataCollator, PreTrainedModel, default_data_collator
 from transformers.pytorch_utils import Conv1D
 
+from optimum.exporters.onnx.convert import check_dummy_inputs_are_allowed
 from optimum.exporters.tasks import TasksManager
 from optimum.quantization_base import OptimumQuantizer
 
 from ...exporters.openvino import export, export_pytorch_via_onnx
-from ...exporters.openvino.stateful import ensure_export_task_support_stateful
+from ...exporters.openvino.model_patcher import patch_model_with_bettertransformer
+from ...exporters.openvino.stateful import ensure_export_task_support_stateful, ensure_stateful_is_available
 from ..utils.constant import _TASK_ALIASES
+from ..utils.modeling_utils import get_model_device
 from .configuration import OVConfig
 from .modeling_base import OVBaseModel
 from .modeling_decoder import OVBaseDecoderModel
@@ -361,9 +365,7 @@ def _quantize_ovcausallm(
             self.model.model,
             quantization_dataset,
             model_type=nncf.ModelType.TRANSFORMER if not kwargs.get("model_type") else kwargs.get("model_type"),
-            fast_bias_correction=True
-            if not kwargs.get("fast_bias_correction")
-            else kwargs.get("fast_bias_correction"),
+            fast_bias_correction=kwargs.get("fast_bias_correction", True),
             **kwargs,
         )
         self.model.model = quantized_model
@@ -405,13 +407,42 @@ def _quantize_torchmodel(
             if file_name is None and ov_config.save_onnx_model
             else Path(ov_file_name).with_suffix(".onnx")
         )
 
+        task = self.task
+        model = self.model
+        self.model.config.save_pretrained(save_directory)
+        if task.startswith("text-generation"):
+            onnx_config = onnx_config_class(
+                model.config, use_past=model.config.use_cache, use_past_in_inputs=model.config.use_cache
+            )
+            if model.config.use_cache:
+                task = "text-generation-with-past"
+        else:
+            onnx_config = onnx_config_class(model.config)
+
+        stateful = ensure_stateful_is_available() and ensure_export_task_support_stateful(task)
+
         if weights_only:
-            if getattr(self.model.config, "tie_word_embeddings", True):
-                # to fix problem with shared embedding weights in nncf compress_weights()
-                self.model.tie_weights()
-            compressed_model = compress_weights(self.model)
-            self.model = compressed_model
+            if stateful:
+                # patch model before weight compression
+                model = patch_model_with_bettertransformer(model)
+
+            dummy_inputs = onnx_config.generate_dummy_inputs(framework="pt")
+            device = get_model_device(model)
+            dummy_inputs = tree_map(
+                lambda value: value.to(device) if isinstance(value, torch.Tensor) else value, dummy_inputs
+            )
+            check_dummy_inputs_are_allowed(model, dummy_inputs)
+
+            nncf.compress_weights(model, dataset=nncf.Dataset([dummy_inputs]))
         else:
+            if stateful:
+                logger.warn(
+                    "Quantization algorithm does not support optimized stateful models. "
+                    "The original model without optimization will be quantized and export."
+                )
+                stateful = False
+
             calibration_dataloader = self._get_calibration_dataloader(
                 calibration_dataset=calibration_dataset,
                 batch_size=batch_size,
@@ -423,22 +454,10 @@ def _quantize_torchmodel(
             ov_config.add_input_info(model_inputs)
             nncf_config = NNCFConfig.from_dict(ov_config.__dict__)
             nncf_config = register_default_init_args(nncf_config, calibration_dataloader)
-            controller, compressed_model = create_compressed_model(
-                self.model, nncf_config, wrap_inputs_fn=wrap_nncf_model_inputs_with_objwalk
-            )
-            compressed_model = controller.strip(do_copy=False)
-
-            task = self.task
-            model = self.model
-            self.model.config.save_pretrained(save_directory)
-            if task.startswith("text-generation"):
-                onnx_config = onnx_config_class(
-                    model.config, use_past=model.config.use_cache, use_past_in_inputs=model.config.use_cache
+            controller, model = create_compressed_model(
+                model, nncf_config, wrap_inputs_fn=wrap_nncf_model_inputs_with_objwalk
             )
-                if model.config.use_cache:
-                    task = "text-generation-with-past"
-            else:
-                onnx_config = onnx_config_class(model.config)
+            model = controller.strip(do_copy=False)
 
         model_path = save_directory / (onnx_file_name if ov_config.save_onnx_model else ov_file_name)
         onnx_path = save_directory / onnx_file_name
@@ -447,7 +466,8 @@ def _quantize_torchmodel(
             opset = max(opset, MIN_ONNX_QDQ_OPSET)
             kwargs = {}
             if not ov_config.save_onnx_model:
-                kwargs = {"stateful": ensure_export_task_support_stateful(task)}
+                kwargs = {"stateful": stateful}
 
             _, _, is_onnx = export_fn(model=model, config=onnx_config, output=model_path, opset=opset, **kwargs)
             if is_onnx:
                 # Load and save the compressed model
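In the reworked weights-only branch above, dummy inputs are generated from the ONNX config, moved onto the model's device with tree_map, validated, and handed to nncf.compress_weights as a one-sample dataset. Below is a self-contained sketch of the tree_map device-move step; the dummy inputs are invented stand-ins for onnx_config.generate_dummy_inputs(framework="pt"):

import torch
from torch.utils._pytree import tree_map

# Invented example inputs; shapes and keys are illustrative only.
dummy_inputs = {
    "input_ids": torch.ones(1, 8, dtype=torch.long),
    "attention_mask": torch.ones(1, 8, dtype=torch.long),
}
device = torch.device("cpu")
# Same lambda as in the diff: move tensors to the device, pass everything else through.
dummy_inputs = tree_map(
    lambda value: value.to(device) if isinstance(value, torch.Tensor) else value,
    dummy_inputs,
)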
21 changes: 21 additions & 0 deletions optimum/intel/utils/modeling_utils.py
@@ -148,3 +148,24 @@ def patch_decoder_attention_mask(model: "PreTrainedModel"):
     elif model.config.model_type in {"blenderbot-small", "blenderbot", "opt", "pegasus", "bart"}:
         model.model.decoder._prepare_decoder_attention_mask = _prepare_decoder_attention_mask
     return model
+
+
+def get_model_device(model: torch.nn.Module) -> torch.device:
+    """
+    Determines the device on which a PyTorch model is currently residing.
+
+    Args:
+        model: The PyTorch model to query.
+
+    Returns:
+        torch.device: The device where the model's parameters are located.
+
+    Raises:
+        StopIteration: If the model has no parameters.
+    """
+    try:
+        device = next(model.parameters()).device
+    except StopIteration:
+        # The model had no parameters at all, doesn't matter which device to choose
+        device = torch.device("cpu")
+    return device
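A quick usage sketch for the new helper; the Linear/ReLU modules are illustrative, the import path matches the file above:

import torch
from optimum.intel.utils.modeling_utils import get_model_device

print(get_model_device(torch.nn.Linear(4, 4)))  # cpu (device of the weights)
print(get_model_device(torch.nn.ReLU()))        # cpu (fallback: module has no parameters)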