
Commit 005cceb

Merge branch 'main' into ea/janus
2 parents cbe7460 + cd44f82

11 files changed: +220 -26 lines

README.md (+1 -1)

@@ -1,5 +1,5 @@
 <p align="center">
-    <img src="readme_logo.png" />
+    <img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/logo/hf_intel_logo.png" />
 </p>

 # Optimum Intel

notebooks/ipex/README.md (+1 -1)

@@ -6,4 +6,4 @@ You can find here a list of the notebooks for the IPEX integration in 🤗 Optim
 | Notebook | Description | | |
 |:----------|:-------------|:-------------|------:|
 | [How to optimize your model with IPEX for text generation](https://github.com/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb)| Show how to apply operators and graph-level optimizations using Intel [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb)|
-
+| [How to optimize your langchain pipeline with IPEX](https://github.com/huggingface/optimum-intel/blob/main/notebooks/ipex/langchain_hf_pipelines.ipynb)| Show how to optimize your langchain pipeline with IPEX [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/optimum-intel/blob/main/notebooks/ipex/langchain_hf_pipelines.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/optimum-intel/blob/main/notebooks/ipex/langchain_hf_pipelines.ipynb)|
notebooks/ipex/langchain_hf_pipelines.ipynb (new file, +168 lines)

@@ -0,0 +1,168 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hugging Face Pipelines\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're opening this notebook on Colab, you will probably need to install LangChain and 🤗 Optimum. Uncomment the following cell and run it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#! pip install langchain-huggingface optimum[ipex]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure your version of langchain-huggingface is at least v0.2 and 🤗 Optimum is at least v1.22.0, since the functionality was introduced in these versions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from optimum.intel.version import __version__\n",
    "\n",
    "print(\"optimum-intel version is\", __version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from optimum.intel.utils.import_utils import _langchain_hf_version\n",
    "\n",
    "print(\"langchain-huggingface version is\", _langchain_hf_version)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Loading\n",
    "\n",
    "Models can be loaded by specifying the model parameters using the `from_model_id` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_huggingface.llms import HuggingFacePipeline\n",
    "\n",
    "hf = HuggingFacePipeline.from_model_id(\n",
    "    model_id=\"gpt2\",\n",
    "    task=\"text-generation\",\n",
    "    pipeline_kwargs={\"max_new_tokens\": 10},\n",
    "    backend=\"ipex\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Chain\n",
    "\n",
    "With the model loaded into memory, you can compose it with a prompt to form a chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.prompts import PromptTemplate\n",
    "\n",
    "template = \"\"\"Question: {question}\n",
    "\n",
    "Answer: Let's think step by step.\"\"\"\n",
    "prompt = PromptTemplate.from_template(template)\n",
    "\n",
    "chain = prompt | hf\n",
    "\n",
    "question = \"What is electroencephalography?\"\n",
    "\n",
    "print(chain.invoke({\"question\": question}))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get the response without the prompt, you can bind `skip_prompt=True` to the LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = prompt | hf.bind(skip_prompt=True)\n",
    "\n",
    "question = \"What is electroencephalography?\"\n",
    "\n",
    "print(chain.invoke({\"question\": question}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Streaming the response:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for chunk in chain.stream(question):\n",
    "    print(chunk, end=\"\", flush=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
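Read as a plain script rather than notebook JSON, the cells above amount to roughly the following (a condensed sketch, assuming langchain-huggingface >= 0.2 and optimum[ipex] >= 1.22.0 are installed; the dict-style input to `stream` is used here for clarity):

# Condensed script-form sketch of the new notebook's cells (same APIs as the cells above).
from langchain_core.prompts import PromptTemplate
from langchain_huggingface.llms import HuggingFacePipeline

# Load gpt2 through the IPEX backend provided by optimum-intel.
hf = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
    backend="ipex",
)

prompt = PromptTemplate.from_template(
    "Question: {question}\n\nAnswer: Let's think step by step."
)

# skip_prompt=True drops the prompt text from the returned generation.
chain = prompt | hf.bind(skip_prompt=True)
question = "What is electroencephalography?"
print(chain.invoke({"question": question}))

# Stream the same chain chunk by chunk.
for chunk in chain.stream({"question": question}):
    print(chunk, end="", flush=True)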

notebooks/ipex/text_generation.ipynb (+1 -1)

@@ -22,7 +22,7 @@
   "source": [
    "import torch\n",
    "from transformers import AutoTokenizer\n",
-   "from optimum.intel.ipex import IPEXModelForCausalLM"
+   "from optimum.intel import IPEXModelForCausalLM"
   ]
  },
  {
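With the import path updated, loading a causal LM through IPEX looks roughly like the sketch below (the model id and prompt are illustrative, not taken from the diff):

# Illustrative usage of the new import path; "gpt2" is a placeholder model id.
from transformers import AutoTokenizer
from optimum.intel import IPEXModelForCausalLM

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = IPEXModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))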

optimum/exporters/openvino/model_patcher.py (+18 -14)

@@ -718,14 +718,15 @@ def _mistral_update_causal_mask(
 class MistralModelPatcher(DecoderModelPatcher):
     def __enter__(self):
         super().__enter__()
-        if is_transformers_version(">=", "4.42.0"):
+        if is_transformers_version(">=", "4.42.0") and is_transformers_version("<", "4.48.0"):
             # apply fix https://github.com/huggingface/transformers/commit/57d7594a79a9f5d835abf2d4d384db0e4818e548
             self._model.model._orig_update_causal_mask = self._model.model._update_causal_mask
             self._model.model._update_causal_mask = types.MethodType(_mistral_update_causal_mask, self._model.model)

         else:
             for layer in self._model.model.layers:
-                _reinitialize_cos_sin_cached_fp32(layer.self_attn.rotary_emb)
+                if hasattr(layer.self_attn, "rotary_emb"):
+                    _reinitialize_cos_sin_cached_fp32(layer.self_attn.rotary_emb)

     def __exit__(self, exc_type, exc_value, traceback):
         super().__exit__(exc_type, exc_value, traceback)
@@ -734,7 +735,7 @@ def __exit__(self, exc_type, exc_value, traceback):
             self._model.model._update_causal_mask = self._model.model._orig_update_causal_mask

         for layer in self._model.model.layers:
-            if hasattr(layer.self_attn.rotary_emb, "_orig_forward"):
+            if hasattr(layer.self_attn, "rotary_emb") and hasattr(layer.self_attn.rotary_emb, "_orig_forward"):
                 layer.self_attn.rotary_emb.forward = layer.self_attn.rotary_emb._orig_forward


@@ -1580,19 +1581,19 @@ def __enter__(self):
         ):
             self._model.config.max_position_embeddings = self._model.config.original_max_position_embeddings

-        if is_transformers_version(">=", "4.42.0"):
+        if is_transformers_version(">=", "4.42.0") and is_transformers_version("<", "4.48.0"):
             self._model.model._orig_forward = self._model.model.forward
             self._model.model.forward = types.MethodType(phi3_442_forward, self._model.model)

         # https://github.com/huggingface/transformers/blob/30ee508c6c92a1c0aa0281d193c7c0fb815b8d2f/src/transformers/models/phi3/modeling_phi3.py#L113
         # init inv_freq for torchscript tracing
         for layer in self._model.model.layers:
-            if is_torch_version(">=", "2.1.0"):
+            if is_torch_version(">=", "2.1.0") and is_transformers_version("<", "4.48.0"):
                 orig_self_attn_fwd = layer.self_attn.forward
                 layer.self_attn.forward = types.MethodType(_phi3_self_attn_sdpa_forward, layer.self_attn)
                 layer.self_attn._orig_forward = orig_self_attn_fwd

-            if layer.self_attn.rotary_emb.inv_freq is None:
+            if hasattr(layer.self_attn, "rotary_emb") and layer.self_attn.rotary_emb.inv_freq is None:
                 rotary_emb = layer.self_attn.rotary_emb
                 layer.self_attn.rotary_emb.inv_freq = 1.0 / (
                     rotary_emb.base ** (torch.arange(0, rotary_emb.dim, 2, dtype=torch.int64).float() / rotary_emb.dim)
@@ -2493,7 +2494,9 @@ class UpdateCausalMaskModelPatcher(DecoderModelPatcher):
     def __enter__(self):
         super().__enter__()
         patch_update_causal_mask(self._model, "4.42.0")
-        if hasattr(self._model.model.layers[0].self_attn.rotary_emb, "_set_cos_sin_cache"):
+        if hasattr(self._model.model.layers[0].self_attn, "rotary_emb") and hasattr(
+            self._model.model.layers[0].self_attn.rotary_emb, "_set_cos_sin_cache"
+        ):
             for layer in self._model.model.layers:
                 _reinitialize_cos_sin_cached_fp32(layer.self_attn.rotary_emb)

@@ -3045,15 +3048,16 @@ def patched_forward(self, fn):
     def __enter__(self):
         if is_torch_version(">=", "2.1.0"):
             if self._model.config.model_type == "qwen2" and self._model.config._attn_implementation != "sdpa":
-                from transformers.models.qwen2.modeling_qwen2 import QWEN2_ATTENTION_CLASSES
+                if is_transformers_version("<", "4.48"):
+                    from transformers.models.qwen2.modeling_qwen2 import QWEN2_ATTENTION_CLASSES

-                sdpa_attn = QWEN2_ATTENTION_CLASSES["sdpa"]
-                self._model.config._orig_attn_implementation = self._model.config._attn_implementation
-                self._model.config._attn_implementation = "sdpa"
+                    sdpa_attn = QWEN2_ATTENTION_CLASSES["sdpa"]
+                    self._model.config._orig_attn_implementation = self._model.config._attn_implementation
+                    self._model.config._attn_implementation = "sdpa"

-                for layer in self._model.model.layers:
-                    layer.self_attn._orig_forward = layer.self_attn.forward
-                    layer.self_attn.forward = types.MethodType(sdpa_attn.forward, layer.self_attn)
+                    for layer in self._model.model.layers:
+                        layer.self_attn._orig_forward = layer.self_attn.forward
+                        layer.self_attn.forward = types.MethodType(sdpa_attn.forward, layer.self_attn)

         if self._model.config.model_type == "llama" and self._model.config._attn_implementation != "sdpa":
             self._model.config._orig_attn_implementation = self._model.config._attn_implementation
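The common thread in these hunks is defensive handling of `layer.self_attn.rotary_emb`, which recent transformers releases apparently no longer attach to every attention module (hence the added `< 4.48` upper bounds). A minimal sketch of that guard pattern, using a hypothetical helper name rather than the repository's code:

# Hypothetical standalone illustration of the hasattr/getattr guard used above:
# only touch per-layer rotary embeddings when the attention module still owns them.
def reinit_rotary_embeddings_if_present(model, reinit_fn):
    for layer in model.model.layers:
        rotary_emb = getattr(layer.self_attn, "rotary_emb", None)
        if rotary_emb is not None:
            reinit_fn(rotary_emb)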

optimum/intel/openvino/modeling_decoder.py (+5 -1)

@@ -56,7 +56,11 @@


 if TYPE_CHECKING:
-    from transformers.generation.streamers import BaseStreamer
+    try:
+        from transformers.generation.streamers import BaseStreamer
+    except Exception:
+        from typing import Generator as BaseStreamer
+
     from transformers.modeling_utils import PreTrainedModel

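`BaseStreamer` is only needed for type annotations here, which is why a typing alias is an acceptable fallback when the import location changes between transformers releases. At runtime a concrete streamer is passed to `generate` instead; a hedged usage sketch (model id and generation arguments are illustrative only):

# Illustrative only: streaming generation with an OpenVINO decoder model.
from transformers import AutoTokenizer, TextStreamer
from optimum.intel import OVModelForCausalLM

model_id = "gpt2"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("What is electroencephalography?", return_tensors="pt")
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(**inputs, max_new_tokens=20, streamer=streamer)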
optimum/intel/openvino/modeling_seq2seq.py (+1 -1)

@@ -668,7 +668,7 @@ def forward(
         logits = torch.from_numpy(self.request.get_tensor("logits").data).to(self.device)
         self._past_length += input_ids.shape[1]

-        out_past_key_values = ()
+        out_past_key_values = ((),)

         if not self.stateful:
             # Tuple of length equal to : number of layer * number of past_key_value per decoder layer (2 corresponds to the

optimum/intel/openvino/modeling_visual_language.py (+13)

@@ -1090,6 +1090,18 @@ def preprocess_inputs(
             prompt = "<image>\n" + text
         else:
             prompt = text
+
+        if getattr(processor, "patch_size", None) is None:
+            if (
+                getattr(config, "vision_config", None) is not None
+                and getattr(config.vision_config, "patch_size", None) is not None
+            ):
+                processor.patch_size = config.vision_config.patch_size
+            else:
+                raise ValueError(
+                    "Processor does not have `patch_size` attribute. Please fix the processor or provide `patch_size` in the config."
+                )
+
         inputs = processor(images=image, text=prompt, return_tensors="pt")
         return inputs

@@ -1985,6 +1997,7 @@ def preprocess_inputs(
         input_ids = tokenizer(text, return_tensors="pt").input_ids
         attention_mask = torch.ones_like(input_ids, dtype=torch.int64)
         result = {"input_ids": input_ids, "attention_mask": attention_mask}
+
         if image is not None:
             result["images"] = processor(images=[image], return_tensors="pt")["pixel_values"]
         return result
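The first hunk back-fills `processor.patch_size` from the model config when a saved processor lacks it. Factored out as a standalone helper (hypothetical name; the commit inlines the logic in `preprocess_inputs`), it amounts to:

# Hypothetical helper mirroring the back-fill added above.
def ensure_patch_size(processor, config):
    if getattr(processor, "patch_size", None) is not None:
        return
    vision_config = getattr(config, "vision_config", None)
    patch_size = getattr(vision_config, "patch_size", None) if vision_config is not None else None
    if patch_size is None:
        raise ValueError(
            "Processor does not have `patch_size` attribute. "
            "Please fix the processor or provide `patch_size` in the config."
        )
    processor.patch_size = patch_size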

readme_logo.png (deleted, -33.5 KB)

Binary file not shown.

setup.py (+1 -1)

@@ -29,7 +29,7 @@
 INSTALL_REQUIRE = [
     "torch>=1.11",
     "optimum @ git+https://github.com/eaidova/optimum.git@ea/avoid_lib_guessing_in_standartize_args",
-    "transformers>=4.36,<4.48",
+    "transformers>=4.36,<4.49",
     "datasets>=1.4.0",
     "sentencepiece",
     "setuptools",

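The relaxed pin admits transformers 4.48.x, which is why the patcher hunks above now add `< 4.48` upper bounds to their 4.42-era workarounds. The repository's `is_transformers_version` helper expresses such gates; an illustrative equivalent built on `packaging` would be:

# Illustrative version gate; the repository uses its own is_transformers_version helper.
from packaging import version
import transformers

def transformers_version_between(low, high):
    installed = version.parse(transformers.__version__)
    return version.parse(low) <= installed < version.parse(high)

if transformers_version_between("4.42.0", "4.48.0"):
    pass  # apply a workaround only on the affected releases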