Commit 8fe4f9d

Merge branch 'huggingface:main' into flash_xpu
2 parents 3616dd2 + 8361e45


47 files changed: +2628 −962 lines

.github/workflows/test_openvino.yml

+1 −1

@@ -54,7 +54,7 @@ jobs:
       - if: ${{ matrix.transformers-version == 'latest' && matrix.test-pattern == '*modeling*'}}
         name: Install auto-gptq, autoawq
         run: |
-          pip install auto-gptq autoawq --extra-index-url https://download.pytorch.org/whl/cpu
+          pip install auto-gptq "autoawq<0.2.8" --extra-index-url https://download.pytorch.org/whl/cpu

       - if: ${{ matrix.test-pattern == '*modeling*' }}
         name: Uninstall NNCF

.github/workflows/test_openvino_full.yml

+1 −1

@@ -81,7 +81,7 @@ jobs:
       - if: ${{ matrix.transformers-version == 'latest' && matrix.os != 'windows-2019' }}
         name: Install auto-gptq, autoawq
         run: |
-          pip install auto-gptq autoawq --extra-index-url https://download.pytorch.org/whl/cpu
+          pip install auto-gptq "autoawq<0.2.8" --extra-index-url https://download.pytorch.org/whl/cpu

       - name: Pip freeze
         run: pip freeze

.github/workflows/test_openvino_slow.yml

+6 −5

@@ -26,6 +26,11 @@ jobs:
       matrix:
         os: ["ubuntu-22.04", "windows-2019"]
         transformers-version: ["4.36.0", "latest"]
+        include:
+          - transformers-version: "4.40.0"
+            os: "ubuntu-22.04"
+          - transformers-version: "4.45.0"
+            os: "ubuntu-22.04"

     runs-on: ${{ matrix.os }}

@@ -52,7 +57,7 @@ jobs:
       - if: ${{ matrix.transformers-version == 'latest' && matrix.os != 'windows-2019' }}
         name: Install auto-gptq, autoawq
         run: |
-          pip install auto-gptq autoawq --extra-index-url https://download.pytorch.org/whl/cpu
+          pip install auto-gptq "autoawq<0.2.8" --extra-index-url https://download.pytorch.org/whl/cpu

       - name: Pip freeze
         run: pip freeze

@@ -65,10 +70,6 @@ jobs:
         run: |
           pip install .[nncf]

-      - if: ${{ matrix.transformers-version != 'latest' }}
-        name: Downgrade Transformers and Accelerate
-        run: pip install transformers==${{ matrix.transformers-version }} accelerate==0.*
-
       - name: Test with Pytest (slow)
         run: |
           pytest tests/openvino -m "run_slow" --durations=0

README.md

+1 −1

@@ -1,5 +1,5 @@
 <p align="center">
-    <img src="readme_logo.png" />
+    <img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/logo/hf_intel_logo.png" />
 </p>

 # Optimum Intel

docs/source/openvino/export.mdx

+3 −2

@@ -31,7 +31,8 @@ Check out the help for more options:

 ```text
 usage: optimum-cli export openvino [-h] -m MODEL [--task TASK] [--framework {pt,tf}] [--trust-remote-code]
-                                   [--weight-format {fp32,fp16,int8,int4,mxfp4,nf4}] [--quant-mode {int8,f8e4m3,f8e5m2}]
+                                   [--weight-format {fp32,fp16,int8,int4,mxfp4,nf4}]
+                                   [--quant-mode {int8,f8e4m3,f8e5m2,nf4_f8e4m3,nf4_f8e5m2,int4_f8e4m3,int4_f8e5m2}]
                                    [--library {transformers,diffusers,timm,sentence_transformers,open_clip}]
                                    [--cache_dir CACHE_DIR] [--pad-token-id PAD_TOKEN_ID] [--ratio RATIO] [--sym]
                                    [--group-size GROUP_SIZE] [--backup-precision {none,int8_sym,int8_asym}]

@@ -67,7 +68,7 @@ Optional arguments:
                         on your local machine arbitrary code present in the model repository.
   --weight-format {fp32,fp16,int8,int4,mxfp4,nf4}
                         The weight format of the exported model.
-  --quant-mode {int8,f8e4m3,f8e5m2}
+  --quant-mode {int8,f8e4m3,f8e5m2,nf4_f8e4m3,nf4_f8e5m2,int4_f8e4m3,int4_f8e5m2}
                         Quantization precision mode. This is used for applying full model quantization including
                         activations.
   --library {transformers,diffusers,timm,sentence_transformers,open_clip}
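
The names of the new modes indicate a combined scheme: a low-bit weight format (nf4 or int4) together with an FP8 format (f8e4m3 or f8e5m2) for activations. As a rough illustrative sketch only (the placeholders <model_id> and <output_dir> and the wikitext2 calibration dataset are assumptions, not taken from this commit), a full-quantization export with one of the new modes would be invoked along these lines:

    optimum-cli export openvino -m <model_id> --quant-mode nf4_f8e4m3 --dataset wikitext2 <output_dir>

Unlike weight-only compression selected via --weight-format, full quantization also quantizes activations and therefore typically needs a calibration dataset.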

docs/source/openvino/models.mdx

+1 −0

@@ -72,6 +72,7 @@ Here is the list of the supported architectures :
 - Llava
 - Llava-Next
 - M2-M100
+- MAIRA-2
 - MBart
 - MPNet
 - MPT
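
MAIRA-2 is a multimodal, Llava-style radiology report generation model, so in Python it would be handled by the visual-language model class rather than the plain text-generation one. A minimal loading sketch, with the caveat that the Hub id and the class choice below are illustrative assumptions and are not stated in this commit:

    from optimum.intel import OVModelForVisualCausalLM

    # export=True converts the original PyTorch checkpoint to OpenVINO IR on the fly.
    # trust_remote_code is likely required if the checkpoint ships custom modeling code.
    model = OVModelForVisualCausalLM.from_pretrained(
        "microsoft/maira-2",  # assumed Hub id
        export=True,
        trust_remote_code=True,
    )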

notebooks/ipex/README.md

+1 −1

@@ -6,4 +6,4 @@ You can find here a list of the notebooks for the IPEX integration in 🤗 Optim
 | Notebook | Description | | |
 |:----------|:-------------|:-------------|------:|
 | [How to optimize your model with IPEX for text generation](https://github.com/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb)| Show how to apply operators and graph-level optimizations using Intel [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb)|
-
+| [How to optimize your langchain pipeline with IPEX](https://github.com/huggingface/optimum-intel/blob/main/notebooks/ipex/langchain_hf_pipelines.ipynb)| Show how to optimize your langchain pipeline with IPEX [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/optimum-intel/blob/main/notebooks/ipex/langchain_hf_pipelines.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/optimum-intel/blob/main/notebooks/ipex/langchain_hf_pipelines.ipynb)|

notebooks/ipex/langchain_hf_pipelines.ipynb (new file)

+168 −0

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hugging Face Pipelines\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install Langchain and 🤗 Optimum. Uncomment the following cell and run it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#! pip install langchain-huggingface optimum[ipex]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure your version of langchain-huggingface is at least v0.2 and 🤗 Optimum is at least v1.22.0 since the functionality was introduced in these versions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from optimum.intel.version import __version__\n",
    "\n",
    "print(\"optimum-intel version is\", __version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from optimum.intel.utils.import_utils import _langchain_hf_version\n",
    "\n",
    "print(\"langchain-huggingface version is\", _langchain_hf_version)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Loading\n",
    "\n",
    "Models can be loaded by specifying the model parameters using the `from_model_id` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_huggingface.llms import HuggingFacePipeline\n",
    "\n",
    "hf = HuggingFacePipeline.from_model_id(\n",
    "    model_id=\"gpt2\",\n",
    "    task=\"text-generation\",\n",
    "    pipeline_kwargs={\"max_new_tokens\": 10},\n",
    "    backend=\"ipex\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Chain\n",
    "\n",
    "With the model loaded into memory, you can compose it with a prompt to form a chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.prompts import PromptTemplate\n",
    "\n",
    "template = \"\"\"Question: {question}\n",
    "\n",
    "Answer: Let's think step by step.\"\"\"\n",
    "prompt = PromptTemplate.from_template(template)\n",
    "\n",
    "chain = prompt | hf\n",
    "\n",
    "question = \"What is electroencephalography?\"\n",
    "\n",
    "print(chain.invoke({\"question\": question}))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get response without prompt, you can bind skip_prompt=True with LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = prompt | hf.bind(skip_prompt=True)\n",
    "\n",
    "question = \"What is electroencephalography?\"\n",
    "\n",
    "print(chain.invoke({\"question\": question}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Streaming response :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for chunk in chain.stream(question):\n",
    "    print(chunk, end=\"\", flush=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
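
Since HuggingFacePipeline is a standard LangChain runnable, the IPEX-backed object created in this notebook can also be called directly, without composing a prompt template. A tiny usage sketch reusing the hf object defined above (the prompt string is just an example):

    # Single direct call on the IPEX-backed pipeline; returns the generated text as a string.
    print(hf.invoke("What is electroencephalography?"))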

notebooks/ipex/text_generation.ipynb

+1 −1

@@ -22,7 +22,7 @@
    "source": [
     "import torch\n",
     "from transformers import AutoTokenizer\n",
-    "from optimum.intel.ipex import IPEXModelForCausalLM"
+    "from optimum.intel import IPEXModelForCausalLM"
    ]
   },
   {
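
The notebook now imports IPEXModelForCausalLM from the top-level optimum.intel namespace instead of optimum.intel.ipex. A minimal generation sketch with the updated import; the checkpoint id, dtype and prompt below are illustrative choices, not prescribed by the notebook:

    import torch
    from transformers import AutoTokenizer
    from optimum.intel import IPEXModelForCausalLM  # updated import path

    model_id = "gpt2"  # illustrative checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Loads the transformers checkpoint and applies IPEX optimizations to it.
    model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer("What is electroencephalography?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=10)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))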
