Commit 403c696

Add IPEX documentation (huggingface#828)
* change readme, source/index, source/installation
* add ipex doc 1st step
* update readme for command line usage
* fix bug for ipex readme
* add export doc
* update all ipex docs
* rm diffusers
* change register
* Update README.md
  Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* Update docs/source/installation.mdx
  Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* fix readme
* fix ipex exporter args comments
* extend ipex export explain
* fix ipex reference.mdx
* add comments for auto doc
* rm cli export
* Update optimum/commands/export/ipex.py
  Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* rm commit hash in export command
* rm export
* rm jit
* add ipex on doc's docker file
* indicate that ipex model only supports for cpu and the export format will be changed to compile in the future
* Update docs/source/ipex/inference.mdx
  Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* explain patching
* rm ipex reference
* Update docs/source/ipex/inference.mdx
* Update docs/source/ipex/inference.mdx
* Update docs/source/ipex/inference.mdx
* Update docs/source/index.mdx
* Update docs/source/ipex/inference.mdx
* Update docs/source/ipex/models.mdx
* Update docs/Dockerfile

---------

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
1 parent 1f3d0c2 commit 403c696

File tree

9 files changed: +162 -4 lines changed


README.md

+1 -2

@@ -223,15 +223,14 @@ To load your IPEX model, you can just replace your `AutoModelForXxx` class with
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
 results = pipe("He's a dreadful magician and")
-
 ```

 For more details, please refer to the [documentation](https://intel.github.io/intel-extension-for-pytorch/#introduction).


 ## Running the examples

-Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.
+Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) and [`notebooks`](https://github.com/huggingface/optimum-intel/tree/main/notebooks) directories to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.

 Do not forget to install requirements for every example:

docs/source/_toctree.yml

+11

@@ -30,5 +30,16 @@
       title: Tutorials
       isExpanded: false
     title: OpenVINO
+  - sections:
+    - local: ipex/inference
+      title: Inference
+    - local: ipex/models
+      title: Supported Models
+    - sections:
+      - local: ipex/tutorials/notebooks
+        title: Notebooks
+      title: Tutorials
+      isExpanded: false
+    title: IPEX
   title: Optimum Intel
   isExpanded: false

docs/source/index.mdx

+2

@@ -19,6 +19,8 @@ limitations under the License.

 🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.

+[Intel Extension for PyTorch](https://intel.github.io/intel-extension-for-pytorch/#introduction) (IPEX) is an open-source library which provides optimizations for both eager mode and graph mode. Compared to eager mode, graph mode in PyTorch normally yields better performance thanks to optimization techniques such as operation fusion.
+
 [Intel Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) is an open-source library enabling the usage of the most popular compression techniques such as quantization, pruning and knowledge distillation. It supports automatic accuracy-driven tuning strategies in order for users to easily generate a quantized model. Users can easily apply static, dynamic and aware-training quantization approaches while specifying an expected accuracy criterion. It also supports different weight pruning techniques enabling the creation of pruned models with a predefined sparsity target.

 [OpenVINO](https://docs.openvino.ai) is an open-source toolkit that enables high performance inference capabilities for Intel CPUs, GPUs, and special DL inference accelerators ([see](https://docs.openvino.ai/2024/about-openvino/compatibility-and-support/supported-devices.html) the full list of supported devices). It is supplied with a set of tools to optimize your models with compression techniques such as quantization, pruning and knowledge distillation. Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime.

docs/source/installation.mdx

+2 -1

@@ -22,6 +22,7 @@ To install the latest release of 🤗 Optimum Intel with the corresponding requi
 |:-----------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------|
 | [Intel Neural Compressor (INC)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) | `pip install --upgrade --upgrade-strategy eager "optimum[neural-compressor]"`|
 | [Intel OpenVINO](https://docs.openvino.ai ) | `pip install --upgrade --upgrade-strategy eager "optimum[openvino]"` |
+| [Intel Extension for PyTorch](https://intel.github.io/intel-extension-for-pytorch/#introduction) | `pip install --upgrade --upgrade-strategy eager "optimum[ipex]"` |

 The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

@@ -42,4 +43,4 @@ or to install from source including dependencies:
 python -m pip install "optimum-intel[extras]"@git+https://github.com/huggingface/optimum-intel.git
 ```

-where `extras` can be one or more of `neural-compressor`, `openvino`, `nncf`.
+where `extras` can be one or more of `neural-compressor`, `openvino`, `ipex`.
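As a quick sanity check after installing the `ipex` extra, importing one of the IPEX model classes should succeed. This is a minimal sketch; the class name comes from the inference documentation added in this commit:

```python
# Assumes `pip install --upgrade --upgrade-strategy eager "optimum[ipex]"` has been run.
from optimum.intel import IPEXModelForCausalLM  # noqa: F401

print("optimum[ipex] is available")
```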

docs/source/ipex/inference.mdx

+45
@@ -0,0 +1,45 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Inference

Optimum Intel can be used to load models from the [Hub](https://huggingface.co/models) and create pipelines to run inference with IPEX optimizations (including patching with custom operators, weight prepacking and graph mode) on a variety of Intel processors. For now, support is only enabled for CPUs.

## Loading

You can load your model and apply IPEX optimizations (including weight prepacking and graph mode). For supported architectures like LLaMA, BERT and ViT, further optimizations will be applied by patching the model to use custom operators.
For now, support is only enabled for CPUs and the original model will be exported via TorchScript. In the future, `torch.compile` will be used and models exported via TorchScript will be deprecated.

```diff
  import torch
  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.intel import IPEXModelForCausalLM

  model_id = "gpt2"
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, export=True)
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  results = pipe("He's a dreadful magician and")
```

As shown in the table below, each task is associated with a class enabling your model to be loaded automatically.

| Auto Class                           | Task                    |
|--------------------------------------|-------------------------|
| `IPEXModelForSequenceClassification` | `text-classification`   |
| `IPEXModelForTokenClassification`    | `token-classification`  |
| `IPEXModelForQuestionAnswering`      | `question-answering`    |
| `IPEXModelForImageClassification`    | `image-classification`  |
| `IPEXModel`                          | `feature-extraction`    |
| `IPEXModelForMaskedLM`               | `fill-mask`             |
| `IPEXModelForAudioClassification`    | `audio-classification`  |
| `IPEXModelForCausalLM`               | `text-generation`       |
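Each of these classes follows the same pattern as the generation example above. A minimal sketch for `text-classification` (the `distilbert-base-uncased-finetuned-sst-2-english` checkpoint is an illustrative choice, not one prescribed by this commit):

```python
from transformers import AutoTokenizer, pipeline

from optimum.intel import IPEXModelForSequenceClassification

# Illustrative checkpoint; any text-classification model with a supported architecture should work.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = IPEXModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(classifier("He's a dreadful magician."))
```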

docs/source/ipex/models.mdx

+46

@@ -0,0 +1,46 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Supported models

🤗 Optimum provides IPEX optimizations for both eager mode and graph mode. It provides classes and functions to apply these optimizations easily.
Here is the list of the supported architectures:

## [Transformers](https://huggingface.co/docs/transformers/index)

- Albert
- Bart
- Beit
- Bert
- BlenderBot
- BlenderBotSmall
- Bloom
- CodeGen
- DistilBert
- Electra
- Flaubert
- GPT-2
- GPT-BigCode
- GPT-Neo
- GPT-NeoX
- Llama
- MPT
- Mistral
- MobileNet v1
- MobileNet v2
- MobileVit
- OPT
- ResNet
- Roberta
- Roformer
- SqueezeBert
- UniSpeech
- Vit
- Wav2Vec2
- XLM
docs/source/ipex/tutorials/notebooks.mdx

+16

@@ -0,0 +1,16 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Notebooks

## Inference

| Notebook | Description | | |
|:---------|:------------|:---:|----:|
| [How to run inference with IPEX](https://github.com/huggingface/optimum-intel/tree/main/notebooks/ipex) | Explains how to export your model to IPEX and run inference with the exported model on a text-generation task | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb) | [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb) |

optimum/intel/ipex/modeling_base.py

+35 -1

@@ -198,14 +198,48 @@ def _from_pretrained(
         token: Optional[Union[bool, str]] = None,
         revision: Optional[str] = None,
         force_download: bool = False,
-        cache_dir: str = HUGGINGFACE_HUB_CACHE,
+        cache_dir: Union[str, Path] = HUGGINGFACE_HUB_CACHE,
         subfolder: str = "",
         local_files_only: bool = False,
         torch_dtype: Optional[Union[str, "torch.dtype"]] = None,
         trust_remote_code: bool = False,
         file_name: Optional[str] = WEIGHTS_NAME,
         **kwargs,
     ):
+        """
+        Loads a model and its configuration file from a directory or the HF Hub.
+
+        Arguments:
+            model_id (`str` or `Path`):
+                The directory from which to load the model.
+                Can be either:
+                    - The model id of a pretrained model hosted inside a model repo on huggingface.co.
+                    - The path to a directory containing the model weights.
+            use_auth_token (`Optional[Union[bool, str]]`, defaults to `None`):
+                Deprecated. Please use `token` instead.
+            token (`Optional[Union[bool, str]]`, defaults to `None`):
+                The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
+                when running `huggingface-cli login` (stored in `~/.huggingface`).
+            revision (`str`, *optional*):
+                The specific model version to use. It can be a branch name, a tag name, or a commit id.
+            force_download (`bool`, defaults to `False`):
+                Whether or not to force the (re-)download of the model weights and configuration files, overriding the
+                cached versions if they exist.
+            cache_dir (`Union[str, Path]`, *optional*):
+                The path to a directory in which a downloaded pretrained model configuration should be cached if the
+                standard cache should not be used.
+            subfolder (`str`, *optional*):
+                In case the relevant files are located inside a subfolder of the model repo either locally or on huggingface.co, you can specify the folder name here.
+            local_files_only (`bool`, *optional*, defaults to `False`):
+                Whether or not to only look at local files (i.e., do not try to download the model).
+            torch_dtype (`Optional[Union[str, "torch.dtype"]]`, *optional*):
+                Load the model in the specified dtype (`float16`, `bfloat16` or `float32`), ignoring the model's `config.torch_dtype` if one exists. If not specified, the model will be loaded in `float32`.
+            trust_remote_code (`bool`, *optional*):
+                Allows the use of custom code for the modeling hosted in the model repository. This option should only be set for repositories you trust and in which you have read the code, as it will execute arbitrary code present in the model repository on your local machine.
+            file_name (`str`, *optional*):
+                The file name of the model to load. Overwrites the default file name and allows one to load the model
+                with a different name.
+        """
         if use_auth_token is not None:
             warnings.warn(
                 "The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.",

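As a rough usage sketch of the arguments documented above (the checkpoint, revision and dtype are illustrative placeholders, not values taken from this commit):

```python
import torch

from optimum.intel import IPEXModelForCausalLM

# Illustrative call showing some of the documented arguments.
model = IPEXModelForCausalLM.from_pretrained(
    "gpt2",                      # model id on the Hub or path to a local directory
    revision="main",             # branch name, tag name, or commit id
    torch_dtype=torch.bfloat16,  # load in bfloat16 instead of the default float32
    export=True,                 # export the transformers model via TorchScript at load time
)
```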
optimum/intel/ipex/utils.py

+4

@@ -14,8 +14,12 @@


 _HEAD_TO_AUTOMODELS = {
+    "feature-extraction": "IPEXModel",
     "text-generation": "IPEXModelForCausalLM",
     "text-classification": "IPEXModelForSequenceClassification",
     "token-classification": "IPEXModelForTokenClassification",
     "question-answering": "IPEXModelForQuestionAnswering",
+    "fill-mask": "IPEXModelForMaskedLM",
+    "image-classification": "IPEXModelForImageClassification",
+    "audio-classification": "IPEXModelForAudioClassification",
 }
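A small sketch of how this mapping can resolve a task name into its auto class; the `getattr` lookup on `optimum.intel` is an assumption made for illustration, not necessarily how the library consumes the dict internally:

```python
import optimum.intel
from optimum.intel.ipex.utils import _HEAD_TO_AUTOMODELS

# Resolve the class name registered for a task and fetch the class from optimum.intel.
task = "fill-mask"
class_name = _HEAD_TO_AUTOMODELS[task]            # -> "IPEXModelForMaskedLM"
model_class = getattr(optimum.intel, class_name)  # assumes the class is exported at the package root
print(f"{task} -> {model_class.__name__}")
```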
