You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/index).

## IPEX

To load your IPEX model, simply replace your `AutoModelForXxx` class with the corresponding `IPEXModelForXxx` class. Set `export=True` to load a PyTorch checkpoint, export the model via TorchScript and apply IPEX optimizations: both operator-level optimizations (standard operators replaced with customized IPEX operators) and graph-level optimizations (such as operator fusion) will be applied to your model.

```diff
  import torch
  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.intel import IPEXModelForCausalLM

  model_id = "gpt2"
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, export=True)
```
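
Put together, a minimal end-to-end sketch could look like the following. The prompt string, the `max_new_tokens` value and the use of a `text-generation` pipeline here are illustrative additions, not part of the original snippet:

```python
import torch
from transformers import AutoTokenizer, pipeline
from optimum.intel import IPEXModelForCausalLM

model_id = "gpt2"
# export=True loads the PyTorch checkpoint, exports it via TorchScript
# and applies IPEX operator and graph-level optimizations at load time
model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The optimized model can then be used as a drop-in replacement in a transformers pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Intel Extension for PyTorch can speed up inference by", max_new_tokens=32))
```
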
For more details, please refer to the [documentation](https://intel.github.io/intel-extension-for-pytorch/#introduction).

## Running the examples

Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.