## Neural Compressor
You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/index).
## IPEX
With `export=True`, the IPEX model replaces torch linear layers with IPEX linear layers, which prepack the weights. It also applies linear fusion and [IAKV](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html#indirect-access-kv-cache) (indirect access KV cache) for generation. Finally, `jit.trace` is applied to convert the model to graph mode.
Here is an example of how to use an IPEX-optimized model to generate text.
### Generate
```diff
import torch
from transformers import AutoTokenizer
- from transformers import AutoModelForCausalLM
+ from optimum.intel.ipex import IPEXModelForCausalLM

model_id = "gpt2"
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
input_sentence = ["Answer the following yes/no question by reasoning step-by-step please. Can you write a whole Haiku in a single tweet?"]
# Tokenize the prompt and generate with the IPEX-optimized model.
model_inputs = tokenizer(input_sentence, return_tensors="pt")
generated_ids = model.generate(**model_inputs, max_new_tokens=32)
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(output)
```
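The optimized model behaves as a drop-in replacement for the `transformers` one, so it should also work with the `pipeline` API. A minimal sketch, assuming `IPEXModelForCausalLM` is accepted by `pipeline` the same way other `optimum` model classes are:

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel.ipex import IPEXModelForCausalLM

model_id = "gpt2"
# export=True applies the IPEX optimizations described above at load time.
model = IPEXModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("A haiku fits in a tweet because", max_new_tokens=32)[0]["generated_text"])
```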
For more details, please refer to the [documentation](https://intel.github.io/intel-extension-for-pytorch/#introduction).
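Since the export step (weight prepacking and tracing) happens at load time, it can be worth saving the optimized model so later loads skip re-exporting. A minimal sketch, assuming `save_pretrained`/`from_pretrained` behave here as they do for other `optimum` model classes:

```python
from optimum.intel.ipex import IPEXModelForCausalLM

# Export once, then persist the optimized model to disk
# (assumes save_pretrained stores the traced model, as for other optimum classes).
model = IPEXModelForCausalLM.from_pretrained("gpt2", export=True)
model.save_pretrained("./gpt2-ipex")

# Later, reload the already-exported model without passing export=True.
model = IPEXModelForCausalLM.from_pretrained("./gpt2-ipex")
```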
## Running the examples
Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.