
Commit 1a59c1b

fix ipex readme
1 parent 0d312f1 commit 1a59c1b

File tree

1 file changed: +27 -24 lines


README.md

+27 -24
@@ -44,30 +44,6 @@ where `extras` can be one or more of `ipex`, `neural-compressor`, `openvino`, `n
 
 # Quick tour
 
-## IPEX
-Here is the example of how to use IPEX optimized model to generate texts.
-### generate
-```diff
-import torch
-from transformers import AutoTokenizer
-- from transformers import AutoModelForCausalLM
-+ from optimum.intel.ipex import IPEXModelForCausalLM
-
-
-model_id = "gpt2"
-- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
-+ model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, export=True)
-tokenizer = AutoTokenizer.from_pretrained("gpt2")
-input_sentence = ["Answer the following yes/no question by reasoning step-by-step please. Can you write a whole Haiku in a single tweet?"]
-model_inputs = tokenizer(input_sentence, return_tensors="pt")
-generation_kwargs = dict(max_new_tokens=32, do_sample=False, num_beams=4, num_beam_groups=1, no_repeat_ngram_size=2, use_cache=True)
-
-generated_ids = model.generate(**model_inputs, **generation_kwargs)
-output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
-print(output)
-```
-
-For more details, please refer to the [documentation](https://intel.github.io/intel-extension-for-pytorch/#introduction).
 
 ## Neural Compressor
 
@@ -227,6 +203,33 @@ Quantization aware training (QAT) is applied in order to simulate the effects of
 You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/index).
 
 
+## IPEX
+With `export=True`, the IPEX model replaces torch linear layers with IPEX linear layers, which prepack the weights. It also applies linear fusion and [IAKV](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html#indirect-access-kv-cache) (indirect access KV cache) for generation. Finally, `jit.trace` is applied to switch the model to graph mode.
+Here is an example of how to use an IPEX-optimized model to generate text.
+### generate
+```diff
+import torch
+from transformers import AutoTokenizer
+- from transformers import AutoModelForCausalLM
++ from optimum.intel.ipex import IPEXModelForCausalLM
+
+
+model_id = "gpt2"
+- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
++ model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, export=True)
+tokenizer = AutoTokenizer.from_pretrained("gpt2")
+input_sentence = ["Answer the following yes/no question by reasoning step-by-step please. Can you write a whole Haiku in a single tweet?"]
+model_inputs = tokenizer(input_sentence, return_tensors="pt")
+generation_kwargs = dict(max_new_tokens=32, do_sample=False, num_beams=4, num_beam_groups=1, no_repeat_ngram_size=2, use_cache=True)
+
+generated_ids = model.generate(**model_inputs, **generation_kwargs)
+output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(output)
+```
+
+For more details, please refer to the [documentation](https://intel.github.io/intel-extension-for-pytorch/#introduction).
+
+
 ## Running the examples
 
 Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.
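
For reference, the snippet below is a minimal sketch of the example this commit adds, with the README's `diff` markers resolved into plain Python. It assumes `optimum-intel` is installed with the `ipex` extra; the model ID, prompt, and generation settings simply mirror the README snippet above.

```python
# Minimal sketch of the README example added by this commit, diff markers resolved.
# Assumes optimum-intel is installed with the `ipex` extra (pip install "optimum[ipex]").
import torch
from transformers import AutoTokenizer
from optimum.intel.ipex import IPEXModelForCausalLM

model_id = "gpt2"

# export=True applies the IPEX optimizations described in the new README paragraph
# (prepacked linear layers, linear fusion, IAKV, graph mode via jit.trace).
model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

input_sentence = ["Answer the following yes/no question by reasoning step-by-step please. Can you write a whole Haiku in a single tweet?"]
model_inputs = tokenizer(input_sentence, return_tensors="pt")
generation_kwargs = dict(max_new_tokens=32, do_sample=False, num_beams=4, num_beam_groups=1, no_repeat_ngram_size=2, use_cache=True)

# Beam-search generation with the KV cache enabled, as in the README example.
generated_ids = model.generate(**model_inputs, **generation_kwargs)
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(output)
```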
