
Commit 22b098d

Merge remote-tracking branch 'upstream/main' into feature/directml
2 parents 471f803 + 99bc877

12 files changed: +244 -301 lines

.github/workflows/dev_test_benckmark.yml  (+1)

@@ -22,6 +22,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: |
+          pip install -U setuptools
           pip install wheel
           pip install .[tests,onnxruntime,benchmark] datasets
           pip install -U git+https://github.com/huggingface/evaluate

.github/workflows/test_benckmark.yml  (+1)

@@ -29,6 +29,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: |
+          pip install -U setuptools
           pip install wheel
           pip install .[tests,onnxruntime,benchmark] datasets
       - name: Test with unittest

README.md  (+18 -202)

@@ -52,76 +52,6 @@ python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/op
 
 The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and with a command line.
 
-### Features summary
-
-| Features | [ONNX Runtime](https://huggingface.co/docs/optimum/main/en/onnxruntime/overview)| [Neural Compressor](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc)| [OpenVINO](https://huggingface.co/docs/optimum/main/en/intel/inference)| [TensorFlow Lite](https://huggingface.co/docs/optimum/main/en/exporters/tflite/overview)|
-|:----------------------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
-| Graph optimization | :heavy_check_mark: | N/A | :heavy_check_mark: | N/A |
-| Post-training dynamic quantization | :heavy_check_mark: | :heavy_check_mark: | N/A | :heavy_check_mark: |
-| Post-training static quantization | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| Quantization Aware Training (QAT) | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
-| FP16 (half precision) | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: |
-| Pruning | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
-| Knowledge Distillation | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
-
-
-### OpenVINO
-
-Before you begin, make sure you have all the necessary libraries installed :
-
-```bash
-pip install --upgrade --upgrade-strategy eager optimum[openvino]
-```
-
-It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:
-
-```bash
-optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
-```
-
-If you add `--weight-format int8`, the weights will be quantized to `int8`, check out our [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/export) for more detail. To apply quantization on both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/intel/openvino/optimization#static-quantization).
-
-To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.
-
-```diff
-- from transformers import AutoModelForSequenceClassification
-+ from optimum.intel import OVModelForSequenceClassification
-  from transformers import AutoTokenizer, pipeline
-
-  model_id = "distilbert-base-uncased-finetuned-sst-2-english"
-  tokenizer = AutoTokenizer.from_pretrained(model_id)
-- model = AutoModelForSequenceClassification.from_pretrained(model_id)
-+ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
-
-  classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
-  results = classifier("He's a dreadful magician.")
-```
-
-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).
-
-### Neural Compressor
-
-Before you begin, make sure you have all the necessary libraries installed :
-
-```bash
-pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
-```
-
-Dynamic quantization can be applied on your model:
-
-```bash
-optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
-```
-
-To load a model quantized with Intel Neural Compressor, hosted locally or on the 🤗 hub, you can do as follows :
-```python
-from optimum.intel import INCModelForSequenceClassification
-
-model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
-model = INCModelForSequenceClassification.from_pretrained(model_id)
-```
-
-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/optimization_inc) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor).
 
 ### ONNX + ONNX Runtime
 
@@ -131,43 +61,11 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install optimum[exporters,onnxruntime]
 ```
 
-It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily:
-
-```plain
-optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
-```
-
-The model can then be quantized using `onnxruntime`:
-
-```bash
-optimum-cli onnxruntime quantize \
-  --avx512 \
-  --onnx_model roberta_base_qa_onnx \
-  -o quantized_roberta_base_qa_onnx
-```
-
-These commands will export `deepset/roberta-base-squad2` and perform [O2 graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization#optimization-configuration) on the exported model, and finally quantize it with the [avx512 configuration](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig.avx512).
+It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily.
 
 For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).
 
-#### Run the exported model using ONNX Runtime
-
-Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seemless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend:
-
-```diff
-- from transformers import AutoModelForQuestionAnswering
-+ from optimum.onnxruntime import ORTModelForQuestionAnswering
-  from transformers import AutoTokenizer, pipeline
-
-  model_id = "deepset/roberta-base-squad2"
-  tokenizer = AutoTokenizer.from_pretrained(model_id)
-- model = AutoModelForQuestionAnswering.from_pretrained(model_id)
-+ model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
-  qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
-  question = "What's Optimum?"
-  context = "Optimum is an awesome library everyone should use!"
-  results = qa_pipe(question=question, context=context)
-```
+Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend.
 
 More details on how to run ONNX models with `ORTModelForXXX` classes [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).
 
@@ -179,14 +77,21 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install optimum[exporters-tf]
 ```
 
-Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them:
+Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them.
+You can find more information in our [documentation](https://huggingface.co/docs/optimum/main/exporters/tflite/usage_guides/export_a_model).
 
-```plain
-optimum-cli export tflite \
-  -m deepset/roberta-base-squad2 \
-  --sequence_length 384 \
-  --quantize int8-dynamic roberta_tflite_model
-```
+### Intel (OpenVINO + Neural Compressor + IPEX)
+
+Before you begin, make sure you have all the necessary [libraries installed](https://huggingface.co/docs/optimum/main/en/intel/installation).
+
+You can find more information on the different integrations in our [documentation](https://huggingface.co/docs/optimum/main/en/intel/index) and in the examples of [`optimum-intel`](https://github.com/huggingface/optimum-intel).
+
+
+### Quanto
+
+[Quanto](https://github.com/huggingface/optimum-quanto) is a pytorch quantization backend which allows you to quantize a model either using the python API or the `optimum-cli`.
+
+You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
 
 ## Accelerated training
 
@@ -205,37 +110,7 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install --upgrade --upgrade-strategy eager optimum[habana]
 ```
 
-```diff
-- from transformers import Trainer, TrainingArguments
-+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments
-
-  # Download a pretrained model from the Hub
-  model = AutoModelForXxx.from_pretrained("bert-base-uncased")
-
-  # Define the training arguments
-- training_args = TrainingArguments(
-+ training_args = GaudiTrainingArguments(
-      output_dir="path/to/save/folder/",
-+     use_habana=True,
-+     use_lazy_mode=True,
-+     gaudi_config_name="Habana/bert-base-uncased",
-      ...
-  )
-
-  # Initialize the trainer
-- trainer = Trainer(
-+ trainer = GaudiTrainer(
-      model=model,
-      args=training_args,
-      train_dataset=train_dataset,
-      ...
-  )
-
-  # Use Habana Gaudi processor for training!
-  trainer.train()
-```
-
-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
+You can find examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
 
 ### ONNX Runtime
 
@@ -246,63 +121,4 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install optimum[onnxruntime-training]
 ```
 
-```diff
-- from transformers import Trainer, TrainingArguments
-+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
-
-  # Download a pretrained model from the Hub
-  model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
-
-  # Define the training arguments
-- training_args = TrainingArguments(
-+ training_args = ORTTrainingArguments(
-      output_dir="path/to/save/folder/",
-      optim="adamw_ort_fused",
-      ...
-  )
-
-  # Create a ONNX Runtime Trainer
-- trainer = Trainer(
-+ trainer = ORTTrainer(
-      model=model,
-      args=training_args,
-      train_dataset=train_dataset,
-      ...
-  )
-
-  # Use ONNX Runtime for training!
-  trainer.train()
-```
-
-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).
-
-
-### Quanto
-
-[Quanto](https://github.com/huggingface/optimum-quanto) is a pytorch quantization backend.
-
-You can quantize a model either using the python API or the `optimum-cli`.
-
-```python
-from transformers import AutoModelForCausalLM
-from optimum.quanto import QuantizedModelForCausalLM, qint4
-
-model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B')
-qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude='lm_head')
-```
-
-The quantized model can be saved using `save_pretrained`:
-
-```python
-qmodel.save_pretrained('./Llama-3.1-8B-quantized')
-```
-
-It can later be reloaded using `from_pretrained`:
-
-```python
-from optimum.quanto import QuantizedModelForCausalLM
-
-qmodel = QuantizedModelForCausalLM.from_pretrained('Llama-3.1-8B-quantized')
-```
-
-You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
+You can find examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).

optimum/exporters/onnx/base.py  (+3)

@@ -159,6 +159,9 @@ class OnnxConfig(ExportConfig, ABC):
         "image-to-image": OrderedDict(
             {"reconstruction": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}}
         ),
+        "keypoint-detection": OrderedDict(
+            {"heatmaps": {0: "batch_size", 1: "num_keypoints", 2: "height", 3: "width"}}
+        ),
         "mask-generation": OrderedDict({"logits": {0: "batch_size"}}),
         "masked-im": OrderedDict(
             {"reconstruction" if is_transformers_version(">=", "4.29.0") else "logits": {0: "batch_size"}}

optimum/exporters/onnx/convert.py  (+4 -2)

@@ -513,8 +513,10 @@ def export_pytorch(
 
     model_kwargs = model_kwargs or {}
     # num_logits_to_keep was added in transformers 4.45 and isn't added as inputs when exporting the model
-    if is_transformers_version(">=", "4.44.99") and "num_logits_to_keep" in signature(model.forward).parameters.keys():
-        model_kwargs["num_logits_to_keep"] = 0
+    if is_transformers_version(">=", "4.45"):
+        logits_to_keep_name = "logits_to_keep" if is_transformers_version(">=", "4.49") else "num_logits_to_keep"
+        if logits_to_keep_name in signature(model.forward).parameters.keys():
+            model_kwargs[logits_to_keep_name] = 0
 
     with torch.no_grad():
         model.config.return_dict = True
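
The context here: transformers added `num_logits_to_keep` in 4.45 and, per the new check, uses the renamed `logits_to_keep` from 4.49 on, so the export now picks whichever keyword the loaded model's `forward` actually accepts and sets it to 0 (keep all logits) so the full logits tensor is traced. A standalone sketch of the same selection logic, using `packaging` for the version comparison instead of optimum's internal `is_transformers_version` helper; the function name below is hypothetical.

```python
from inspect import signature

import transformers
from packaging import version


def set_logits_to_keep_kwarg(model, model_kwargs: dict) -> dict:
    """Choose the kwarg name the installed transformers version expects and
    disable logits slicing so the whole logits tensor is part of the export."""
    tfm_version = version.parse(transformers.__version__)
    if tfm_version >= version.parse("4.45"):
        # The argument was renamed in later transformers releases.
        name = "logits_to_keep" if tfm_version >= version.parse("4.49") else "num_logits_to_keep"
        if name in signature(model.forward).parameters:
            model_kwargs[name] = 0  # 0 == keep every position's logits
    return model_kwargs
```

A plain `packaging` comparison treats pre-releases such as `4.45.0.dev0` as older than `4.45` (presumably why the old code compared against `"4.44.99"`); the sketch ignores that subtlety and only mirrors the kwarg-name selection.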
