If you add `--weight-format int8`, the weights will be quantized to `int8`; check out our [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/export) for more details. To apply quantization on both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/intel/openvino/optimization#static-quantization).
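For instance, an export command with `int8` weight-only quantization could look like the following sketch (the model name and output directory are only illustrative):

```bash
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english --weight-format int8 ov_model
```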
To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.
```diff
- from transformers import AutoModelForSequenceClassification
+ from optimum.intel import OVModelForSequenceClassification
```
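As a minimal sketch of the resulting inference code (the model name is only illustrative):

```python
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to the OpenVINO format on the fly
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("OpenVINO inference with Optimum is straightforward."))
```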
You can find more examples in the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).
### Neural Compressor
Before you begin, make sure you have all the necessary libraries installed.
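A minimal install sketch (the `neural-compressor` extra name is an assumption, mirroring the install commands in the other sections):

```bash
pip install optimum[neural-compressor]
```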
You can then load a model already quantized with Intel Neural Compressor from the Hugging Face Hub:

```python
from optimum.intel import INCModelForSequenceClassification

# Example checkpoint quantized with Intel Neural Compressor
model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_id)
```
You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/optimization_inc) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor).
### ONNX + ONNX Runtime
Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters,onnxruntime]
```
It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily.
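For example, exporting `deepset/roberta-base-squad2` with O2 graph optimization can be done in a single command (a sketch; the flags follow the `optimum-cli export onnx` interface and the output directory name is reused below):

```bash
optimum-cli export onnx --model deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
```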
The model can then be quantized using `onnxruntime`:
```bash
optimum-cli onnxruntime quantize \
  --avx512 \
  --onnx_model roberta_base_qa_onnx \
  -o quantized_roberta_base_qa_onnx
```
For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).
#### Run the exported model using ONNX Runtime
Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend:
```diff
- from transformers import AutoModelForQuestionAnswering
+ from optimum.onnxruntime import ORTModelForQuestionAnswering
```
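For example, a sketch of loading the model exported above and running it in a question-answering pipeline (paths, question and context are only illustrative):

```python
from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline

# Directory produced by the export/quantization commands above
model = ORTModelForQuestionAnswering.from_pretrained("quantized_roberta_base_qa_onnx")
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(qa(question="What backend is used?", context="The model runs with ONNX Runtime in the backend."))
```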
More details on how to run ONNX models with `ORTModelForXXX` classes can be found [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).
### TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters-tf]
```
Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them (you can find more information in our [documentation](https://huggingface.co/docs/optimum/main/exporters/tflite/usage_guides/export_a_model)):
```plain
optimum-cli export tflite \
  -m deepset/roberta-base-squad2 \
  --sequence_length 384 \
  --quantize int8-dynamic roberta_tflite_model
```
### Intel (OpenVINO + Neural Compressor + IPEX)
Before you begin, make sure you have all the necessary [libraries installed](https://huggingface.co/docs/optimum/main/en/intel/installation).
You can find more information on the different integrations in our [documentation](https://huggingface.co/docs/optimum/main/en/intel/index) and in the examples of [`optimum-intel`](https://github.com/huggingface/optimum-intel).
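As a quick illustration of the IPEX integration (a sketch; the `IPEXModelForSequenceClassification` class and model name are assumptions based on the `optimum-intel` API):

```python
from optimum.intel import IPEXModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# Loading through the IPEX class applies Intel Extension for PyTorch optimizations
model = IPEXModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
```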
### Quanto
[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend which allows you to quantize a model either using the Python API or the `optimum-cli`.
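For example, quantizing the weights of a model to `int4` with the Python API (the model name is only illustrative):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B')
# Quantize the weights to int4, keeping the lm_head in full precision
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude='lm_head')
```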
You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
## Accelerated training
### Habana

To train transformers on Habana's Gaudi processors, you can replace the 🤗 Transformers `Trainer` and `TrainingArguments` with their `optimum.habana` counterparts:

```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments

  # Download a pretrained model from the Hub
  model = AutoModelForXxx.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = GaudiTrainingArguments(
      output_dir="path/to/save/folder/",
+     use_habana=True,
+     use_lazy_mode=True,
+     gaudi_config_name="Habana/bert-base-uncased",
      ...
  )

  # Initialize the trainer
- trainer = Trainer(
+ trainer = GaudiTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      ...
  )

  # Use Habana Gaudi processor for training!
  trainer.train()
```
You can find more examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
### ONNX Runtime

You can similarly accelerate training with [ONNX Runtime](https://onnxruntime.ai/). You can find examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).
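A minimal sketch of the same drop-in pattern (assuming the `ORTTrainer` and `ORTTrainingArguments` classes exposed by `optimum.onnxruntime`; `model` and `train_dataset` are defined as in the Gaudi example above):

```python
from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

# Swap Trainer/TrainingArguments for their ONNX Runtime counterparts
training_args = ORTTrainingArguments(output_dir="path/to/save/folder/")
trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```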