
Commit ce533cf

Remove README code snippets (#2195)
Cleanup README
1 parent 8c500c8 commit ce533cf

1 file changed (+17, -173 lines)

README.md

@@ -52,76 +52,6 @@ python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/op

The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and with a command line.

- ### Features summary
-
- | Features | [ONNX Runtime](https://huggingface.co/docs/optimum/main/en/onnxruntime/overview)| [Neural Compressor](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc)| [OpenVINO](https://huggingface.co/docs/optimum/main/en/intel/inference)| [TensorFlow Lite](https://huggingface.co/docs/optimum/main/en/exporters/tflite/overview)|
- |:----------------------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
- | Graph optimization | :heavy_check_mark: | N/A | :heavy_check_mark: | N/A |
- | Post-training dynamic quantization | :heavy_check_mark: | :heavy_check_mark: | N/A | :heavy_check_mark: |
- | Post-training static quantization | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
- | Quantization Aware Training (QAT) | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
- | FP16 (half precision) | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: |
- | Pruning | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
- | Knowledge Distillation | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
-
-
- ### OpenVINO
-
- Before you begin, make sure you have all the necessary libraries installed :
-
- ```bash
- pip install --upgrade --upgrade-strategy eager optimum[openvino]
- ```
-
- It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:
-
- ```bash
- optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
- ```
-
- If you add `--weight-format int8`, the weights will be quantized to `int8`, check out our [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/export) for more detail. To apply quantization on both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/intel/openvino/optimization#static-quantization).
-
- To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.
-
- ```diff
- - from transformers import AutoModelForSequenceClassification
- + from optimum.intel import OVModelForSequenceClassification
- from transformers import AutoTokenizer, pipeline
-
- model_id = "distilbert-base-uncased-finetuned-sst-2-english"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- - model = AutoModelForSequenceClassification.from_pretrained(model_id)
- + model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
-
- classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
- results = classifier("He's a dreadful magician.")
- ```
-
- You can find more examples in the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).
-
- ### Neural Compressor
-
- Before you begin, make sure you have all the necessary libraries installed :
-
- ```bash
- pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
- ```
-
- Dynamic quantization can be applied on your model:
-
- ```bash
- optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
- ```
-
- To load a model quantized with Intel Neural Compressor, hosted locally or on the 🤗 hub, you can do as follows :
- ```python
- from optimum.intel import INCModelForSequenceClassification
-
- model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
- model = INCModelForSequenceClassification.from_pretrained(model_id)
- ```
-
- You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/optimization_inc) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor).

### ONNX + ONNX Runtime

@@ -131,43 +61,11 @@ Before you begin, make sure you have all the necessary libraries installed :
pip install optimum[exporters,onnxruntime]
```

- It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily:
-
- ```plain
- optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
- ```
-
- The model can then be quantized using `onnxruntime`:
-
- ```bash
- optimum-cli onnxruntime quantize \
- --avx512 \
- --onnx_model roberta_base_qa_onnx \
- -o quantized_roberta_base_qa_onnx
- ```
-
- These commands will export `deepset/roberta-base-squad2` and perform [O2 graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization#optimization-configuration) on the exported model, and finally quantize it with the [avx512 configuration](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig.avx512).
+ It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily.

For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).

- #### Run the exported model using ONNX Runtime
-
- Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seemless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend:
-
- ```diff
- - from transformers import AutoModelForQuestionAnswering
- + from optimum.onnxruntime import ORTModelForQuestionAnswering
- from transformers import AutoTokenizer, pipeline
-
- model_id = "deepset/roberta-base-squad2"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- - model = AutoModelForQuestionAnswering.from_pretrained(model_id)
- + model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
- qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
- question = "What's Optimum?"
- context = "Optimum is an awesome library everyone should use!"
- results = qa_pipe(question=question, context=context)
- ```
+ Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend.

More details on how to run ONNX models with `ORTModelForXXX` classes [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).
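
For quick reference, the ONNX workflow that the remaining README text points to can be condensed from the export command and inference snippet removed above (the `roberta_base_qa_onnx` directory is the illustrative output name used there):

```python
# Condensed from the snippets removed in this commit: the model is first exported with
#   optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
# and the exported directory is then loaded through the ONNX Runtime classes.
from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline

model_id = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")  # directory produced by the export above

qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
results = qa_pipe(question="What's Optimum?", context="Optimum is an awesome library everyone should use!")
```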

@@ -179,14 +77,21 @@ Before you begin, make sure you have all the necessary libraries installed :
pip install optimum[exporters-tf]
```

- Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them:
+ Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them.
+ You can find more information in our [documentation](https://huggingface.co/docs/optimum/main/exporters/tflite/usage_guides/export_a_model).

- ```plain
- optimum-cli export tflite \
- -m deepset/roberta-base-squad2 \
- --sequence_length 384 \
- --quantize int8-dynamic roberta_tflite_model
- ```
+ ### Intel (OpenVINO + Neural Compressor + IPEX)
+
+ Before you begin, make sure you have all the necessary [libraries installed](https://huggingface.co/docs/optimum/main/en/intel/installation).
+
+ You can find more information on the different integrations in our [documentation](https://huggingface.co/docs/optimum/main/en/intel/index) and in the examples of [`optimum-intel`](https://github.com/huggingface/optimum-intel).
+
+
+ ### Quanto
+
+ [Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend which allows you to quantize a model either using the Python API or the `optimum-cli`.
+
+ You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.

## Accelerated training

@@ -205,37 +110,7 @@ Before you begin, make sure you have all the necessary libraries installed :
pip install --upgrade --upgrade-strategy eager optimum[habana]
```

- ```diff
- - from transformers import Trainer, TrainingArguments
- + from optimum.habana import GaudiTrainer, GaudiTrainingArguments
-
- # Download a pretrained model from the Hub
- model = AutoModelForXxx.from_pretrained("bert-base-uncased")
-
- # Define the training arguments
- - training_args = TrainingArguments(
- + training_args = GaudiTrainingArguments(
- output_dir="path/to/save/folder/",
- + use_habana=True,
- + use_lazy_mode=True,
- + gaudi_config_name="Habana/bert-base-uncased",
- ...
- )
-
- # Initialize the trainer
- - trainer = Trainer(
- + trainer = GaudiTrainer(
- model=model,
- args=training_args,
- train_dataset=train_dataset,
- ...
- )
-
- # Use Habana Gaudi processor for training!
- trainer.train()
- ```
-
- You can find more examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
+ You can find examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).

### ONNX Runtime

@@ -247,34 +122,3 @@ pip install optimum[onnxruntime-training]
```

You can find examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).
-
-
- ### Quanto
-
- [Quanto](https://github.com/huggingface/optimum-quanto) is a pytorch quantization backend.
-
- You can quantize a model either using the python API or the `optimum-cli`.
-
- ```python
- from transformers import AutoModelForCausalLM
- from optimum.quanto import QuantizedModelForCausalLM, qint4
-
- model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B')
- qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude='lm_head')
- ```
-
- The quantized model can be saved using `save_pretrained`:
-
- ```python
- qmodel.save_pretrained('./Llama-3.1-8B-quantized')
- ```
-
- It can later be reloaded using `from_pretrained`:
-
- ```python
- from optimum.quanto import QuantizedModelForCausalLM
-
- qmodel = QuantizedModelForCausalLM.from_pretrained('Llama-3.1-8B-quantized')
- ```
-
- You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
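
For the ONNX Runtime training path, which now only links to the trainer guide, here is a minimal sketch of the drop-in pattern, mirroring the structure of the removed Gaudi snippet; the `ORTTrainer` / `ORTTrainingArguments` usage is an assumption based on the linked guide rather than code from this commit, and `train_dataset` is assumed to be prepared beforehand:

```python
# Minimal sketch (assumption, not from this commit): swap Trainer/TrainingArguments
# for their ONNX Runtime counterparts, as described in the linked trainer guide.
from transformers import AutoModelForSequenceClassification
from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = ORTTrainingArguments(
    output_dir="path/to/save/folder/",
)

# `train_dataset` is assumed to be prepared beforehand, as in the Gaudi snippet above.
trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

trainer.train()
```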
