
Commit daa2781

Clean up ORT documentation (#2065)
* refactor ort doc
* fix links
* fix
1 parent 01d0aa7 · commit daa2781

2 files changed (+48, -232 lines)


docs/source/onnxruntime/package_reference/modeling_ort.mdx

+5
@@ -119,6 +119,11 @@ The following ORT classes are available for the following custom tasks.
 
 ## Stable Diffusion
 
+#### ORTDiffusionPipeline
+
+[[autodoc]] onnxruntime.ORTDiffusionPipeline
+    - __call__
+
 #### ORTStableDiffusionPipeline
 
 [[autodoc]] onnxruntime.ORTStableDiffusionPipeline

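For context on the class documented above, here is a minimal usage sketch of `ORTDiffusionPipeline`. It mirrors the example introduced in `models.mdx` below; the checkpoint, the `revision="onnx"` branch, and the prompt are illustrative, and it assumes `optimum[onnxruntime]` and `diffusers` are installed.

```python
# Minimal sketch: text-to-image inference with the generic ORTDiffusionPipeline.
# The model id, revision and prompt are illustrative placeholders.
from optimum.onnxruntime import ORTDiffusionPipeline

pipeline = ORTDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="onnx")
image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
image.save("ship.png")
```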
docs/source/onnxruntime/usage_guides/models.mdx

+43 -232
@@ -4,263 +4,74 @@ Optimum is a utility package for building and running inference with accelerated
 Optimum can be used to load optimized models from the [Hugging Face Hub](hf.co/models) and create pipelines
 to run accelerated inference without rewriting your APIs.
 
-## Switching from Transformers to Optimum
 
-The `optimum.onnxruntime.ORTModelForXXX` model classes are API compatible with Hugging Face Transformers models. This
-means you can just replace your `AutoModelForXXX` class with the corresponding `ORTModelForXXX` class in `optimum.onnxruntime`.
+## Loading
 
-You do not need to adapt your code to get it to work with `ORTModelForXXX` classes:
+### Transformers models
 
-```diff
-from transformers import AutoTokenizer, pipeline
--from transformers import AutoModelForQuestionAnswering
-+from optimum.onnxruntime import ORTModelForQuestionAnswering
-
--model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2") # PyTorch checkpoint
-+model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2") # ONNX checkpoint
-tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
-
-onnx_qa = pipeline("question-answering",model=model,tokenizer=tokenizer)
-
-question = "What's my name?"
-context = "My name is Philipp and I live in Nuremberg."
-pred = onnx_qa(question, context)
-```
-
-### Loading a vanilla Transformers model
-
-Because the model you want to work with might not be already converted to ONNX, [`~optimum.onnxruntime.ORTModel`]
-includes a method to convert vanilla Transformers models to ONNX ones. Simply pass `export=True` to the
-[`~optimum.onnxruntime.ORTModel.from_pretrained`] method, and your model will be loaded and converted to ONNX on-the-fly:
-
-```python
->>> from optimum.onnxruntime import ORTModelForSequenceClassification
-
->>> # Load the model from the hub and export it to the ONNX format
->>> model = ORTModelForSequenceClassification.from_pretrained(
-...     "distilbert-base-uncased-finetuned-sst-2-english", export=True
-... )
-```
-
-### Pushing ONNX models to the Hugging Face Hub
-
-It is also possible, just as with regular [`~transformers.PreTrainedModel`]s, to push your `ORTModelForXXX` to the
-[Hugging Face Model Hub](https://hf.co/models):
-
-```python
->>> from optimum.onnxruntime import ORTModelForSequenceClassification
-
->>> # Load the model from the hub and export it to the ONNX format
->>> model = ORTModelForSequenceClassification.from_pretrained(
-...     "distilbert-base-uncased-finetuned-sst-2-english", export=True
-... )
-
->>> # Save the converted model
->>> model.save_pretrained("a_local_path_for_convert_onnx_model")
-
-# Push the onnx model to HF Hub
->>> model.push_to_hub( # doctest: +SKIP
-...     "a_local_path_for_convert_onnx_model", repository_id="my-onnx-repo", use_auth_token=True
-... )
-```
-
-## Sequence-to-sequence models
-
-Sequence-to-sequence (Seq2Seq) models can also be used when running inference with ONNX Runtime. When Seq2Seq models
-are exported to the ONNX format, they are decomposed into three parts that are later combined during inference:
-- The encoder part of the model
-- The decoder part of the model + the language modeling head
-- The same decoder part of the model + language modeling head but taking and using pre-computed key / values as inputs and
-outputs. This makes inference faster.
-
-Here is an example of how you can load a T5 model to the ONNX format and run inference for a translation task:
-
-
-```python
->>> from transformers import AutoTokenizer, pipeline
->>> from optimum.onnxruntime import ORTModelForSeq2SeqLM
-
-# Load the model from the hub and export it to the ONNX format
->>> model_name = "t5-small"
->>> model = ORTModelForSeq2SeqLM.from_pretrained(model_name, export=True)
->>> tokenizer = AutoTokenizer.from_pretrained(model_name)
-
-# Create a pipeline
->>> onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
->>> text = "He never went out without a book under his arm, and he often came back with two."
->>> result = onnx_translation(text)
->>> # [{'translation_text': "Il n'est jamais sorti sans un livre sous son bras, et il est souvent revenu avec deux."}]
-```
-
-## Stable Diffusion
-
-Stable Diffusion models can also be used when running inference with ONNX Runtime. When Stable Diffusion models
-are exported to the ONNX format, they are split into four components that are later combined during inference:
-- The text encoder
-- The U-NET
-- The VAE encoder
-- The VAE decoder
-
-Make sure you have 🤗 Diffusers installed.
-
-To install `diffusers`:
-```bash
-pip install diffusers
-```
-
-### Text-to-Image
-
-Here is an example of how you can load an ONNX Stable Diffusion model and run inference using ONNX Runtime:
-
-```python
-from optimum.onnxruntime import ORTStableDiffusionPipeline
-
-model_id = "runwayml/stable-diffusion-v1-5"
-pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, revision="onnx")
-prompt = "sailing ship in storm by Leonardo da Vinci"
-image = pipeline(prompt).images[0]
-```
+Once your model has been [exported to the ONNX format](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model), you can load it by replacing the `AutoModelForXxx` class with the corresponding `ORTModelForXxx` class.
 
-To load your PyTorch model and convert it to ONNX on-the-fly, you can set `export=True`.
-
-```python
-pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, export=True)
-
-# Don't forget to save the ONNX model
-save_directory = "a_local_path"
-pipeline.save_pretrained(save_directory)
-```
-
-<div class="flex justify-center">
-<img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/onnxruntime/stable_diffusion_v1_5_ort_sail_boat.png">
-</div>
-
-### Image-to-Image
-
-```python
-import requests
-import torch
-from PIL import Image
-from io import BytesIO
-from optimum.onnxruntime import ORTStableDiffusionImg2ImgPipeline
-
-model_id = "runwayml/stable-diffusion-v1-5"
-pipeline = ORTStableDiffusionImg2ImgPipeline.from_pretrained(model_id, revision="onnx")
-
-url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
-
-response = requests.get(url)
-init_image = Image.open(BytesIO(response.content)).convert("RGB")
-init_image = init_image.resize((768, 512))
-
-prompt = "A fantasy landscape, trending on artstation"
-
-image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
-image.save("fantasy_landscape.png")
-```
-
-### Inpaint
-
-```python
-import PIL
-import requests
-import torch
-from io import BytesIO
-from optimum.onnxruntime import ORTStableDiffusionInpaintPipeline
-
-model_id = "runwayml/stable-diffusion-inpainting"
-pipeline = ORTStableDiffusionInpaintPipeline.from_pretrained(model_id, revision="onnx")
-
-def download_image(url):
-    response = requests.get(url)
-    return PIL.Image.open(BytesIO(response.content)).convert("RGB")
-
-img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
-mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
+```diff
+from transformers import AutoTokenizer, pipeline
+- from transformers import AutoModelForCausalLM
++ from optimum.onnxruntime import ORTModelForCausalLM
 
-init_image = download_image(img_url).resize((512, 512))
-mask_image = download_image(mask_url).resize((512, 512))
+- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
++ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
+tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
 
-prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
-image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
+pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
+result = pipe("He never went out without a book under his arm")
 ```
 
+More information on all the supported `ORTModelForXxx` classes can be found in our [documentation](https://huggingface.co/docs/optimum/onnxruntime/package_reference/modeling_ort).
 
-## Stable Diffusion XL
-
-Before using `ORTStableDiffusionXLPipeline` make sure to have `diffusers` and `invisible_watermark` installed. You can install the libraries as follows:
 
-```bash
-pip install diffusers
-pip install invisible-watermark>=0.2.0
-```
-
-### Text-to-Image
-
-Here is an example of how you can load a SDXL ONNX model from [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and run inference using ONNX Runtime :
+### Diffusers models
 
-```python
-from optimum.onnxruntime import ORTStableDiffusionXLPipeline
+Once your model has been [exported to the ONNX format](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model), you can load it by replacing the `DiffusionPipeline` class with the corresponding `ORTDiffusionPipeline` class.
 
-model_id = "stabilityai/stable-diffusion-xl-base-1.0"
-base = ORTStableDiffusionXLPipeline.from_pretrained(model_id)
-prompt = "sailing ship in storm by Leonardo da Vinci"
-image = base(prompt).images[0]
 
-# Don't forget to save the ONNX model
-save_directory = "sd_xl_base"
-base.save_pretrained(save_directory)
+```diff
+- from diffusers import DiffusionPipeline
++ from optimum.onnxruntime import ORTDiffusionPipeline
+
+model_id = "runwayml/stable-diffusion-v1-5"
+- pipeline = DiffusionPipeline.from_pretrained(model_id)
++ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
+prompt = "sailing ship in storm by Leonardo da Vinci"
+image = pipeline(prompt).images[0]
 ```
 
+## Converting your model to ONNX on-the-fly
 
-### Image-to-Image
-
-Here is an example of how you can load a PyTorch SDXL model, convert it to ONNX on-the-fly and run inference using ONNX Runtime for *image-to-image* :
+If your model hasn't already been [converted to ONNX](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model), [`~optimum.onnxruntime.ORTModel`] includes a method to convert it on-the-fly.
+Simply pass `export=True` to the [`~optimum.onnxruntime.ORTModel.from_pretrained`] method, and your model will be loaded and converted to ONNX on-the-fly:
 
 ```python
-from optimum.onnxruntime import ORTStableDiffusionXLImg2ImgPipeline
-from diffusers.utils import load_image
-
-model_id = "stabilityai/stable-diffusion-xl-refiner-1.0"
-pipeline = ORTStableDiffusionXLImg2ImgPipeline.from_pretrained(model_id, export=True)
+>>> from optimum.onnxruntime import ORTModelForSequenceClassification
 
-url = "https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png"
-image = load_image(url).convert("RGB")
-prompt = "medieval castle by Caspar David Friedrich"
-image = pipeline(prompt, image=image).images[0]
-image.save("medieval_castle.png")
+>>> # Load the model from the hub and export it to the ONNX format
+>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
+>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
 ```
 
 
-### Refining the image output
-
-The image can be refined by making use of a model like [stabilityai/stable-diffusion-xl-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0). In this case, you only have to output the latents from the base model.
+## Pushing your model to the Hub
 
+You can also call `push_to_hub` directly on your model to upload it to the [Hub](https://hf.co/models).
 
 ```python
-from optimum.onnxruntime import ORTStableDiffusionXLImg2ImgPipeline
-
-model_id = "stabilityai/stable-diffusion-xl-refiner-1.0"
-refiner = ORTStableDiffusionXLImg2ImgPipeline.from_pretrained(model_id, export=True)
-
-image = base(prompt=prompt, output_type="latent").images[0]
-image = refiner(prompt=prompt, image=image[None, :]).images[0]
-image.save("sailing_ship.png")
-```
-
-
-
-## Latent Consistency Models
-
-### Text-to-Image
+>>> from optimum.onnxruntime import ORTModelForSequenceClassification
 
-Here is an example of how you can load a Latent Consistency Models (LCMs) from [SimianLuo/LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7) and run inference using ONNX Runtime :
+>>> # Load the model from the hub and export it to the ONNX format
+>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
+>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
 
-```python
-from optimum.onnxruntime import ORTLatentConsistencyModelPipeline
+>>> # Save the converted model locally
+>>> output_dir = "a_local_path_for_convert_onnx_model"
+>>> model.save_pretrained(output_dir)
 
-model_id = "SimianLuo/LCM_Dreamshaper_v7"
-pipeline = ORTLatentConsistencyModelPipeline.from_pretrained(model_id, export=True)
-prompt = "sailing ship in storm by Leonardo da Vinci"
-images = pipeline(prompt, num_inference_steps=4, guidance_scale=8.0).images
-```
+>>> # Push the ONNX model to the Hugging Face Hub
+>>> model.push_to_hub(output_dir, repository_id="my-onnx-repo") # doctest: +SKIP
+```
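Building on the new `push_to_hub` example above, here is a hedged sketch of loading the pushed ONNX model back from the Hub. `your-username/my-onnx-repo` is a placeholder for the repository created by `push_to_hub`, and the tokenizer is loaded from the original checkpoint since only the model files were pushed.

```python
# Sketch: reload the ONNX model pushed in the example above and run it in a pipeline.
# "your-username/my-onnx-repo" is a placeholder repository id.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained("your-username/my-onnx-repo")
# The tokenizer comes from the original checkpoint, since only the model was pushed.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("ONNX Runtime inference without rewriting your APIs."))
```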
