
Commit d867903: Merge branch 'main' into onnx

2 parents: 795a618 + 512d5c6

(Note: in large commits, GitHub hides some file contents and file names by default; a few file names below are therefore not shown.)

53 files changed: +1580, -1588 lines

.github/workflows/test_export_onnx_cli.yml (+14, -8)

```diff
@@ -2,9 +2,11 @@ name: Exporters ONNX CLI / Python - Test

 on:
   push:
-    branches: [main]
+    branches:
+      - main
   pull_request:
-    branches: [main]
+    branches:
+      - main

 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -19,16 +21,20 @@ jobs:
         os: [ubuntu-20.04]

     runs-on: ${{ matrix.os }}
+
     steps:
-      - uses: actions/checkout@v2
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
       - name: Setup Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
-      - name: Install dependencies for pytorch export
+
+      - name: Install dependencies
         run: |
           pip install .[tests,exporters,diffusers]
-      - name: Test with unittest
-        working-directory: tests
+
+      - name: Test with pytest
         run: |
-          pytest exporters/onnx/test_exporters_onnx_cli.py -n auto -m "not tensorflow_test and not timm_test" -s --durations=0
+          pytest tests/exporters/onnx/test_exporters_onnx_cli.py -n auto -m "not tensorflow_test and not timm_test" -s --durations=0
```
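
A quick note on the trigger change above: the two spellings of the branch filter are equivalent YAML, so that part of the change is purely cosmetic. A minimal sketch (hypothetical workflow, not part of this commit) showing both forms:

```yaml
# Both forms parse to the same list; the block sequence (adopted in this
# commit) yields one-line diffs when branches are added or removed later.
on:
  push:
    branches: [main]   # flow sequence: old style
  pull_request:
    branches:          # block sequence: new style
      - main
```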

.github/workflows/test_onnxruntime.yml (+6, -6)

```diff
@@ -1,12 +1,12 @@
-# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
-# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
 name: ONNX Runtime / Python - Test

 on:
   push:
-    branches: [main]
+    branches:
+      - main
   pull_request:
-    branches: [main]
+    branches:
+      - main

 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -58,10 +58,10 @@ jobs:

       - name: Test with pytest (in series)
         run: |
-          pytest tests/onnxruntime -m "run_in_series" --durations=0 -vvvv -s
+          pytest tests/onnxruntime -m "run_in_series" --durations=0 -vvvv

       - name: Test with pytest (in parallel)
         run: |
-          pytest tests/onnxruntime -m "not run_in_series" --durations=0 -vvvv -s -n auto
+          pytest tests/onnxruntime -m "not run_in_series" --durations=0 -vvvv -n auto
         env:
           HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
```
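
The concurrency stanza visible in the context lines above is shared across the workflows in this commit; a sketch of how it behaves:

```yaml
# For pull requests, github.head_ref groups runs by source branch, so a new
# push cancels the still-running tests for the previous commit. For events
# without a head_ref (push, schedule, workflow_dispatch), the unique run_id
# is used instead, which effectively disables cancellation for those runs.
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true
```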
+41, -17 (file name hidden)

```diff
@@ -1,30 +1,54 @@
-name: ONNX Runtime / Test GPU
+name: ONNX Runtime GPU / Python - Test

 on:
   workflow_dispatch:
   schedule:
-    - cron: 0 1 */3 * * # at 1am every 3 days
+    - cron: 0 7 * * * # every day at 7am UTC
   pull_request:
-    types: [opened, synchronize, reopened, labeled]
-  # uncomment to enable on PR merge on main branch:
-  #push:
-  #  branches:
-  #    - main
+    branches:
+      - main
+    types:
+      - opened
+      - labeled
+      - reopened
+      - unlabeled
+      - synchronize
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true

 jobs:
-  do-the-job:
-    if: ${{ (github.event_name == 'workflow_dispatch') || (github.event_name == 'schedule') || contains( github.event.pull_request.labels.*.name, 'gpu-test') }}
-    name: Start self-hosted EC2 runner
+  build:
+    if: ${{
+      (github.event_name == 'push') ||
+      (github.event_name == 'workflow_dispatch') ||
+      contains(github.event.pull_request.labels.*.name, 'gpu') ||
+      contains(github.event.pull_request.labels.*.name, 'onnxruntime-gpu')
+      }}
+
     runs-on:
       group: aws-g6-4xlarge-plus
-    env:
-      AWS_REGION: us-east-1
+
+    container:
+      image: nvcr.io/nvidia/tensorrt:24.12-py3
+      options: --gpus all
+
     steps:
       - name: Checkout
-        uses: actions/checkout@v2
-      - name: Build image
+        uses: actions/checkout@v4
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.9"
+
+      - name: Install dependencies
         run: |
-          docker build -f tests/onnxruntime/docker/Dockerfile_onnxruntime_gpu -t onnxruntime-gpu .
-      - name: Test with unittest within docker container
+          pip install --upgrade pip
+          pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
+          pip install .[tests,onnxruntime-gpu,diffusers]
+
+      - name: Test with pytest
         run: |
-          docker run --rm --gpus all -v /mnt/cache/.cache/huggingface:/root/.cache/huggingface --workdir=/workspace/optimum/tests onnxruntime-gpu:latest
+          pytest tests/onnxruntime -m "cuda_ep_test or trt_ep_test" --durations=0 -vvvv -n auto
```
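
The `if:` expression above implements label gating. A minimal standalone sketch of the same pattern (hypothetical workflow and label names, not part of this commit):

```yaml
on:
  workflow_dispatch:
  pull_request:
    # Listing 'labeled' and 'unlabeled' among the event types ensures the
    # gate below is re-evaluated whenever a label is added or removed.
    types: [opened, labeled, unlabeled, reopened, synchronize]

jobs:
  gpu-tests:
    # Skipped unless manually dispatched or the PR carries the trigger label.
    if: ${{
      (github.event_name == 'workflow_dispatch') ||
      contains(github.event.pull_request.labels.*.name, 'gpu')
      }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "label gate passed"
```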
+37, -20 (file name hidden)

```diff
@@ -1,33 +1,50 @@
-name: ONNX Runtime slow / Python - Test
+name: ONNX Runtime Slow / Python - Test

 on:
   workflow_dispatch:
   schedule:
-    - cron: 0 7 * * * # every day at 7am
+    - cron: 0 7 * * * # every day at 7am UTC
+  pull_request:
+    branches:
+      - main
+    types:
+      - opened
+      - labeled
+      - reopened
+      - unlabeled
+      - synchronize

 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
   cancel-in-progress: true

 jobs:
   build:
-    strategy:
-      fail-fast: false
-      matrix:
-        python-version: ["3.9"]
-        os: [ubuntu-20.04]
+    if: ${{
+      (github.event_name == 'push') ||
+      (github.event_name == 'workflow_dispatch') ||
+      contains(github.event.pull_request.labels.*.name, 'slow') ||
+      contains(github.event.pull_request.labels.*.name, 'onnxruntime-slow')
+      }}
+
+    runs-on:
+      group: aws-general-8-plus

-    runs-on: ${{ matrix.os }}
     steps:
-      - uses: actions/checkout@v2
-      - name: Setup Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
-        with:
-          python-version: ${{ matrix.python-version }}
-      - name: Install dependencies for export
-        run: |
-          pip install .[tests,onnxruntime,diffusers]
-      - name: Test with unittest
-        working-directory: tests
-        run: |
-          RUN_SLOW=1 pytest onnxruntime -s -m "run_slow" --durations=0
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Python 3.9
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.9"
+
+      - name: Install dependencies
+        run: |
+          pip install --upgrade pip
+          pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+          pip install .[tests,onnxruntime,diffusers]
+
+      - name: Test with pytest
+        run: |
+          RUN_SLOW=1 pytest tests/onnxruntime -m "run_slow" --durations=0 -vvvv
```
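
Minor design note: `RUN_SLOW=1` is set inline in the shell command here; the step-level `env` block used elsewhere in this commit would work just as well. A hypothetical equivalent:

```yaml
      - name: Test with pytest
        env:
          RUN_SLOW: "1"   # same effect as prefixing the command with RUN_SLOW=1
        run: |
          pytest tests/onnxruntime -m "run_slow" --durations=0 -vvvv
```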

.github/workflows/test_onnxruntime_train.yml (-26)

This file was deleted.

+66 (new file, name hidden)

```diff
@@ -0,0 +1,66 @@
+name: ONNX Runtime Training / Python - Test
+
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: 0 7 * * * # every day at 7am UTC
+  pull_request:
+    branches:
+      - main
+    types:
+      - opened
+      - labeled
+      - reopened
+      - unlabeled
+      - synchronize
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  build:
+    if: ${{
+      (github.event_name == 'push') ||
+      (github.event_name == 'workflow_dispatch') ||
+      contains( github.event.pull_request.labels.*.name, 'training') ||
+      contains( github.event.pull_request.labels.*.name, 'onnxruntime-training')
+      }}
+
+    runs-on:
+      group: aws-g6-4xlarge-plus
+
+    container:
+      image: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
+      options: --gpus all
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.9"
+
+      - name: Install dependencies
+        env:
+          TORCH_CUDA_ARCH_LIST: "5.0 6.0 7.0 7.5 8.0 8.6 9.0+PTX"
+        run: |
+          pip install --upgrade pip
+          pip install --no-cache-dir "torch<2.6" torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+          pip install --no-cache-dir torch-ort onnxruntime-training && python -m torch_ort.configure
+          pip install --no-cache-dir evaluate absl-py rouge_score seqeval sacrebleu nltk scikit-learn
+          pip install .[tests,onnxruntime-training]
+
+      - name: Test with pytest (trainer)
+        run: |
+          RUN_SLOW=1 pytest tests/onnxruntime-training/test_trainer.py --durations=0 -vvvv
+        env:
+          HF_DATASETS_TRUST_REMOTE_CODE: 1
+
+      - name: Test with pytest (examples)
+        run: |
+          RUN_SLOW=1 pytest tests/onnxruntime-training/test_examples.py --durations=0 -vvvv
+        env:
+          HF_DATASETS_TRUST_REMOTE_CODE: 1
```
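
One scoping detail in the new training workflow: `TORCH_CUDA_ARCH_LIST` is declared at the step level, so it only affects the install step. A hypothetical job-level alternative, if later steps also needed to compile CUDA extensions:

```yaml
jobs:
  build:
    env:
      # Job-level variables are inherited by every step in the job.
      TORCH_CUDA_ARCH_LIST: "5.0 6.0 7.0 7.5 8.0 8.6 9.0+PTX"
```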

README.md (+7)

````diff
@@ -239,6 +239,13 @@ You can find more examples in the [documentation](https://huggingface.co/docs/op

 ### ONNX Runtime

+
+Before you begin, make sure you have all the necessary libraries installed :
+
+```bash
+pip install optimum[onnxruntime-training]
+```
+
 ```diff
 - from transformers import Trainer, TrainingArguments
 + from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
````

docs/source/bettertransformer/overview.mdx (+3, -3)

```diff
@@ -16,7 +16,7 @@ specific language governing permissions and limitations under the License.

 ## Quickstart

-Since its 1.13 version, [PyTorch released](https://pytorch.org/blog/PyTorch-1.13-release/) the stable version of a fast path for its standard Transformer APIs that provides out of the box performance improvements for transformer-based models. You can benefit from interesting speedup on most consumer-type devices, including CPUs, older and newer versions of NIVIDIA GPUs.
+Since its 1.13 version, [PyTorch released](https://pytorch.org/blog/PyTorch-1.13-release/) the stable version of a fast path for its standard Transformer APIs that provides out of the box performance improvements for transformer-based models. You can benefit from interesting speedup on most consumer-type devices, including CPUs, older and newer versions of NVIDIA GPUs.
 You can now use this feature in 🤗 Optimum together with Transformers and use it for major models in the Hugging Face ecosystem.

 In the 2.0 version, PyTorch includes a native scaled dot-product attention operator (SDPA) as part of `torch.nn.functional`. This function encompasses several implementations that can be applied depending on the inputs and the hardware in use. See the [official documentation](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) for more information, and [this blog post](https://pytorch.org/blog/out-of-the-box-acceleration/) for benchmarks.
@@ -54,13 +54,13 @@ The list of supported model below:
 - [DeiT](https://arxiv.org/abs/2012.12877)
 - [Electra](https://arxiv.org/abs/2003.10555)
 - [Ernie](https://arxiv.org/abs/1904.09223)
-- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
 - [FSMT](https://arxiv.org/abs/1907.06616)
 - [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
 - [GPT-j](https://huggingface.co/EleutherAI/gpt-j-6B)
 - [GPT-neo](https://github.com/EleutherAI/gpt-neo)
 - [GPT-neo-x](https://arxiv.org/abs/2204.06745)
-- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
 - [HuBERT](https://arxiv.org/pdf/2106.07447.pdf)
 - [LayoutLM](https://arxiv.org/abs/1912.13318)
 - [Llama & Llama2](https://arxiv.org/abs/2302.13971) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention)
```

docs/source/bettertransformer/tutorials/contribute.mdx (+2, -2)

````diff
@@ -112,7 +112,7 @@ Now, make sure to fill all the necessary attributes, the list of attributes are:

 Note that these attributes correspond to all the components that are necessary to run a Transformer Encoder module, check the figure 1 on the ["Attention Is All You Need"](https://arxiv.org/pdf/1706.03762.pdf) paper.

-Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contigufied", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)
+Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contiguified", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)

 Make sure also to add the lines:
 ```python
@@ -125,7 +125,7 @@ self.validate_bettertransformer()

 First of all, start with the line `super().forward_checker()`, this is needed so that the parent class can run all the safety checkers before.

-After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar accross models, but sometimes the shapes of the attention masks are different across models.
+After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar across models, but sometimes the shapes of the attention masks are different across models.
 ```python
 super().forward_checker()
````
docs/source/bettertransformer/tutorials/convert.mdx (+3, -3)

````diff
@@ -45,7 +45,7 @@ Sometimes you can directly load your model on your GPU devices using `accelerate

 ## Step 2: Set your model on your preferred device

-If you did not used `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
+If you did not use `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
 ```python
 >>> model = model.to(0) # or model.to("cuda:0")
 ```
@@ -92,7 +92,7 @@ You can also use `transformers.pipeline` as usual and pass the converted model d
 >>> ...
 ```

-Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you face into any issue, do not hesitate to open an isse on GitHub!
+Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you run into any issue, do not hesitate to open an issue on GitHub!

 ## Training compatibility

@@ -113,4 +113,4 @@ model = BetterTransformer.transform(model)
 model = BetterTransformer.reverse(model)
 model.save_pretrained("fine_tuned_model")
 model.push_to_hub("fine_tuned_model")
-```
+```
````

docs/source/onnxruntime/usage_guides/models.mdx (+1, -1)

```diff
@@ -16,7 +16,7 @@ Once your model was [exported to the ONNX format](https://huggingface.co/docs/op
 - from transformers import AutoModelForCausalLM
 + from optimum.onnxruntime import ORTModelForCausalLM

-- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B) # PyTorch checkpoint
+- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
 + model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
 tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
```
