* docs: fixed minor typo in quicktour.mdx
* docs: fixed missing closing quotation mark in onnxruntime/usage_guides/models.mdx
* docs: removed extra quotation mark at the end in onnxruntime/usage_guides/optimization.mdx
* docs: fixed spelling of NVIDIA in bettertransformer/overview.mdx
* docs: fixed other typos in bettertransformer/overview.mdx
* docs: fixed multiple typos in bettertransformer/tutorials/contribute.mdx
* docs: corrected minor typos and grammar in bettertransformer/tutorials/convert.mdx
@@ -54,13 +54,13 @@ The list of supported model below:
 - [DeiT](https://arxiv.org/abs/2012.12877)
 - [Electra](https://arxiv.org/abs/2003.10555)
 - [Ernie](https://arxiv.org/abs/1904.09223)
-- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
-- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
 - [HuBERT](https://arxiv.org/pdf/2106.07447.pdf)
 - [LayoutLM](https://arxiv.org/abs/1912.13318)
 - [Llama & Llama2](https://arxiv.org/abs/2302.13971) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
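For reference, the "memory-efficient attention through PyTorch's scaled_dot_product_attention" that the Falcon, GPT BigCode and Llama notes link to is the `torch.nn.functional` op sketched below; the shapes are toy values chosen for illustration, not taken from the PR.

```python
# Minimal illustration of PyTorch's fused attention op that Transformers uses
# natively for Falcon, GPT BigCode and Llama (toy shapes, purely illustrative).
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 16, 64)  # (batch, num_heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```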
docs/source/bettertransformer/tutorials/contribute.mdx (+2 -2)
@@ -112,7 +112,7 @@ Now, make sure to fill all the necessary attributes, the list of attributes are:
 Note that these attributes correspond to all the components that are necessary to run a Transformer Encoder module, check the figure 1 on the ["Attention Is All You Need"](https://arxiv.org/pdf/1706.03762.pdf) paper.

-Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contigufied", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)
+Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contiguified", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)

 First of all, start with the line `super().forward_checker()`, this is needed so that the parent class can run all the safety checkers before.

-After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar accross models, but sometimes the shapes of the attention masks are different across models.
+After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar across models, but sometimes the shapes of the attention masks are different across models.
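The "contiguified" and *nested* wording fixed above refers to two steps of the contribution guide. As a rough, self-contained sketch (not the actual Optimum implementation, which lives in `encoder_models.py`; names and shapes are toy assumptions), the pattern looks like this:

```python
# Illustrative sketch of the two ideas referenced above; not the real Optimum code.
import torch

hidden_size = 8
batch, seq_len = 2, 4

# 1) "Contiguify": concatenate the separate query/key/value projection weights
#    into a single contiguous in-projection matrix for the fused kernel.
q_w, k_w, v_w = (torch.randn(hidden_size, hidden_size) for _ in range(3))
in_proj_weight = torch.cat([q_w, k_w, v_w]).contiguous()

# 2) Nest the hidden states using the attention mask: keep only the unpadded
#    tokens of each sequence, after which the mask can be set to None.
hidden_states = torch.randn(batch, seq_len, hidden_size)
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]], dtype=torch.bool)
hidden_states = torch.nested.nested_tensor(
    [row[mask] for row, mask in zip(hidden_states, attention_mask)]
)
attention_mask = None  # padding is now encoded in the nested layout

print(in_proj_weight.shape, hidden_states.is_nested)
```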
docs/source/bettertransformer/tutorials/convert.mdx (+3 -3)
@@ -45,7 +45,7 @@ Sometimes you can directly load your model on your GPU devices using `accelerate
 ## Step 2: Set your model on your preferred device

-If you did not used `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
+If you did not use `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:

 ```python
 >>> model = model.to(0) # or model.to("cuda:0")
 ```
@@ -92,7 +92,7 @@ You can also use `transformers.pipeline` as usual and pass the converted model d
 >>>...
 ```

-Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you face into any issue, do not hesitate to open an isse on GitHub!
+Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you run into any issue, do not hesitate to open an issue on GitHub!

 ## Training compatibility
@@ -113,4 +113,4 @@ model = BetterTransformer.transform(model)
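Putting the convert.mdx steps touched by this PR together, the end-to-end flow is roughly the sketch below; the checkpoint and task are illustrative assumptions, not taken from the PR.

```python
# Hedged end-to-end sketch of the convert.mdx workflow: load, (optionally) move
# to GPU, transform with BetterTransformer, then use transformers.pipeline.
from transformers import AutoModel, AutoTokenizer, pipeline
from optimum.bettertransformer import BetterTransformer

model_id = "distilbert-base-uncased"  # illustrative checkpoint
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Step 2: set the model on your preferred device (skip if device_map="auto" was used).
# model = model.to(0)  # or model.to("cuda:0") when a GPU is available

# Convert the model to its BetterTransformer version.
model = BetterTransformer.transform(model)

# The converted model can be passed to transformers.pipeline as usual.
pipe = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
print(len(pipe("BetterTransformer is fast!")[0]))  # number of token embeddings
```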
docs/source/onnxruntime/usage_guides/optimization.mdx (+1 -1)
@@ -132,7 +132,7 @@ Below you will find an easy end-to-end example on how to optimize [distilbert-ba
 ```

-Below you will find an easy end-to-end example on how to optimize a Seq2Seq model [sshleifer/distilbart-cnn-12-6"](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
+Below you will find an easy end-to-end example on how to optimize a Seq2Seq model [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
docs/source/quicktour.mdx (+1 -1)
@@ -185,7 +185,7 @@ Check out the [documentation](https://huggingface.co/docs/optimum/exporters/onnx
 ## PyTorch's BetterTransformer support

-[BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) is a free-lunch PyTorch-native optimization to gain x1.25 - x4 speedup on the inference of Transformer-based models. It has been marked as stable in [PyTorch 1.13](https://pytorch.org/blog/PyTorch-1.13-release/). We integrated BetterTransformer with the most-used models from the 🤗 Transformers libary, and using the integration is as simple as:
+[BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) is a free-lunch PyTorch-native optimization to gain x1.25 - x4 speedup on the inference of Transformer-based models. It has been marked as stable in [PyTorch 1.13](https://pytorch.org/blog/PyTorch-1.13-release/). We integrated BetterTransformer with the most-used models from the 🤗 Transformers library, and using the integration is as simple as: