
Commit 180bddf

Authored Feb 11, 2025

Fixed All Typos in docs (#2185)

* docs: fixed minor typo in quicktour.mdx
* docs: fixed missing closing quotation mark in onnxruntime/usage_guides/models.mdx
* docs: removed extra quotation mark at the end in onnxruntime/usage_guides/optimization.mdx
* docs: fixed spelling of NVIDIA in bettertransformer/overview.mdx
* docs: fixed other typos in bettertransformer/overview.mdx
* docs: fixed multiple typos in bettertransformer/tutorials/contribute.mdx
* docs: corrected minor typos and grammar in bettertransformer/tutorials/convert.mdx

1 parent afff2fa · commit 180bddf

File tree: 6 files changed (+11 -11 lines)

 

docs/source/bettertransformer/overview.mdx (+3 -3)
@@ -16,7 +16,7 @@ specific language governing permissions and limitations under the License.
 
 ## Quickstart
 
-Since its 1.13 version, [PyTorch released](https://pytorch.org/blog/PyTorch-1.13-release/) the stable version of a fast path for its standard Transformer APIs that provides out of the box performance improvements for transformer-based models. You can benefit from interesting speedup on most consumer-type devices, including CPUs, older and newer versions of NIVIDIA GPUs.
+Since its 1.13 version, [PyTorch released](https://pytorch.org/blog/PyTorch-1.13-release/) the stable version of a fast path for its standard Transformer APIs that provides out of the box performance improvements for transformer-based models. You can benefit from interesting speedup on most consumer-type devices, including CPUs, older and newer versions of NVIDIA GPUs.
 You can now use this feature in 🤗 Optimum together with Transformers and use it for major models in the Hugging Face ecosystem.
 
 In the 2.0 version, PyTorch includes a native scaled dot-product attention operator (SDPA) as part of `torch.nn.functional`. This function encompasses several implementations that can be applied depending on the inputs and the hardware in use. See the [official documentation](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) for more information, and [this blog post](https://pytorch.org/blog/out-of-the-box-acceleration/) for benchmarks.
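As a side note on the SDPA operator mentioned in the last context line of this hunk, the snippet below is a minimal sketch of calling `torch.nn.functional.scaled_dot_product_attention` directly; the tensor shapes are illustrative assumptions, not taken from the docs.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, num_heads, seq_len, head_dim)
query = torch.rand(2, 8, 128, 64)
key = torch.rand(2, 8, 128, 64)
value = torch.rand(2, 8, 128, 64)

# PyTorch dispatches to a fused kernel (e.g. FlashAttention or
# memory-efficient attention) when the inputs and hardware allow it,
# and falls back to a math implementation otherwise.
out = F.scaled_dot_product_attention(query, key, value, is_causal=False)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```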
@@ -54,13 +54,13 @@ The list of supported model below:
 - [DeiT](https://arxiv.org/abs/2012.12877)
 - [Electra](https://arxiv.org/abs/2003.10555)
 - [Ernie](https://arxiv.org/abs/1904.09223)
-- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
 - [FSMT](https://arxiv.org/abs/1907.06616)
 - [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
 - [GPT-j](https://huggingface.co/EleutherAI/gpt-j-6B)
 - [GPT-neo](https://github.com/EleutherAI/gpt-neo)
 - [GPT-neo-x](https://arxiv.org/abs/2204.06745)
-- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
 - [HuBERT](https://arxiv.org/pdf/2106.07447.pdf)
 - [LayoutLM](https://arxiv.org/abs/1912.13318)
 - [Llama & Llama2](https://arxiv.org/abs/2302.13971) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
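For the models flagged in this hunk as directly supported by Transformers (Falcon, GPT BigCode, Llama), a hedged sketch of relying on that native SDPA path instead of BetterTransformer could look like the following; the checkpoint name and the availability of the `attn_implementation` argument in your installed Transformers version are assumptions.

```python
from transformers import AutoModelForCausalLM

# No BetterTransformer needed: recent Transformers versions can use
# torch.nn.functional.scaled_dot_product_attention natively.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",          # example checkpoint only
    attn_implementation="sdpa",  # requires a recent Transformers release
    torch_dtype="auto",
)
```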

docs/source/bettertransformer/tutorials/contribute.mdx (+2 -2)
@@ -112,7 +112,7 @@ Now, make sure to fill all the necessary attributes, the list of attributes are:
 
 Note that these attributes correspond to all the components that are necessary to run a Transformer Encoder module, check the figure 1 on the ["Attention Is All You Need"](https://arxiv.org/pdf/1706.03762.pdf) paper.
 
-Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contigufied", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)
+Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contiguified", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)
 
 Make sure also to add the lines:
 ```python
@@ -125,7 +125,7 @@ self.validate_bettertransformer()
 
 First of all, start with the line `super().forward_checker()`, this is needed so that the parent class can run all the safety checkers before.
 
-After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar accross models, but sometimes the shapes of the attention masks are different across models.
+After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar across models, but sometimes the shapes of the attention masks are different across models.
 ```python
 super().forward_checker()
 
docs/source/bettertransformer/tutorials/convert.mdx (+3 -3)
@@ -45,7 +45,7 @@ Sometimes you can directly load your model on your GPU devices using `accelerate
 
 ## Step 2: Set your model on your preferred device
 
-If you did not used `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
+If you did not use `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
 ```python
 >>> model = model.to(0) # or model.to("cuda:0")
 ```
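Tying together the two placement options from this hunk with the conversion step, a hedged sketch (the model name is only an example, and `device_map="auto"` assumes `accelerate` is installed) might be:

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

# Option 1: let accelerate place the weights at load time (when supported)
model = AutoModel.from_pretrained("bert-base-uncased", device_map="auto")

# Option 2: load normally, then move the model to a GPU manually
# model = AutoModel.from_pretrained("bert-base-uncased").to("cuda:0")

# Convert to the BetterTransformer fastpath
model = BetterTransformer.transform(model)
```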
@@ -92,7 +92,7 @@ You can also use `transformers.pipeline` as usual and pass the converted model d
 >>> ...
 ```
 
-Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you face into any issue, do not hesitate to open an isse on GitHub!
+Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you run into any issue, do not hesitate to open an issue on GitHub!
 
 ## Training compatibility
 
@@ -113,4 +113,4 @@ model = BetterTransformer.transform(model)
 model = BetterTransformer.reverse(model)
 model.save_pretrained("fine_tuned_model")
 model.push_to_hub("fine_tuned_model")
-```
+```

docs/source/onnxruntime/usage_guides/models.mdx (+1 -1)
@@ -16,7 +16,7 @@ Once your model was [exported to the ONNX format](https://huggingface.co/docs/op
 - from transformers import AutoModelForCausalLM
 + from optimum.onnxruntime import ORTModelForCausalLM
 
-- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B) # PyTorch checkpoint
+- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
 + model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
   tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
 
docs/source/onnxruntime/usage_guides/optimization.mdx (+1 -1)
@@ -132,7 +132,7 @@ Below you will find an easy end-to-end example on how to optimize [distilbert-ba
 ```
 
 
-Below you will find an easy end-to-end example on how to optimize a Seq2Seq model [sshleifer/distilbart-cnn-12-6"](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
+Below you will find an easy end-to-end example on how to optimize a Seq2Seq model [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
 
 ```python
 >>> from transformers import AutoTokenizer
 
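The hunk above only shows the first import of that Seq2Seq example; here is a hedged sketch of how the optimization flow typically continues with `ORTOptimizer` (the save directory and optimization level are assumptions, not values taken from the docs page):

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
>>> from optimum.onnxruntime.configuration import OptimizationConfig

>>> model_id = "sshleifer/distilbart-cnn-12-6"

>>> # Export the PyTorch checkpoint to ONNX on the fly
>>> model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)

>>> # Apply graph optimizations and save the result
>>> optimizer = ORTOptimizer.from_pretrained(model)
>>> optimization_config = OptimizationConfig(optimization_level=2)
>>> optimizer.optimize(save_dir="distilbart_optimized", optimization_config=optimization_config)
```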
docs/source/quicktour.mdx (+1 -1)
@@ -185,7 +185,7 @@ Check out the [documentation](https://huggingface.co/docs/optimum/exporters/onnx
 
 ## PyTorch's BetterTransformer support
 
-[BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) is a free-lunch PyTorch-native optimization to gain x1.25 - x4 speedup on the inference of Transformer-based models. It has been marked as stable in [PyTorch 1.13](https://pytorch.org/blog/PyTorch-1.13-release/). We integrated BetterTransformer with the most-used models from the 🤗 Transformers libary, and using the integration is as simple as:
+[BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) is a free-lunch PyTorch-native optimization to gain x1.25 - x4 speedup on the inference of Transformer-based models. It has been marked as stable in [PyTorch 1.13](https://pytorch.org/blog/PyTorch-1.13-release/). We integrated BetterTransformer with the most-used models from the 🤗 Transformers library, and using the integration is as simple as:
 
 ```python
 >>> from optimum.bettertransformer import BetterTransformer
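The hunk ends right after the import; a hedged sketch of how that "as simple as" snippet usually continues (the checkpoint name is only an example) would be:

```python
>>> from transformers import AutoModel
>>> from optimum.bettertransformer import BetterTransformer

>>> model = AutoModel.from_pretrained("bert-base-uncased")
>>> # One call swaps the supported layers for the fastpath implementation
>>> model = BetterTransformer.transform(model)
```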
