@@ -54,13 +54,13 @@ The list of supported model below:
 - [DeiT](https://arxiv.org/abs/2012.12877)
 - [Electra](https://arxiv.org/abs/2003.10555)
 - [Ernie](https://arxiv.org/abs/1904.09223)
-- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
-- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
 - [HuBERT](https://arxiv.org/pdf/2106.07447.pdf)
 - [LayoutLM](https://arxiv.org/abs/1912.13318)
 - [Llama & Llama2](https://arxiv.org/abs/2302.13971) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
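For the models in this list that do benefit from the conversion, the whole API is the one-liner shown in the last hunk of this commit (`BetterTransformer.transform`). A minimal sketch, using `bert-base-uncased` purely as an illustrative supported checkpoint:

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

# "bert-base-uncased" is only an example of a supported encoder checkpoint.
model = AutoModel.from_pretrained("bert-base-uncased")

# Swap the eligible layers for their BetterTransformer fastpath equivalents.
model = BetterTransformer.transform(model)
```

The models flagged above (Falcon, GPT BigCode, Llama & Llama2) skip this step entirely, since Transformers already dispatches them to PyTorch's scaled dot-product attention.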
docs/source/bettertransformer/tutorials/contribute.mdx (+2 -2)
@@ -112,7 +112,7 @@ Now, make sure to fill all the necessary attributes, the list of attributes are:
 
 Note that these attributes correspond to all the components that are necessary to run a Transformer Encoder module, check the figure 1 on the ["Attention Is All You Need"](https://arxiv.org/pdf/1706.03762.pdf) paper.
 
-Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contigufied", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)
+Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contiguified", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)
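"Contiguified" here means forcing the projection weights into a dense memory layout with `.contiguous()`. A minimal sketch of the idea, where `query` is a stand-in rather than the exact code from `encoder_models.py`:

```python
import torch.nn as nn

# Stand-in query projection; in a real model this comes from the
# attention module of the layer being converted.
query = nn.Linear(8, 8)

# Some checkpoints store projection weights as non-contiguous views
# (e.g. transposed or sliced from a fused tensor). "Contiguifying"
# simply forces a dense memory layout before the weights are reused.
weight = query.weight.data
if not weight.is_contiguous():
    query.weight.data = weight.contiguous()
```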
@@ -127,2 +127,2 @@
 First of all, start with the line `super().forward_checker()`, this is needed so that the parent class can run all the safety checkers before.
-After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar accross models, but sometimes the shapes of the attention masks are different across models.
+After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar across models, but sometimes the shapes of the attention masks are different across models.
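As a sketch of that nesting step, using the public `torch.nested` API rather than the exact private helper the library calls, and with illustrative shapes:

```python
import torch

batch_size, seq_len, hidden_size = 2, 4, 8
hidden_states = torch.randn(batch_size, seq_len, hidden_size)

# Boolean mask: True for real tokens, False for padding.
attention_mask = torch.tensor([[True, True, True, False],
                               [True, True, False, False]])

# Keep only the unpadded tokens of each sequence in a nested tensor.
hidden_states = torch.nested.nested_tensor(
    [seq[mask] for seq, mask in zip(hidden_states, attention_mask)]
)

# The nested layout now encodes each sequence length, so the mask
# carries no extra information and can be dropped.
attention_mask = None
```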
docs/source/bettertransformer/tutorials/convert.mdx (+3 -3)
@@ -45,7 +45,7 @@ Sometimes you can directly load your model on your GPU devices using `accelerate
 
 ## Step 2: Set your model on your preferred device
 
-If you did not used`device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
+If you did not use `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
 ```python
 >>> model = model.to(0) # or model.to("cuda:0")
 ```
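For the `device_map="auto"` path this hunk refers to, a minimal sketch, assuming `accelerate` is installed and using `gpt2` purely as an example checkpoint:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets accelerate place the weights on the available
# devices at load time; "gpt2" is only an illustrative checkpoint.
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
```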
@@ -92,7 +92,7 @@ You can also use `transformers.pipeline` as usual and pass the converted model d
 >>> ...
 ```
 
-Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you face into any issue, do not hesitate to open an isse on GitHub!
+Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you run into any issue, do not hesitate to open an issue on GitHub!
 
 ## Training compatibility
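The `>>> ...` context line above elides the tutorial's pipeline example; a self-contained sketch of the pattern it describes, where the task and checkpoint are illustrative choices:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
from optimum.bettertransformer import BetterTransformer

checkpoint = "bert-base-uncased"  # illustrative supported checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)
model = BetterTransformer.transform(model)

# Pass the converted model directly to pipeline, as the tutorial describes.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("Paris is the [MASK] of France."))
```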
@@ -113,4 +113,4 @@ model = BetterTransformer.transform(model)
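The body of this last hunk is truncated in this view; it touches the training-compatibility snippet around `BetterTransformer.transform(model)`. The usual round-trip pattern, sketched under the assumption that `BetterTransformer.reverse` is available in the installed `optimum` version:

```python
from transformers import AutoModelForSequenceClassification
from optimum.bettertransformer import BetterTransformer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model = BetterTransformer.transform(model)

# ... run your training loop on the transformed model ...

# Revert to the canonical Transformers implementation before saving,
# so the checkpoint stays loadable without optimum installed.
model = BetterTransformer.reverse(model)
model.save_pretrained("fine_tuned_model")
```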