<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Language modeling training

The scripts [`run_clm.py`](https://github.com/huggingface/optimum/blob/main/examples/language-modeling/run_clm.py)
and [`run_mlm.py`](https://github.com/huggingface/optimum/blob/main/examples/language-modeling/run_mlm.py)
allow us to apply different quantization approaches (such as dynamic, static and aware-training quantization) as well as pruning
using the [Intel Neural Compressor (INC)](https://github.com/intel/neural-compressor) library for language modeling tasks.

GPT and GPT-2 are trained or fine-tuned using a causal language modeling (CLM) loss. ALBERT, BERT, DistilBERT and
RoBERTa are trained or fine-tuned using a masked language modeling (MLM) loss. More information about the differences
between these objectives can be found in our [model summary](https://huggingface.co/transformers/model_summary.html).
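
As a rough illustration of the difference between the two objectives (this snippet is not part of the example scripts), Transformers' `DataCollatorForLanguageModeling` can build the labels for either one:

```python
# Illustrative sketch only: the two objectives differ mainly in how the
# labels are built from the input tokens.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Masked language modeling: a fraction of tokens (here ~15%) is masked and
# only those positions contribute to the loss.
mlm_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Causal language modeling: the labels are the input ids themselves, shifted
# inside the model so that each token is predicted from the previous ones.
clm_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```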

### GPT-2/GPT and causal language modeling

The following example fine-tunes GPT-Neo on WikiText-2 while first applying magnitude pruning and then quantization aware training.
We're using the raw WikiText-2 (no tokens were replaced before the tokenization). The loss here is that of causal language modeling (CLM).

```bash
python run_clm.py \
    --model_name_or_path EleutherAI/gpt-neo-125M \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --quantize \
    --quantization_approach aware_training \
    --prune \
    --target_sparsity 0.02 \
    --perf_tol 0.5 \
    --do_train \
    --do_eval \
    --verify_loading \
    --output_dir /tmp/clm_output
```

### RoBERTa/BERT/DistilBERT and masked language modeling

The following example fine-tunes BERT on WikiText-2 while applying quantization aware training and magnitude pruning. We're using the raw
WikiText-2. The loss is different as BERT/RoBERTa have a bidirectional mechanism; we are therefore using the same loss
that was used during their pre-training: the masked language modeling (MLM) loss.

```bash
python run_mlm.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --quantize \
    --quantization_approach aware_training \
    --prune \
    --target_sparsity 0.1 \
    --perf_tol 0.5 \
    --do_train \
    --do_eval \
    --verify_loading \
    --output_dir /tmp/mlm_output
```

In order to apply dynamic, static or aware-training quantization, `quantization_approach` must be set to
`dynamic`, `static` or `aware_training`, respectively.
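
For instance, a post-training dynamic quantization run could look like the sketch below, which only reuses flags from the examples above (dynamic quantization needs no fine-tuning, so `--do_train` is omitted here as an assumption about the script's behavior):

```bash
python run_clm.py \
    --model_name_or_path EleutherAI/gpt-neo-125M \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --quantize \
    --quantization_approach dynamic \
    --do_eval \
    --verify_loading \
    --output_dir /tmp/clm_dynamic_output
```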
The configuration files containing all the information related to the model quantization and pruning objectives can be
specified using `quantization_config` and `pruning_config`, respectively. If not specified, the default
[quantization](https://github.com/huggingface/optimum/blob/main/examples/config/quantization.yml)
and [pruning](https://github.com/huggingface/optimum/blob/main/examples/config/prune.yml)
config files will be used.

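
As an illustration, a custom quantization configuration might look roughly like the following sketch; the field names follow the Intel Neural Compressor YAML schema as an assumption, so refer to the default config files linked above for the authoritative format:

```yaml
# Illustrative sketch only -- check the default quantization.yml linked above for the exact schema.
model:
  name: model                     # arbitrary identifier for the model
  framework: pytorch              # backend framework used by Intel Neural Compressor

quantization:
  approach: quant_aware_training  # or post_training_dynamic_quant / post_training_static_quant

tuning:
  accuracy_criterion:
    relative: 0.01                # tolerated relative accuracy drop vs. the FP32 baseline
  exit_policy:
    timeout: 0                    # 0 stops tuning as soon as the criterion is met
  random_seed: 9527
```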
The flag `--verify_loading` can be passed along to verify that the resulting quantized model can be loaded correctly.
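
Once a run has finished, the quantized model saved in `output_dir` can be reloaded for inference. Below is a minimal sketch, assuming an `IncQuantizedModelForCausalLM` helper is exposed by `optimum.intel.neural_compressor` (the class name and import path are assumptions and may vary between versions):

```python
# Illustrative sketch only: class names and import paths are assumptions and may
# differ depending on the installed optimum / neural-compressor versions.
from transformers import AutoTokenizer

from optimum.intel.neural_compressor import IncQuantizedModelForCausalLM  # assumed helper

model = IncQuantizedModelForCausalLM.from_pretrained("/tmp/clm_output")
tokenizer = AutoTokenizer.from_pretrained("/tmp/clm_output")

inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model(**inputs)  # quick sanity check that the INT8 model runs
```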