
Commit 4373fa4

finish updating gpt2- --> mistral-

1 parent f8bb469 commit 4373fa4

18 files changed: +31 -110 lines

README.md (+2 -2)

@@ -40,7 +40,7 @@ Environments and non-Python dependencies can be managed with conda, and Python d
 
 #### Prerequisites
 
-First, make sure to update `conf/tutorial-gpt2-micro.yaml` with the directories you want to store the Hugging Face
+First, make sure to update `conf/mistral-micro.yaml` with the directories you want to store the Hugging Face
 cache and model runs.
 
 ```
@@ -59,7 +59,7 @@ For single-node single-gpu training, run:
 ```bash
 conda activate mistral
 cd mistral
-CUDA_VISIBLE_DEVICES=0 python train.py --config conf/tutorial-gpt2-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --run_id tutorial-gpt2-micro
+CUDA_VISIBLE_DEVICES=0 python train.py --config conf/mistral-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --run_id tutorial-gpt2-micro
 ```
 
 #### Multi-node multi-GPU training with DeepSpeed
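The prerequisite in the first hunk amounts to pointing two directories in `conf/mistral-micro.yaml` at storage you control. A minimal sketch of that block, assuming the same `artifacts` keys (`cache_dir`, `run_dir`) that appear in the test configs later in this commit; the paths are placeholders:

```yaml
# Hypothetical sketch of the artifacts block in conf/mistral-micro.yaml
artifacts:
  cache_dir: /path/to/hf-cache    # where the Hugging Face cache is stored
  run_dir: /path/to/runs          # where checkpoints and logs for each run are written
```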

conf/archive/v1/gpt2-debug-config.yaml (+1 -1)

@@ -6,7 +6,7 @@
 inherit:
 - datasets/openwebtext.yaml
 - models/gpt2-small.yaml
-- trainers/gpt2-small-short.yaml
+- trainers/debug.yaml
 
 # Run ID -- defaults to `null`; override as you like!
 run_id: null

conf/mistral-medium.yaml (+2 -2)

@@ -1,5 +1,5 @@
-# gpt2-mistral-small-config.yaml
-# Full Mistral GPT-2 Small Training Config, currently working with the OpenWebText Dataset, GPT-2 Small Architecture,
+# mistral-medium-config.yaml
+# Full Mistral GPT-2 Medium Training Config, currently working with the OpenWebText Dataset, GPT-2 Small Architecture,
 # and full batch size (512). Runs with DeepSpeed ZeRO-2, with a per-device BSZ of 16.
 #
 # Inheritance and core paths can all be overridden from the command line or by re-writing these files.

conf/mistral-micro.yaml (+1 -1)

@@ -1,4 +1,4 @@
-# tutorial-gpt2-micro.yaml
+# mistral2-micro.yaml
 # Demo GPT-2 Micro Training Config, currently working with the WikiText103 Dataset, GPT-2 Micro Architecture,
 # and batch size of 2. Runs with DeepSpeed ZeRO-2, with a per-device BSZ of 2.
 #

conf/mistral-small.yaml (+1 -1)

@@ -1,4 +1,4 @@
-# gpt2-mistral-small-config.yaml
+# mistral-small-config.yaml
 # Full Mistral GPT-2 Small Training Config, currently working with the OpenWebText Dataset, GPT-2 Small Architecture,
 # and full batch size (512). Runs with DeepSpeed ZeRO-2, with a per-device BSZ of 16.
 #

conf/models/mistral-medium.yaml (+1 -1)

@@ -1,4 +1,4 @@
-# gpt2-medium-config.yaml
+# mistral-medium-config.yaml
 # Configuration for the GPT-2 Medium Model.
 ---
 model:

conf/models/mistral-micro.yaml (+1 -1)

@@ -1,4 +1,4 @@
-# gpt2-micro-config.yaml
+# mistral-micro-config.yaml
 # Configuration for the GPT-2 Micro Model.
 ---
 model:

conf/models/mistral-small.yaml (+1 -1)

@@ -1,4 +1,4 @@
-# gpt2-small-config.yaml
+# mistral-small.yaml
 # Configuration for the GPT-2 Small Model.
 ---
 model:

conf/trainers/gpt2-medium.yaml (+1 -1)

@@ -1,4 +1,4 @@
-# gpt2-small.yaml
+# gpt2-medium.yaml
 # Trainer config for Full GPT-2 Medium, with the full fixed batch size of 512 (with gradient accumulation).
 # This contract exactly follows that of HF.TrainingArguments so we can pass as a simple **kwargs -- make sure this
 # continues to stay valid!

conf/tutorial-shakespeare-gpt2-micro.yaml (+1 -1)

@@ -7,7 +7,7 @@
 # Inherit Dataset, Tokenization, Model, and Training Details
 inherit:
 - datasets/shakespeare.yaml
-- models/gpt2-micro.yaml
+- models/mistral-micro.yaml
 - trainers/gpt2-small-short.yaml
 
 # Run ID -- make sure to override!

docs/getting_started/config.rst (+6 -6)

@@ -9,20 +9,20 @@ Configurations are specified using the `Quinine <https://github.com/krandiash/qu
 Quinine allows users to integrate multiple config files and layer configs on top of each other.
 It is designed for machine learning projects with large sets of nested hyperparameters.
 
-The easiest way to understand Quinine is to study ``conf/tutorial-gpt2-micro.yaml`` which is presented below.
+The easiest way to understand Quinine is to study ``conf/mistral-micro.yaml`` which is presented below.
 
 This config specifies a variety of settings, and draws configurations from ``conf/datasets/wikitext103.yaml``,
-``conf/models/gpt2-micro.yaml`` and ``conf/trainers/gpt2-small.yaml``. This allows for clean separation of the
+``conf/models/mistral-micro.yaml`` and ``conf/trainers/gpt2-small.yaml``. This allows for clean separation of the
 configs for the dataset (e.g. name or number of pre-processing workers), the model (e.g. number of layers),
 and the trainer (e.g. learning rate), while high level configs are specified in the main config file.
 
-Most of the defaults in ``conf/tutorial-gpt2-micro.yaml`` will work, but you will need to change
+Most of the defaults in ``conf/mistral-micro.yaml`` will work, but you will need to change
 the Weights & Biases settings and specify the artifacts directories ``cache_dir`` and ``run_dir``.
 
-Example config: tutorial-gpt2-micro.yaml
+Example config: mistral-micro.yaml
 ----------------------------------------
 
-``conf/tutorial-gpt2-micro.yaml`` is a basic configuration file that can be used for an introductory training run
+``conf/mistral-micro.yaml`` is a basic configuration file that can be used for an introductory training run
 
-.. include:: ../../conf/tutorial-gpt2-micro.yaml
+.. include:: ../../conf/mistral-micro.yaml
 :literal:
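To make the layering concrete, here is a hedged sketch of the inherit block the docs describe for ``conf/mistral-micro.yaml``; the file names are taken from the paragraph above, and the real config may arrange them differently:

```yaml
# Hypothetical sketch of a layered Quinine config (conf/mistral-micro.yaml)
inherit:
- datasets/wikitext103.yaml    # dataset settings, e.g. name, number of pre-processing workers
- models/mistral-micro.yaml    # model settings, e.g. number of layers
- trainers/gpt2-small.yaml     # trainer settings, e.g. learning rate

# High-level settings such as the run id, Weights & Biases project,
# and the artifacts directories (cache_dir, run_dir) stay in this main file.
run_id: null
```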

docs/getting_started/evaluate.rst (+2 -2)

@@ -11,11 +11,11 @@ To run evaluation, use this command: ::
 
 cd mistral
 conda activate mistral
-CUDA_VISIBLE_DEVICES=0 python train.py --config conf/tutorial-gpt2-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --model.initial_weights /path/to/runs/my-run/checkpoint-400000 --run_training False
+CUDA_VISIBLE_DEVICES=0 python train.py --config conf/mistral-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --model.initial_weights /path/to/runs/my-run/checkpoint-400000 --run_training False
 
 This will skip the training process and run a final evaluation, initializing from the weights of the checkpoint.
 
-To evaluate a particular model, you need to supply the same config that was used to train the model (e.g. ``conf/tutorial-gpt2-micro.yaml``) in this example.
+To evaluate a particular model, you need to supply the same config that was used to train the model (e.g. ``conf/mistral-micro.yaml``) in this example.
 
 Example Output
 --------------

docs/getting_started/train.rst (+2 -2)

@@ -5,15 +5,15 @@ Training "Hello World"
 ----------------------
 
 You should now be ready to launch a demo training run. There are example
-configurations for training on WikiText-103 in ``conf/tutorial-gpt2-micro.yaml``. You
+configurations for training on WikiText-103 in ``conf/mistral-micro.yaml``. You
 will need to update the artifacts directories and the wandb settings in this file before
 running training.
 
 To launch a training run, use this command (found in ``scripts/run/single-node.sh``) ::
 
 cd mistral
 conda activate mistral
-CUDA_VISIBLE_DEVICES=0 python train.py --config conf/tutorial-gpt2-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2
+CUDA_VISIBLE_DEVICES=0 python train.py --config conf/mistral-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2
 
 You may need to adjust your batch size depending on the available GPU memory.
 
docs/tutorials/gcp_plus_kubernetes.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -199,7 +199,7 @@ The demo script ``gcp/run-demo-job.sh`` simply launches training with DeepSpeed:
199199
.. include:: ../../gcp/run-demo-job.sh
200200
:literal:
201201

202-
Make sure to update ``conf/tutorial-gpt2-micro.yaml`` to include your project specific values for Weights & Biases
202+
Make sure to update ``conf/mistral-micro.yaml`` to include your project specific values for Weights & Biases
203203
and the directories to store the cache and models, as described in the :doc:`Configuration section<../getting_started/config>`.
204204

205205
You can learn more about DeepSpeed training in the :doc:`DeepSpeed tutorial<deepspeed>`.

docs/tutorials/generate.rst (+1 -1)

@@ -18,7 +18,7 @@ Run run_generation.py With Your Model
 -------------------------------------
 
 As your model training runs, it should save checkpoints with all of the model resources in the directory
-you specified with ``articfacts.run_dir`` in the ``conf/tutorial-gpt2-micro.yaml`` config file.
+you specified with ``artifacts.run_dir`` in the ``conf/mistral-micro.yaml`` config file.
 
 For this example, lets assume you have saved the checkpoints in ``/home/tutorial-gpt2-micro/runs/run-1``. If you trained
 for 400000 steps, you should have a corresponding checkpoint at ``/home/tutorial-gpt2-micro/runs/run-1/checkpoint-400000``.

tests/conf/train-diff.yaml (+1 -34)

@@ -6,29 +6,14 @@
 ---
 # Inherit Dataset, Tokenization, Model, and Training Details
 inherit:
-- datasets/wikitext2-detokenized.yaml
-- models/gpt2-micro.yaml
+- train.yaml
 - trainers/gpt2-small-diff.yaml
 
-# Run ID -- make sure to override!
-run_id: null
-
-# Weights & Biases
-wandb: hello-world
-group: gpt2-small
-
 # Artifacts & Caching
 artifacts:
 cache_dir: /nlp/scr/jebolton/mistral-hello-world/artifacts
 run_dir: /nlp/scr/jebolton/mistral-hello-world/runs
 
-# Save Effective Batch Size for Easy Handling ==> Main Code asserts infra + training_config results in this!
-effective_bsz: 16
-
-# Resume from Checkpoint
-resume: false
-resume_checkpoint: null
-
 # List of frequencies at which to save checkpoints, provided as a list of two-element tuples:
 # - Frequency (`freq`) at which to save checkpoints (# steps)
 # - Bound (`until`) on global step for given frequency (checkpoint every `freq` steps until global step = `until`)
@@ -38,26 +23,8 @@ checkpoint_frequency:
 - [100, 20000]
 - [1000, 400000]
 
-# `torch.distributed` Default Infra Parameters -- to be overwritten by call to `torch.distributed.launch`
-local_rank: -1
-nnodes: -1
-nproc_per_node: -1
-
-# DeepSpeed Default Infra Parameters -- to be overwritten by call to `DeepSpeed`
-num_gpus: -1
-num_nodes: -1
-world_size: -1
-
-# Logging Parameters -- 10 = DEBUG, 20 = INFO, 30 = WARNING, 40 = ERROR, 50 = CRITICAL
-log_level: 20
-
 # Random Seed
 seed: 40
 
-online_eval:
-do_wikitext: false
-do_lambada: false
-stride: 256
-
 run_training: false
 run_final_eval: false
tests/conf/trainers/gpt2-small-diff.yaml (+5 -51)

@@ -1,67 +1,21 @@
-# gpt2-small.yaml
+# gpt2-small-diff.yaml
 # Trainer config for Full GPT-2 Small, with the full fixed batch size of 512 (with gradient accumulation).
 # This contract exactly follows that of HF.TrainingArguments so we can pass as a simple **kwargs -- make sure this
 # continues to stay valid!
 # Reference: https://huggingface.co/transformers/main_classes/trainer.html#trainingarguments
 ---
-training_arguments:
-# Overwrite from Top-Level Config
-output_dir: null
-
-# Generally sticks to order from HF.TrainingArguments() Docs, skipping over sane defaults/implicitly set args...
-do_train: true
-evaluation_strategy: steps
-
-# Set these based on GPU RAM/your available hardware
-per_device_train_batch_size: 8
-per_device_eval_batch_size: 16
-
-# We set this dynamically based on DDP Computation [steps = effective_batch / (per_gpu_batch * gpus * nodes)]
-gradient_accumulation_steps: null
 
-# For Online Evaluation, only keep around the Losses
-prediction_loss_only: true
+inherit:
+- gpt2-small.yaml
 
+training_arguments:
 # Learning Rate & Optimization Parameters, assumes AdamW
-learning_rate: 0.0006
 weight_decay: 0.2
 adam_beta1: 0.7
 adam_beta2: 0.3
-adam_epsilon: 1.0e-8
 
 # Gradient Norm
 max_grad_norm: 2.0
 
 # Maximum Training Steps (Overrides epochs!)
-max_steps: 100000
-
-# LR Scheduling Parameters -- Warmup Steps should be 1% of total steps (Could use ratio)
-lr_scheduler_type: linear # Cosine not supported if we want to use DeepSpeed Optimizers (gets overwritten!)
-warmup_steps: 4000
-
-# Logging Parameters -- Logging Directory (Tensorboard - is this necessary?) should be Overwritten at Runtime!
-run_name: null
-logging_dir: null
-logging_first_step: true
-logging_steps: 50
-
-# Saving and Evaluation Steps
-eval_steps: 1000
-save_steps: 1000
-
-# Resume Behavior --> ignore "full determinism" on resume (saves time for debugging)
-ignore_data_skip: false
-
-# Seeds -- Should be Overwritten at Runtime!
-seed: null
-
-### Optimization -- Precision, DeepSpeed, and FairScale Parameters -- all off for `simple` config
-fp16: true
-sharded_ddp: null
-deepspeed: null
-
-# Dataloader Parallelism
-dataloader_num_workers: 0
-
-# Should be overwritten from the Top-Level Config or CLI!
-local_rank: null
+max_steps: 100000
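The two test configs above illustrate the pattern this commit settles on: a small "diff" config inherits a complete base config and overrides only the keys that change. A hedged sketch of that pattern, using values from the diff above and a hypothetical file name:

```yaml
# my-trainer-diff.yaml -- hypothetical diff-style config
# Pull in every default from the base trainer, then override a handful of keys.
inherit:
- gpt2-small.yaml               # full base trainer config

training_arguments:
  weight_decay: 0.2             # overrides the base value
  adam_beta1: 0.7
  adam_beta2: 0.3
  max_grad_norm: 2.0
  max_steps: 100000             # everything not listed here comes from gpt2-small.yaml
```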

tutorials/custom-dataset/README.md (+1 -1)

@@ -45,7 +45,7 @@ typically done at the top in the inherit section. For example,
 # Inherit Dataset, Tokenization, Model, and Training Details
 inherit:
 - datasets/pubmed_local.yaml
-- models/gpt2-small.yaml
+- models/mistral-small.yaml
 - trainers/gpt2-small.yaml
 ```
 