
Training data-efficient image transformers & distillation through attention, arxiv

PaddlePaddle training/validation code and pretrained models for DeiT.

The official PyTorch implementation is here.

This implementation is developed by PaddleViT.

DeiT Model Overview

Update

Update (2021-09-27): More weights are uploaded.
Update (2021-08-11): Code is released and ported weights are uploaded.

Model Zoo

Model	Acc@1	Acc@5	#Params	FLOPs	Image Size	Crop_pct	Interpolation	Link
deit_tiny_distilled_224	74.52	91.90	5.9M	1.1G	224	0.875	bicubic	google/baidu(rhda)
deit_small_distilled_224	81.17	95.41	22.4M	4.3G	224	0.875	bicubic	google/baidu(pv28)
deit_base_distilled_224	83.32	96.49	87.2M	17.0G	224	0.875	bicubic	google/baidu(5f2g)
deit_base_distilled_384	85.43	97.33	87.2M	49.9G	384	1.0	bicubic	google/baidu(qgj2)

Teacher Model	Link
RegNet_Y_160	google/baidu(gjsm)

*The results are evaluated on the ImageNet2012 validation set.
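
The Crop_pct column is usually read as the ratio between the final center crop and an intermediate resize of the shorter image side. The sketch below works through that arithmetic for the rows of the table. It assumes this repo follows that common convention (the README does not state it), and `resize_size_for` is a hypothetical helper, not a function from this codebase.

```python
import math


def resize_size_for(image_size: int, crop_pct: float) -> int:
    """Shorter-side resize target used before the center crop.

    Assumed convention: resize to floor(image_size / crop_pct),
    then center-crop to image_size.
    """
    return int(math.floor(image_size / crop_pct))


# (image_size, crop_pct) pairs taken from the table above
for name, image_size, crop_pct in [
    ("deit_base_distilled_224", 224, 0.875),
    ("deit_base_distilled_384", 384, 1.0),
]:
    print(name, "resize to", resize_size_for(image_size, crop_pct),
          "then center-crop to", image_size)
```

Under this reading, the 224 models resize to 256 before cropping, while the 384 model is resized straight to 384 with no border removed.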

Notebooks

We provide a few notebooks in AI Studio to help you get started:

*(coming soon)*

Requirements

Python>=3.6
yaml>=0.2.5
PaddlePaddle>=2.1.0
yacs>=0.1.8

Data

The ImageNet2012 dataset is expected in the following folder structure:

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
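
Before starting a long run, it can be worth checking that a local copy matches the layout above. The following is a minimal sketch using only the Python standard library. `summarize_imagenet_layout` is a hypothetical helper written for this README, not part of the repo, and it only counts class folders and `.JPEG` files without opening any images.

```python
from pathlib import Path


def summarize_imagenet_layout(root):
    """Return {split: (num_class_folders, num_jpeg_files)} for train/ and val/."""
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Each class is one sub-folder named by its WordNet id, e.g. n01440764
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        num_images = sum(
            1
            for class_dir in class_dirs
            for f in class_dir.iterdir()
            if f.suffix.upper() == ".JPEG"
        )
        summary[split] = (len(class_dirs), num_images)
    return summary
```

For example, `summarize_imagenet_layout("/path/to/dataset/imagenet")` returns a dictionary with one entry per split. For a complete ImageNet2012 copy, each split should report 1000 class folders, with 50,000 images in `val`.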

Usage

To use the model with pretrained weights, download the .pdparams weight file and update the related file paths in the following Python scripts. The model config files are located in ./configs/.

For example, assuming the downloaded weight file is stored at ./deit_base_patch16_224.pdparams, use the deit_base_patch16_224 model in Python as follows:

import paddle
from config import get_config
from deit import build_deit as build_model
# config files are in ./configs/
config = get_config('./configs/deit_base_patch16_224.yaml')
# build the model
model = build_model(config)
# load pretrained weights
model_state_dict = paddle.load('./deit_base_patch16_224.pdparams')
model.set_dict(model_state_dict)

Evaluation

To evaluate DeiT model performance on ImageNet2012 with a single GPU, run the following script from the command line:

sh run_eval.sh

or

CUDA_VISIBLE_DEVICES=0 \
python main_single_gpu.py \
    -cfg=./configs/deit_base_patch16_224.yaml \
    -dataset=imagenet2012 \
    -batch_size=16 \
    -data_path=/path/to/dataset/imagenet/val \
    -eval \
    -pretrained=/path/to/pretrained/model/deit_base_patch16_224  # .pdparams is NOT needed

To run evaluation on multiple GPUs:

sh run_eval_multi.sh

or

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
    -cfg=./configs/deit_base_patch16_224.yaml \
    -dataset=imagenet2012 \
    -batch_size=16 \
    -data_path=/path/to/dataset/imagenet/val \
    -eval \
    -pretrained=/path/to/pretrained/model/deit_base_patch16_224  # .pdparams is NOT needed
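
The evaluation commands above pass `-pretrained` without the `.pdparams` suffix. When the command is generated from the path of a downloaded file, the suffix has to be dropped first. The sketch below shows one way to do that. `pretrained_arg` is a hypothetical helper for illustration and does not exist in this repo.

```python
import os


def pretrained_arg(weight_file: str) -> str:
    """Drop a trailing .pdparams so the value matches the -pretrained form above."""
    root, ext = os.path.splitext(weight_file)
    # Leave the path untouched if it has no .pdparams suffix
    return root if ext == ".pdparams" else weight_file


print(pretrained_arg("./deit_base_patch16_224.pdparams"))
```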

Training

To train the DeiT model on ImageNet2012 with a single GPU, download the pretrained weights of the teacher model (regnety_160.pdparams) and run the following script from the command line:

sh run_train.sh

or

CUDA_VISIBLE_DEVICES=0 \
python main_single_gpu.py \
  -cfg=./configs/deit_base_patch16_224.yaml \
  -dataset=imagenet2012 \
  -batch_size=32 \
  -data_path=/path/to/dataset/imagenet/train \
  -teacher_model=/path/to/pretrained/model/regnety_160  # .pdparams is NOT needed

To run training on multiple GPUs:

sh run_train_multi.sh

or

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
    -cfg=./configs/deit_base_patch16_224.yaml \
    -dataset=imagenet2012 \
    -batch_size=16 \
    -data_path=/path/to/dataset/imagenet/train \
    -teacher_model=/path/to/pretrained/model/regnety_160  # .pdparams is NOT needed
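
When comparing single-GPU and multi-GPU runs, it helps to know how many images are processed per step in total. The sketch below does that arithmetic under one loudly stated assumption: that `-batch_size` applies to each GPU process rather than to the whole job. The README does not confirm this, so check `main_multi_gpu.py` before relying on it. Both helper names are hypothetical.

```python
def visible_gpu_count(cuda_visible_devices: str) -> int:
    """Count the device ids in a CUDA_VISIBLE_DEVICES-style string such as '0,1,2,3'."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def global_batch_size(per_gpu_batch_size: int, cuda_visible_devices: str) -> int:
    """Total images per step, ASSUMING -batch_size is applied per GPU process."""
    return per_gpu_batch_size * visible_gpu_count(cuda_visible_devices)


# Values from the multi-GPU training command above
print(global_batch_size(16, "0,1,2,3"))
```

Under that assumption, the multi-GPU command above processes 64 images per step, compared with 32 for the single-GPU command.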

Attention Map Visualization

(coming soon)

Reference

@inproceedings{touvron2021training,
  title={Training data-efficient image transformers \& distillation through attention},
  author={Touvron, Hugo and Cord, Matthieu and Douze, Matthijs and Massa, Francisco and Sablayrolles, Alexandre and J{\'e}gou, Herv{\'e}},
  booktitle={International Conference on Machine Learning},
  pages={10347--10357},
  year={2021},
  organization={PMLR}
}
