Commit 17a838a
andrewjywang and drcege authored

add 3dcnn search (#8)

* 3dcnn: add 3dcnn search
* add 3dcnn mutator: super_res3d_k1dwk1_mutator.py
* fix typo: fix typo in configs

---------

Co-authored-by: Ce Ge <3213204+drcege@users.noreply.github.com>
1 parent 6a7fd19 commit 17a838a

29 files changed: +2844 −5 lines

README.md

+17 −3
@@ -10,14 +10,15 @@
 **:sunny: Hiring research interns for Neural Architecture Search, Tiny Machine Learning, Computer Vision tasks: [xiuyu.sxy@alibaba-inc.com](xiuyu.sxy@alibaba-inc.com)**

 - :boom: 2023.04: We will give a talk on Zero-Cost NAS at [**IFML Workshop**](https://www.ifml.institute/events/ifml-workshop-2023), April 20, 2023.
+- :boom: 2023.03: Code for [**E3D**](configs/action_recognition/README.md) is now released.
 - :boom: 2023.03: The code is refactored and DeepMAD is supported.
 - :boom: 2023.03: [**DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network**](https://arxiv.org/abs/2303.02165) is accepted by CVPR'23.
 - :boom: 2023.02: A demo is available on [**ModelScope**](https://modelscope.cn/studios/damo/TinyNAS/summary).
 - :boom: 2023.01: [**Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition**](https://openreview.net/pdf?id=lj1Eb1OPeNw) is accepted by ICLR'23.
 - :boom: 2022.11: [**DAMO-YOLO**](https://github.com/tinyvision/DAMO-YOLO) backbone search is now supported! The paper is on [arXiv](https://arxiv.org/abs/2211.15444) now.
-- :boom: 2022.09: [**Mixed-Precision Quantization Search**](scripts/quant/README.md) is now supported! The [**QE-Score**](https://openreview.net/pdf?id=E28hy5isRzC) paper is accepted by NeurIPS'22.
+- :boom: 2022.09: [**Mixed-Precision Quantization Search**](configs/quant/README.md) is now supported! The [**QE-Score**](https://openreview.net/pdf?id=E28hy5isRzC) paper is accepted by NeurIPS'22.
 - :boom: 2022.08: We will give a tutorial on [**Functional View for Zero-Shot NAS**](https://mlsys.org/virtual/2022/tutorial/2201) at MLSys'22.
-- :boom: 2022.06: Code for [**MAE-DET**](scripts/detection/README.md) is now released.
+- :boom: 2022.06: Code for [**MAE-DET**](configs/detection/README.md) is now released.
 - :boom: 2022.05: [**MAE-DET**](https://proceedings.mlr.press/v162/sun22c/sun22c.pdf) is accepted by ICML'22.
 - :boom: 2021.09: Code for [**Zen-NAS**](https://github.com/idstcv/ZenNAS) is now released.
 - :boom: 2021.07: The inspiring training-free paper [**Zen-NAS**](https://openaccess.thecvf.com/content/ICCV2021/papers/Lin_Zen-NAS_A_Zero-Shot_NAS_for_High-Performance_Image_Recognition_ICCV_2021_paper.pdf) has been accepted by ICCV'21.
@@ -33,7 +34,8 @@
 - [Budgets module](tinynas/budgets/README.md)
 - [Latency Module](tinynas/latency/op_profiler/README.md)
 - [Population module](tinynas/evolutions/README.md)
-It manages these modules with the help of [ModelScope](https://github.com/modelscope/modelscope) Registry and Configuration mechanism.
+
+It manages these modules with the help of [ModelScope](https://github.com/modelscope/modelscope) Registry and Configuration mechanism.

 - The `Searcher` is responsible for building and running the entire search process. By combining these modules through the corresponding configuration files, we can run backbone search for different tasks (such as classification, detection, etc.) under different budget constraints (such as the number of parameters, FLOPs, latency, etc.); a minimal sketch of this registry-plus-config pattern follows.
@@ -89,6 +91,18 @@
 | MAE-DET-M | 25.8 | 89.9 | 46.9 | 30.1 | 50.9 | 59.9 | [txt](configs/detection/models/maedet_m.txt) | [model](https://idstcv.oss-cn-zhangjiakou.aliyuncs.com/LightNAS/detection/maedet-m/latest.pth) |
 | MAE-DET-L | 43.9 | 152.9 | 47.8 | 30.3 | 51.9 | 61.1 | [txt](configs/detection/models/maedet_l.txt) | [model](https://idstcv.oss-cn-zhangjiakou.aliyuncs.com/LightNAS/detection/maedet-l/latest.pth) |

+***
+## Results for Action Recognition ([Details](configs/action_recognition/README.md))
+
+| Backbone | size | FLOPs (G) | SSV1 Top-1 | SSV1 Top-5 | Structure |
+|:---------:|:-------:|:-------:|:-------:|:-------:|:--------:|
+| X3D-S | 160 | 1.9 | 44.6 | 74.4 | - |
+| X3D-S | 224 | 1.9 | 47.3 | 76.6 | - |
+| E3D-S | 160 | 1.9 | 47.1 | 75.6 | [txt](configs/action_recognition/models/E3D_S.txt) |
+| E3D-M | 224 | 4.7 | 49.4 | 78.1 | [txt](configs/action_recognition/models/E3D_M.txt) |
+| E3D-L | 312 | 18.3 | 51.1 | 78.7 | [txt](configs/action_recognition/models/E3D_L.txt) |
+
+***
 **Note**
 If you find this useful, please support us by citing them.
 ```
@@ -0,0 +1,51 @@
```python
# Copyright (c) Alibaba, Inc. and its affiliates.
# The implementation is also open-sourced by the authors, and available at
# https://github.com/alibaba/lightweight-neural-architecture-search.

work_dir = './save_model/E3DM_FLOPs_185e8/'
log_level = 'INFO'  # INFO/DEBUG/ERROR
log_freq = 1000

""" video config """
image_size = 312
frames = 16

""" Model config """
model = dict(
    type='Cnn3DNet',
    structure_info=[
        {'class': 'Conv3DKXBNRELU', 'in': 3, 'out': 24, 's': 2, 'kt': 1, 'k': 3},
        {'class': 'SuperRes3DK1DWK1', 'in': 24, 'out': 24, 's': 2, 'kt': 1, 'k': 5, 'L': 1, 'btn': 48},
        {'class': 'SuperRes3DK1DWK1', 'in': 24, 'out': 48, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 96},
        {'class': 'SuperRes3DK1DWK1', 'in': 48, 'out': 96, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 192},
        {'class': 'SuperRes3DK1DWK1', 'in': 96, 'out': 96, 's': 1, 'kt': 3, 'k': 3, 'L': 1, 'btn': 192},
        {'class': 'SuperRes3DK1DWK1', 'in': 96, 'out': 192, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 384},
        {'class': 'Conv3DKXBNRELU', 'in': 192, 'out': 512, 's': 1, 'kt': 1, 'k': 1},
    ],
)

""" Budget config """
budgets = [
    dict(type='flops', budget=185e8),
    dict(type='layers', budget=167),
]

""" Score config """
score = dict(type='stentr', multi_block_ratio=[0, 0, 0, 0, 1], frames=16)

""" Space config """
space = dict(
    type='space_3d_k1dwk1',
    image_size=image_size,
)

""" Search config """
search = dict(
    minor_mutation=False,    # whether to fix the stage layers
    minor_iter=100000,       # iteration at which minor_mutation is enabled
    popu_size=256,
    num_random_nets=100000,  # number of search iterations
    sync_size_ratio=1.0,     # per-thread sync count: ratio * popu_size
    num_network=1,
)
```
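Search configs like the one above are plain Python modules, so they can be inspected without the repo's own configuration loader. A small sketch; the file path is this config's assumed location (the file-name header was lost in this view), and the repo's actual loader may differ:

```python
# Sketch: inspect a search config by executing it as a plain Python file.
import runpy

cfg = runpy.run_path('configs/action_recognition/E3D_X3DL_FLOPs.py')  # assumed path
print(cfg['work_dir'])                      # './save_model/E3DM_FLOPs_185e8/'
print(cfg['budgets'])                       # FLOPs and layer-count budgets
print(len(cfg['model']['structure_info']))  # 7 coarse stages in the initial structure
```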
@@ -0,0 +1,51 @@
```python
# Copyright (c) Alibaba, Inc. and its affiliates.
# The implementation is also open-sourced by the authors, and available at
# https://github.com/alibaba/lightweight-neural-architecture-search.

work_dir = './save_model/E3DM_FLOPs_50e8/'
log_level = 'INFO'  # INFO/DEBUG/ERROR
log_freq = 1000

""" video config """
image_size = 224
frames = 16

""" Model config """
model = dict(
    type='Cnn3DNet',
    structure_info=[
        {'class': 'Conv3DKXBNRELU', 'in': 3, 'out': 24, 's': 2, 'kt': 1, 'k': 3},
        {'class': 'SuperRes3DK1DWK1', 'in': 24, 'out': 24, 's': 2, 'kt': 1, 'k': 5, 'L': 1, 'btn': 48},
        {'class': 'SuperRes3DK1DWK1', 'in': 24, 'out': 48, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 96},
        {'class': 'SuperRes3DK1DWK1', 'in': 48, 'out': 96, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 192},
        {'class': 'SuperRes3DK1DWK1', 'in': 96, 'out': 96, 's': 1, 'kt': 3, 'k': 3, 'L': 1, 'btn': 192},
        {'class': 'SuperRes3DK1DWK1', 'in': 96, 'out': 192, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 384},
        {'class': 'Conv3DKXBNRELU', 'in': 192, 'out': 512, 's': 1, 'kt': 1, 'k': 1},
    ],
)

""" Budget config """
budgets = [
    dict(type='flops', budget=50e8),
    dict(type='layers', budget=83),
]

""" Score config """
score = dict(type='stentr', multi_block_ratio=[0, 0, 0, 0, 1], frames=16)

""" Space config """
space = dict(
    type='space_3d_k1dwk1',
    image_size=image_size,
)

""" Search config """
search = dict(
    minor_mutation=False,    # whether to fix the stage layers
    minor_iter=100000,       # iteration at which minor_mutation is enabled
    popu_size=256,
    num_random_nets=100000,  # number of search iterations
    sync_size_ratio=1.0,     # per-thread sync count: ratio * popu_size
    num_network=1,
)
```
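The `budgets` list above gates candidates during evolutionary search: a mutated network is kept only if it satisfies every constraint. A minimal sketch with a hypothetical helper name (the repo's budgets module implements the real version):

```python
# Hypothetical helper illustrating how a budgets list like the one above
# can gate candidates during search (not the repo's API).
def within_budgets(stats, budgets):
    """stats: measured values for a candidate, e.g. {'flops': 4.7e9, 'layers': 80}."""
    return all(stats[b['type']] <= b['budget'] for b in budgets)

budgets = [dict(type='flops', budget=50e8), dict(type='layers', budget=83)]
print(within_budgets({'flops': 4.7e9, 'layers': 80}, budgets))  # True
print(within_budgets({'flops': 5.3e9, 'layers': 80}, budgets))  # False: exceeds 50e8 FLOPs
```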
@@ -0,0 +1,51 @@
```python
# Copyright (c) Alibaba, Inc. and its affiliates.
# The implementation is also open-sourced by the authors, and available at
# https://github.com/alibaba/lightweight-neural-architecture-search.

work_dir = './save_model/E3DS_FLOPs_20e8/'
log_level = 'INFO'  # INFO/DEBUG/ERROR
log_freq = 1000

""" video config """
image_size = 160
frames = 13

""" Model config """
model = dict(
    type='Cnn3DNet',
    structure_info=[
        {'class': 'Conv3DKXBNRELU', 'in': 3, 'out': 24, 's': 2, 'kt': 1, 'k': 3},
        {'class': 'SuperRes3DK1DWK1', 'in': 24, 'out': 24, 's': 2, 'kt': 1, 'k': 5, 'L': 1, 'btn': 48},
        {'class': 'SuperRes3DK1DWK1', 'in': 24, 'out': 48, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 96},
        {'class': 'SuperRes3DK1DWK1', 'in': 48, 'out': 96, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 192},
        {'class': 'SuperRes3DK1DWK1', 'in': 96, 'out': 96, 's': 1, 'kt': 3, 'k': 3, 'L': 1, 'btn': 192},
        {'class': 'SuperRes3DK1DWK1', 'in': 96, 'out': 192, 's': 2, 'kt': 3, 'k': 3, 'L': 1, 'btn': 384},
        {'class': 'Conv3DKXBNRELU', 'in': 192, 'out': 512, 's': 1, 'kt': 1, 'k': 1},
    ],
)

""" Budget config """
budgets = [
    dict(type='flops', budget=20e8),
    dict(type='layers', budget=83),
]

""" Score config """
score = dict(type='stentr', multi_block_ratio=[0, 0, 0, 0, 1], frames=13)

""" Space config """
space = dict(
    type='space_3d_k1dwk1',
    image_size=image_size,
)

""" Search config """
search = dict(
    minor_mutation=False,    # whether to fix the stage layers
    minor_iter=500000,       # iteration at which minor_mutation is enabled
    popu_size=256,
    num_random_nets=500000,  # number of search iterations
    sync_size_ratio=1.0,     # per-thread sync count: ratio * popu_size
    num_network=1,
)
```
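The three configs above differ only in input resolution, clip length, and budgets; the initial structure, search space, and score are shared. A short summary, with the caps copied from the configs, the GFLOPs from the results tables, and the S/M/L mapping inferred from the matching input sizes:

```python
# Values copied from the three search configs and the README results table;
# the E3D-S/M/L labels are inferred from the matching input sizes.
targets = {
    'E3D-S (160px)': dict(frames=13, flops_cap=20e8,  layers=83,  reported_gflops=1.9),
    'E3D-M (224px)': dict(frames=16, flops_cap=50e8,  layers=83,  reported_gflops=4.7),
    'E3D-L (312px)': dict(frames=16, flops_cap=185e8, layers=167, reported_gflops=18.3),
}
for name, t in targets.items():
    assert t['reported_gflops'] * 1e9 <= t['flops_cap']  # each searched net fits its cap
    print(f"{name}: {t['frames']} frames, cap {t['flops_cap'] / 1e9:.1f} GFLOPs, "
          f"<= {t['layers']} layers")
```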

configs/action_recognition/README.md

+58
@@ -0,0 +1,58 @@
## Abstract

* **Instruction**

  We search efficient E3D backbones for action recognition; E3D-S/M/L are aligned with X3D-S/M/L.

* **Use the searching configs for action recognition**

  ```shell
  sh tools/dist_search.sh configs/E3D_X3DS_FLOPs.py
  ```

  **`E3D_X3DS_FLOPs.py` is the config for searching an X3D-S-like model within a FLOPs budget using the STEntr score.**

  **`E3D_X3DM_FLOPs.py` is the config for searching an X3D-M-like model within a FLOPs budget using the STEntr score.**

  **`E3D_X3DL_FLOPs.py` is the config for searching an X3D-L-like model within a FLOPs budget using the STEntr score.**

* **Use searched models in your own training pipeline**

  **Copy `tinynas/deploy/cnn3dnet` to your pipeline, then:**

  ```python
  from cnn3dnet import Cnn3DNet

  # for classification
  model = Cnn3DNet(num_classes=classes,
                   structure_txt=structure_txt,
                   out_indices=(4,),
                   classfication=True)

  # to load a pretrained model
  model.init_weights(pretrained=pretrained_pth)
  ```

***

## Results on Sth-Sth V1

| Backbone | size | FLOPs (G) | Top-1 | Top-5 | Structure |
|:---------:|:-------:|:-------:|:-------:|:-------:|:--------:|
| E3D-S | 160 | 1.9 | 47.1 | 75.6 | [txt](models/E3D_S.txt) |
| E3D-M | 224 | 4.7 | 49.4 | 78.1 | [txt](models/E3D_M.txt) |
| E3D-L | 312 | 18.3 | 51.1 | 78.7 | [txt](models/E3D_L.txt) |

***

## Citation

If you find this toolbox useful, please support us by citing this work:

```
@inproceedings{iclr23maxste,
  title     = {Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition},
  author    = {Junyan Wang and Zhenhong Sun and Yichen Qian and Dong Gong and Xiuyu Sun and Ming Lin and Maurice Pagnucco and Yang Song},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
}
```
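For a quick smoke test after deployment, a hedged sketch with a dummy clip: the tensor layout `(batch, channels, frames, height, width)` follows the usual 3D-CNN convention and the E3D-M setting above (16 frames at 224x224), but the exact forward signature of `Cnn3DNet` is an assumption here, not confirmed by the source.

```python
# Hedged smoke-test sketch (assumed forward signature and input layout).
import torch
from cnn3dnet import Cnn3DNet  # after copying tinynas/deploy/cnn3dnet (see above)

model = Cnn3DNet(num_classes=174,  # Sth-Sth V1 has 174 classes
                 structure_txt='configs/action_recognition/models/E3D_M.txt',
                 out_indices=(4,),
                 classfication=True)
model.eval()

dummy_clip = torch.randn(1, 3, 16, 224, 224)  # (N, C, T, H, W) for the E3D-M input size
with torch.no_grad():
    out = model(dummy_clip)
```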
@@ -0,0 +1,48 @@
```python
{'best_structures': [[
    {'class': 'Conv3DKXBNRELU', 'in': 3, 'k': 3, 'kt': 1, 'out': 24, 's': 2},
    {'L': 3, 'btn': 32, 'class': 'SuperRes3DK1DWK1', 'in': 24,
     'inner_class': 'Res3DK1DWK1', 'k': 5, 'kt': 1, 'out': 24, 's': 2},
    {'L': 13, 'btn': 120, 'class': 'SuperRes3DK1DWK1', 'in': 24,
     'inner_class': 'Res3DK1DWK1', 'k': 3, 'kt': 3, 'out': 48, 's': 2},
    {'L': 13, 'btn': 176, 'class': 'SuperRes3DK1DWK1', 'in': 48,
     'inner_class': 'Res3DK1DWK1', 'k': 3, 'kt': 3, 'out': 120, 's': 2},
    {'L': 13, 'btn': 176, 'class': 'SuperRes3DK1DWK1', 'in': 120,
     'inner_class': 'Res3DK1DWK1', 'k': 3, 'kt': 3, 'out': 120, 's': 1},
    {'L': 13, 'btn': 480, 'class': 'SuperRes3DK1DWK1', 'in': 120,
     'inner_class': 'Res3DK1DWK1', 'k': 3, 'kt': 3, 'out': 192, 's': 2},
    {'class': 'Conv3DKXBNRELU', 'in': 192, 'k': 1, 'kt': 1, 'out': 512, 's': 1}]],
 'space_arch': 'Cnn3DNet'}
```
@@ -0,0 +1,49 @@
```python
{'best_structures': [[
    {'class': 'Conv3DKXBNRELU', 'in': 3, 'k': 3, 'kt': 1, 'out': 24, 's': 2},
    {'L': 3, 'btn': 32, 'class': 'SuperRes3DK1DWK1', 'in': 24,
     'inner_class': 'Res3DK1DWK1', 'k': 5, 'kt': 1, 'out': 24, 's': 2},
    {'L': 6, 'btn': 96, 'class': 'SuperRes3DK1DWK1', 'in': 24,
     'inner_class': 'Res3DK1DWK1', 'k': 3, 'kt': 3, 'out': 64, 's': 2},
    {'L': 6, 'btn': 176, 'class': 'SuperRes3DK1DWK1', 'in': 64,
     'inner_class': 'Res3DK1DWK1', 'k': 3, 'kt': 3, 'out': 120, 's': 2},
    {'L': 6, 'btn': 176, 'class': 'SuperRes3DK1DWK1', 'in': 120,
     'inner_class': 'Res3DK1DWK1', 'k': 3, 'kt': 3, 'out': 120, 's': 1},
    {'L': 6, 'btn': 464, 'class': 'SuperRes3DK1DWK1', 'in': 120,
     'inner_class': 'Res3DK1DWK1', 'k': 3, 'kt': 3, 'out': 184, 's': 2},
    {'class': 'Conv3DKXBNRELU', 'in': 184, 'k': 1, 'kt': 1, 'out': 512, 's': 1}]],
 'space_arch': 'Cnn3DNet'}
```
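These structure files are Python dict literals rather than JSON (note the single-quoted keys), so `ast.literal_eval` from the standard library is a safe way to load one. The path below is one of the files linked from the results table:

```python
# Sketch: load a searched structure file (a Python dict literal, not JSON).
import ast

with open('configs/action_recognition/models/E3D_M.txt') as f:
    arch = ast.literal_eval(f.read())

blocks = arch['best_structures'][0]
print(arch['space_arch'])                  # 'Cnn3DNet'
print(sum(b.get('L', 1) for b in blocks))  # rough depth: sum of per-stage repeats
```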
