- Low precision model training
- 4-bit VGG7 CIFAR-10 example
- Post-training RRAM inference
- NeuroSim-based tool with faster inference speed and low-precision model support
- Low-precision weight decomposition
The model is quantized with the PACT and DoReFa quantization algorithms. For the detailed implementation, please check here.
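For reference, below is a minimal sketch of the two quantizers' forward passes, assuming the standard PACT and DoReFa formulations from the papers cited at the end. The function names are illustrative (not this repository's actual `WQ`/`AQ` API), and the straight-through estimator used during backpropagation is omitted.

```python
import torch

def quantize_k(x, k):
    # Uniform k-bit quantization of a tensor in [0, 1]:
    # round onto 2^k - 1 evenly spaced levels.
    n = float(2 ** k - 1)
    return torch.round(x * n) / n

def dorefa_weight_quant(w, wbit):
    # DoReFa-Net weight quantization: squash weights with tanh, map into
    # [0, 1], quantize uniformly, then rescale back to [-1, 1].
    w = torch.tanh(w)
    w = w / (2.0 * w.abs().max()) + 0.5
    return 2.0 * quantize_k(w, wbit) - 1.0

def pact_activation_quant(x, alpha, abit):
    # PACT activation quantization: clip activations into [0, alpha] using
    # the smooth form 0.5 * (|x| - |x - alpha| + alpha), which is
    # differentiable in the learnable clipping level alpha, then quantize
    # the clipped range uniformly to abit bits.
    y = 0.5 * (x.abs() - (x - alpha).abs() + alpha)
    return quantize_k(y / alpha, abit) * alpha
```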
The model's `Qconv2d` layer contains both the input feature map quantization module `AQ` and the layer-wise weight quantization module `WQ`. The precisions can be specified at the initialization stage of the model in the `train.py` file:
model_cfg.kwargs.update({"num_classes": num_classes, "wbit": args.wbit, "abit":args.abit, "alpha_init": args.alpha_init})
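As a rough illustration of how these keyword arguments could reach the layer, here is a hypothetical quantized convolution in the same spirit, reusing the quantizer sketches above. `Qconv2dSketch` is an invented name, not this repository's actual `Qconv2d` implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Qconv2dSketch(nn.Conv2d):
    # Hypothetical layer: AQ quantizes the input feature map to `abit`,
    # WQ quantizes the weights to `wbit`; `alpha_init` seeds the PACT
    # clipping level. Not the repository's actual Qconv2d.
    def __init__(self, in_ch, out_ch, ksize, wbit=4, abit=4,
                 alpha_init=10.0, **kwargs):
        super().__init__(in_ch, out_ch, ksize, **kwargs)
        self.wbit, self.abit = wbit, abit
        self.alpha = nn.Parameter(torch.tensor(alpha_init))  # PACT alpha

    def forward(self, x):
        xq = pact_activation_quant(x, self.alpha, self.abit)  # AQ
        wq = dorefa_weight_quant(self.weight, self.wbit)      # WQ
        return F.conv2d(xq, wq, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```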
The initial commit provides a 4-bit VGG7 model training example. The weight precision and the activation precision can be specified inside `vgg_cifar_quant.sh`.
Important Note: Before running, please change the default Python path to your own path.
The basic mapping scheme is inherited from the original 8-bit NeuroSim V1.2 inference code. The current implementation fully supports CUDA computation and an updated version of PyTorch (1.7.0).
The quantization modules `WQ` and `AQ` are embedded inside the inference layer `Qcon2dDoreFa`. For more details, please check here.
The initial commit provides a trained 4-bit VGG7 model example. Before running the inference, please download the pre-trained 4-bit model from the following link:
https://drive.google.com/file/d/1TqV1pSbkRJcWWLAiM-vkPj4bCLXA4XzM/view?usp=sharing
To run the inference, execute `vgg_cifar_eval.sh` in your terminal. The cell precision and the ADC precision can be specified inside the script.
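For intuition, the sketch below shows how the two knobs interact in an idealized model: each analog column partial sum is clipped and digitized by an ADC of the given precision, and a `wbit`-bit weight stored on `cell_bit`-bit cells occupies `wbit / cell_bit` bit slices. The names and the noise-free ADC are assumptions for illustration, not NeuroSim's actual device model.

```python
import torch

def adc_quantize(psum, adc_bit, psum_max):
    # Idealized ADC: clip the analog partial sum to [0, psum_max] and
    # digitize it onto 2**adc_bit - 1 uniform levels (no device noise).
    levels = float(2 ** adc_bit - 1)
    psum = torch.clamp(psum, 0.0, psum_max)
    return torch.round(psum / psum_max * levels) / levels * psum_max

# With 4-bit weights: 1-bit cells need 4 bit slices, 2-bit cells need 2.
wbit, cell_bit = 4, 1
num_slices = wbit // cell_bit  # -> 4
```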
The `Qcon2dDoreFa` module first decomposes the pre-trained low-precision weights into individual bit levels before the bit-by-bit processing, from LSB to MSB (a minimal decomposition sketch follows the example below). For instance, given a 4-bit weight tensor W of size 128 x 128 x 3 x 3, the decomposition extends it to a 4 x 128 x 128 x 3 x 3 tensor, which is saved as an external `.pt` file under `/prob/`. Each 1 x 128 x 128 x 3 x 3 slice corresponds to one bit level, from LSB to MSB. The following table summarizes the inference accuracy with different cell precisions:
| VGG7: W4/A4 | ADC Precision | Inference Acc. |
|---|---|---|
| SW baseline | N/A | 92.12% |
| 1-bit cell | 6-bit | 91.92% |
| 2-bit cell | 6-bit | 91.63% |
Example: 4-bit weight decomposition with a 1-bit cell.
print(list(weight_int.size()))
[128, 128, 3, 3]
# 4-bit integer weight
print(weight_int[15,15,:,:].cpu().numpy())
[[7. 8. 8.]
[8. 8. 8.]
[8. 7. 7.]]
# After decomposition
print(list(bit_levels.size()))
[4, 128, 128, 3, 3]
print(bit_levels[0,15,15,:,:].cpu().numpy())  # bit level 0 (LSB)
[[1. 0. 0.]
[0. 0. 0.]
[0. 1. 1.]]
print(bit_levels[1,15,15,:,:].cpu().numpy())  # bit level 1
[[1. 0. 0.]
[0. 0. 0.]
[0. 1. 1.]]
print(bit_levels[2,15,15,:,:].cpu().numpy())  # bit level 2
[[1. 0. 0.]
[0. 0. 0.]
[0. 1. 1.]]
print(bit_levels[3,15,15,:,:].cpu().numpy())  # bit level 3 (MSB)
[[0. 1. 1.]
[1. 1. 1.]
[1. 0. 0.]]
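A minimal sketch of how the per-bit planes shown above could be produced; `decompose_bits` is an illustrative name, not the repository's actual routine.

```python
import torch

def decompose_bits(weight_int, wbit=4):
    # Split a non-negative integer weight tensor into wbit binary planes,
    # stacked LSB -> MSB, giving an output of shape [wbit, *weight_int.shape].
    w = weight_int.to(torch.int64)
    planes = []
    for _ in range(wbit):
        planes.append((w % 2).float())  # extract the current LSB
        w = w // 2                      # shift right by one bit
    return torch.stack(planes, dim=0)

# e.g. 7 = 0b0111 -> [1, 1, 1, 0] and 8 = 0b1000 -> [0, 0, 0, 1],
# matching the printed bit levels above.
```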
PACT: Parameterized Clipping Activation for Quantized Neural Networks
@article{choi2018pact,
title={{PACT}: Parameterized clipping activation for quantized neural networks},
author={Choi, Jungwook and Wang, Zhuo and Venkataramani, Swagath and Chuang, Pierce I-Jen and Srinivasan, Vijayalakshmi and Gopalakrishnan, Kailash},
journal={arXiv preprint arXiv:1805.06085},
year={2018}
}
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
@article{zhou2016dorefa,
title={{DoReFa-Net}: Training low bitwidth convolutional neural networks with low bitwidth gradients},
author={Zhou, Shuchang and Wu, Yuxin and Ni, Zekun and Zhou, Xinyu and Wen, He and Zou, Yuheng},
journal={arXiv preprint arXiv:1606.06160},
year={2016}
}
DNN+NeuroSim: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators with Versatile Device Technologies
@inproceedings{peng2019dnn+,
title={{DNN+NeuroSim}: An end-to-end benchmarking framework for compute-in-memory accelerators with versatile device technologies},
author={Peng, Xiaochen and Huang, Shanshi and Luo, Yandong and Sun, Xiaoyu and Yu, Shimeng},
booktitle={2019 IEEE International Electron Devices Meeting (IEDM)},
pages={32--5},
year={2019},
organization={IEEE}
}