This document provides step-by-step instructions for reproducing PyTorch ResNet50/ResNet18/ResNet101 tuning results with Intel® Neural Compressor.
Note
- PyTorch eager-mode quantization requires users to manually add QuantStub and DeQuantStub for quantizable ops, and to manually perform module fusion.
- Neural Compressor requires users to complete these two manual steps before triggering the auto-tuning process. For details, please refer to https://pytorch.org/docs/stable/quantization.html
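For illustration, here is a minimal sketch of these two manual steps on a toy module (not taken from this repository; the layer names are arbitrary):

```python
import torch
from torch import nn

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 boundary
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.bn(self.conv(x)))
        return self.dequant(x)

    def fuse_model(self):
        # step 2: fuse the Conv + BN + ReLU pattern into a single module
        torch.quantization.fuse_modules(self, [['conv', 'bn', 'relu']], inplace=True)
```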
```shell
pip install -r requirements.txt
```
Download the raw ImageNet images to a directory such as /path/to/imagenet. The directory should include the following subfolders:
```shell
ls /path/to/imagenet
train  val
```
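For reference, here is a minimal sketch of building PyTorch dataloaders from this layout (the transforms and batch size below are typical ImageNet settings, not values taken from this repository):

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

train_loader = DataLoader(
    ImageFolder('/path/to/imagenet/train',
                T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(),
                           T.ToTensor(), normalize])),
    batch_size=256, shuffle=True)

val_loader = DataLoader(
    ImageFolder('/path/to/imagenet/val',
                T.Compose([T.Resize(256), T.CenterCrop(224),
                           T.ToTensor(), normalize])),
    batch_size=256)
```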
This is a tutorial on how to enable a PyTorch classification model with Intel® Neural Compressor.
For quantization-aware training mode, Intel® Neural Compressor supports the following four usages:
- The user specifies the fp32 "model", a training function "q_func", an evaluation dataset "eval_dataloader", and a metric in the tuning.metric field of the model-specific yaml config file. This option does not require the user to implement an evaluation function.
- The user specifies the fp32 "model", a training function "q_func", and a custom "eval_func" that encapsulates the evaluation dataset and metric by itself. This option requires the user to implement the evaluation function.
- The user specifies the fp32 "model", "calibration_dataloader", "eval_dataloader", and a metric, optimizer, and criterion in the model-specific yaml config file. With this option, Neural Compressor constructs a built-in training function and a built-in evaluation function.
- The user specifies the fp32 "model", "calibration_dataloader", a custom "eval_func", and an optimizer and criterion in the model-specific yaml config file. With this option, Neural Compressor constructs only a built-in training function.
As the ResNet18/50/101 series are typical classification models, we use Top-K as the metric, which Intel® Neural Compressor supports out of the box. So here we integrate PyTorch ResNet with Intel® Neural Compressor via the first or third use case for simplicity, as sketched below.
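A condensed sketch of how these two usages differ at the API level (not runnable on its own: model, train_loader, val_loader, and training_func_for_nc come from the user's script, as shown in the full examples later in this document):

```python
from neural_compressor.experimental import Quantization, common

# First usage: user-supplied training function; the metric comes from the yaml
quantizer = Quantization('./conf.yaml')
quantizer.model = common.Model(model)
quantizer.q_func = training_func_for_nc
quantizer.eval_dataloader = val_loader
q_model = quantizer()

# Third usage: built-in training loop driven by the yaml's optimizer/criterion fields
quantizer = Quantization('./conf_buildin.yaml')
quantizer.model = common.Model(model)
quantizer.calib_dataloader = train_loader
quantizer.eval_dataloader = val_loader
q_model = quantizer()
```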
```yaml
# conf_buildin.yaml
model:
  name: imagenet_qat
  framework: pytorch

quantization:                          # optional. required for QAT and PTQ.
  approach: quant_aware_training
  train:
    end_epoch: 8
    optimizer:
      SGD:
        learning_rate: 0.0001
    criterion:
      CrossEntropyLoss:
        reduction: mean

evaluation:
  accuracy:
    metric:
      topk: 1

tuning:
  accuracy_criterion:
    relative: 0.01
  exit_policy:
    timeout: 0
  random_seed: 9527
```
Here we choose the built-in optimizer, criterion, and metric, and set the accuracy target to tolerate a 0.01 relative accuracy loss from the fp32 baseline. The default tuning strategy is the basic strategy. A timeout of 0 means unlimited tuning time until the accuracy target is met; note that the result is not necessarily the model with the best accuracy and performance.
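Conceptually, the relative criterion accepts a quantized model when its accuracy loss relative to the fp32 baseline stays within the tolerance. A sketch of that check (illustration only, not Neural Compressor internals):

```python
def meets_accuracy_criterion(baseline_acc, quantized_acc, relative_tol=0.01):
    # relative loss with respect to the fp32 baseline must stay within tolerance
    return (baseline_acc - quantized_acc) / baseline_acc <= relative_tol
```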
PyTorch quantization requires two manual steps:
- Add QuantStub and DeQuantStub for all quantizable ops.
- Fuse possible patterns, such as Conv + Relu and Conv + BN + Relu.
Torchvision provides quantizable versions of these models (with QuantStub/DeQuantStub already inserted and a fuse_model() method), so these steps are not needed for torchvision models. Please refer to the torchvision documentation.
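For example, torchvision's quantization namespace exposes a quantization-ready ResNet50 (a sketch; quantize=False loads fp32 pretrained weights into the quantizable model definition):

```python
from torchvision.models.quantization import resnet50

model = resnet50(pretrained=True, quantize=False)  # fp32 weights, quantization-ready graph
model.fuse_model()  # fuses Conv + BN + ReLU patterns in place
```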
For the related code, please refer to examples/pytorch/eager/image_recognition/imagenet/cpu/qat/main_buildin.py.
After the prepare step is done, we just need to update main_buildin.py as below:
```python
model.module.fuse_model()

from neural_compressor.experimental import Quantization, common

quantizer = Quantization(args.config)
quantizer.model = common.Model(model)
quantizer.calib_dataloader = train_loader   # built-in training function uses the optimizer/criterion from the yaml
quantizer.eval_dataloader = val_loader      # built-in topk metric is evaluated on this dataloader
q_model = quantizer()
q_model.save(args.tuned_checkpoint)
```
The quantizer() call returns the best quantized model found within the timeout constraint.
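To restore the tuned model later, the Neural Compressor example scripts use a load helper; a sketch (args.tuned_checkpoint and model are the same names as in the snippet above):

```python
from neural_compressor.utils.pytorch import load

# 'model' must be the prepared fp32 module (with fuse_model() already applied)
int8_model = load(args.tuned_checkpoint, model)
```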
In the examples directory, there is a template.yaml. We can remove most of the items and keep only the mandatory items for tuning.
```yaml
# conf.yaml
model:
  name: imagenet_qat
  framework: pytorch

quantization:
  approach: quant_aware_training

evaluation:
  accuracy:
    metric:
      topk: 1

tuning:
  accuracy_criterion:
    relative: 0.01
  exit_policy:
    timeout: 0
  random_seed: 9527
```
Here we choose the built-in topk metric and set the accuracy target to tolerate a 0.01 relative accuracy loss from the fp32 baseline. The default tuning strategy is the basic strategy. A timeout of 0 means unlimited tuning time until the accuracy target is met; note that the result is not necessarily the model with the best accuracy and performance.
PyTorch quantization requires two manual steps:
- Add QuantStub and DeQuantStub for all quantizable ops.
- Fuse possible patterns, such as Conv + Relu and Conv + BN + Relu.
Torchvision provides quantizable versions of these models (with QuantStub/DeQuantStub already inserted and a fuse_model() method), so these steps are not needed for torchvision models. Please refer to the torchvision documentation.
For the related code, please refer to examples/pytorch/eager/image_recognition/imagenet/cpu/qat/main.py.
After the prepare step is done, we just need to update main.py as below:
```python
import torch

def training_func_for_nc(model):
    # train_loader, criterion, validate, val_loader_earlystop, and args
    # are defined elsewhere in main.py
    epochs = 8
    optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
    prev_loss = 100
    loss_increase_times = 0
    patience = 2

    for nepoch in range(epochs):
        model.train()
        cnt = 0
        for image, target in train_loader:
            print('.', end='')
            cnt += 1
            output = model(image)
            loss = criterion(output, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if cnt % 10 == 1 or cnt == len(train_loader):
                print('[{}/{}, {}/{}] Loss : {:.8}'.format(
                    nepoch + 1, epochs, cnt, len(train_loader), loss.item()))
                # early stopping on the validation loss
                _, curr_loss = validate(val_loader_earlystop, model, criterion, args)
                print("The current val loss: ", curr_loss)
                if curr_loss > prev_loss:
                    loss_increase_times += 1
                    print('No improvement times: ', loss_increase_times)
                if loss_increase_times >= patience:
                    print("Early stopping")
                    return
                prev_loss = curr_loss
        if nepoch > 3:
            # Freeze quantizer parameters
            model.apply(torch.quantization.disable_observer)
        if nepoch > 2:
            # Freeze batch norm mean and variance estimates
            model.apply(torch.nn.intrinsic.qat.freeze_bn_stats)
    return

model.module.fuse_model()

from neural_compressor.experimental import Quantization, common

quantizer = Quantization("./conf.yaml")
quantizer.model = common.Model(model)
quantizer.q_func = training_func_for_nc   # user-defined training function
quantizer.eval_dataloader = val_loader
q_model = quantizer()
```
The quantizer() call returns the best quantized model found within the timeout constraint.
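The returned model can be persisted the same way as in the built-in example above:

```python
q_model.save(args.tuned_checkpoint)
```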
With built-in training function:
```shell
cd examples/pytorch/eager/image_recognition/imagenet/cpu/qat
python main_buildin.py -t -a resnet50 --pretrained --config ./conf_buildin.yaml /path/to/imagenet
```
Without built-in training function:
```shell
cd examples/pytorch/eager/image_recognition/imagenet/cpu/qat
python main.py -t -a resnet50 --pretrained --config /path/to/config_file /path/to/imagenet
```
For the ResNet50 model, we get 0.7614 int8 accuracy vs. 0.7613 fp32 accuracy.
With built-in training function:
```shell
cd examples/pytorch/eager/image_recognition/imagenet/cpu/qat
python main_buildin.py -t -a resnet18 --pretrained --config ./conf_buildin.yaml /path/to/imagenet
```
Without built-in training function:
```shell
cd examples/pytorch/eager/image_recognition/imagenet/cpu/qat
python main.py -t -a resnet18 --pretrained --config /path/to/config_file /path/to/imagenet
```
With built-in training function:
```shell
cd examples/pytorch/eager/image_recognition/imagenet/cpu/qat
python main_buildin.py -t -a resnext101_32x8d --pretrained --config ./conf_buildin.yaml /path/to/imagenet
```
Without built-in training function:
```shell
cd examples/pytorch/eager/image_recognition/imagenet/cpu/qat
python main.py -t -a resnext101_32x8d --pretrained --config /path/to/config_file /path/to/imagenet
```
Without built-in training function:
```shell
cd examples/pytorch/eager/image_recognition/imagenet/cpu/qat
python main.py -t -a mobilenet_v2 --pretrained --config /path/to/config_file /path/to/imagenet
```