Benchmark

Backends

CPU: ncnn, ONNXRuntime, OpenVINO

GPU: ncnn, TensorRT, PPLNN

Latency benchmark

Platform

  • Ubuntu 18.04
  • ncnn 20211208
  • CUDA 11.3
  • TensorRT 7.2.3.4
  • Docker 20.10.8
  • NVIDIA Tesla T4 Tensor Core GPU for TensorRT

Other settings

  • Static graph
  • Batch size 1
  • Synchronize devices after each inference.
  • We report the average inference latency over 100 images from the dataset.
  • Warm-up: ncnn runs 30 warm-up iterations for all codebases. The other backends run 1010 warm-up iterations for classification and 10 for every other codebase.
  • Input resolution varies across datasets and codebases. All inputs are real images except for MMEditing, whose dataset is not large enough.
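The settings above (warm-up, batch size 1, synchronizing after each inference, averaging over the dataset images) can be sketched as a small timing loop. This is a minimal illustration and not the MMDeploy implementation: `benchmark_latency`, `infer`, `inputs`, and `synchronize` are hypothetical names, and deriving FPS as 1000 divided by the mean latency is an assumption about how the two columns relate.

```python
import time
from statistics import mean


def benchmark_latency(infer, inputs, warmup=10, synchronize=None):
    """Return (mean latency in ms, FPS) for `infer` over `inputs` at batch size 1.

    All three arguments are hypothetical placeholders: `infer` runs one
    forward pass on a single input, and `synchronize` blocks until the
    device has finished its queued work (for example a CUDA sync call).
    """
    if not inputs:
        raise ValueError("inputs must not be empty")

    # Warm-up runs are executed but never timed.
    for i in range(warmup):
        infer(inputs[i % len(inputs)])
    if synchronize is not None:
        synchronize()

    latencies_ms = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        # Synchronize after each inference so queued device work is included.
        if synchronize is not None:
            synchronize()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = mean(latencies_ms)
    # Assumption: FPS is derived from the mean latency.
    return mean_ms, 1000.0 / mean_ms
```

With 100 images and `warmup=10`, `infer` would be called 110 times and only the last 100 calls would contribute to the reported average.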

Users can measure speed directly by following how_to_measure_performance_of_models.md. The results below were measured in our environment.

MMCls

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet | ImageNet | 1x3x224x224 | 2.97 | 336.90 | 1.26 | 791.89 | 1.21 | 829.66 | 1.30 | 768.28 | 33.91 | 29.49 | 25.93 | 38.57 | $MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py |
| ResNeXt | ImageNet | 1x3x224x224 | 4.31 | 231.93 | 1.42 | 703.42 | 1.37 | 727.42 | 1.36 | 737.67 | 133.44 | 7.49 | 69.38 | 14.41 | $MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py |
| SE-ResNet | ImageNet | 1x3x224x224 | 3.41 | 293.64 | 1.66 | 600.73 | 1.51 | 662.90 | 1.91 | 524.07 | 107.84 | 9.27 | 80.85 | 12.37 | $MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py |
| ShuffleNetV2 | ImageNet | 1x3x224x224 | 1.37 | 727.94 | 1.19 | 841.36 | 1.13 | 883.47 | 4.69 | 213.33 | 9.55 | 104.71 | 10.66 | 93.81 | $MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py |

MMDet

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv3 | COCO | 1x3x320x320 | 14.76 | 67.76 | 24.92 | 40.13 | 24.92 | 40.13 | 18.07 | 55.35 | $MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py |
| SSD-Lite | COCO | 1x3x320x320 | 8.84 | 113.12 | 9.21 | 108.56 | 8.04 | 124.38 | 19.72 | 50.71 | $MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py |
| RetinaNet | COCO | 1x3x800x1344 | 97.09 | 10.30 | 25.79 | 38.78 | 16.88 | 59.23 | 38.34 | 26.08 | $MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py |
| FCOS | COCO | 1x3x800x1344 | 84.06 | 11.90 | 23.15 | 43.20 | 17.68 | 56.57 | - | - | $MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py |
| FSAF | COCO | 1x3x800x1344 | 82.96 | 12.05 | 21.02 | 47.58 | 13.50 | 74.08 | 30.41 | 32.89 | $MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py |
| Faster-RCNN | COCO | 1x3x800x1344 | 88.08 | 11.35 | 26.52 | 37.70 | 19.14 | 52.23 | 65.40 | 15.29 | $MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py |
| Mask-RCNN | COCO | 1x3x800x1344 | 320.86 | 3.12 | 241.32 | 4.14 | - | - | 86.80 | 11.52 | $MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py |

| Model | Dataset | Input | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | model config file |
|---|---|---|---|---|---|---|---|
| MobileNetv2-YOLOv3 | COCO | 1x3x320x320 | 48.57 | 20.59 | 66.55 | 15.03 | $MMDET_DIR/configs/yolo/yolov3_mobilenetv2_mstrain-416_300e_coco.py |
| SSD-Lite | COCO | 1x3x320x320 | 44.91 | 22.27 | 66.19 | 15.11 | $MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py |

MMEdit

| Model | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
|---|---|---|---|---|---|---|---|---|---|---|
| ESRGAN | 1x3x32x32 | 12.64 | 79.14 | 12.42 | 80.50 | 12.45 | 80.35 | 7.67 | 130.39 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py |
| SRCNN | 1x3x32x32 | 0.70 | 1436.47 | 0.35 | 2836.62 | 0.26 | 3850.45 | 0.56 | 1775.11 | $MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py |

MMOCR
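
| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DBNet | ICDAR2015 | 1x3x640x640 | 10.70 | 93.43 | 5.62 | 177.78 | 5.00 | 199.85 | 34.84 | 28.70 | - | - | - | - | $MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py |
| CRNN | IIIT5K | 1x1x32x32 | 1.93 | 518.28 | 1.40 | 713.88 | 1.36 | 736.79 | - | - | 10.57 | 94.64 | 20.00 | 50.00 | $MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py |
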
MMSeg

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FCN | Cityscapes | 1x3x512x1024 | 128.42 | 7.79 | 23.97 | 41.72 | 18.13 | 55.15 | 27.00 | 37.04 | $MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py |
| PSPNet | Cityscapes | 1x3x512x1024 | 119.77 | 8.35 | 24.10 | 41.49 | 16.33 | 61.23 | 27.26 | 36.69 | $MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3 | Cityscapes | 1x3x512x1024 | 226.75 | 4.41 | 31.80 | 31.45 | 19.85 | 50.38 | 36.01 | 27.77 | $MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3+ | Cityscapes | 1x3x512x1024 | 151.25 | 6.61 | 47.03 | 21.26 | 50.38 | 26.67 | 34.80 | 28.74 | $MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py |

Performance benchmark

Users can evaluate model performance directly by following how_to_evaluate_a_model.md. The results below were measured in our environment.

MMCls

| Model | Task | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
|---|---|---|---|---|---|---|---|---|---|
| ResNet-18 | Classification | top-1 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 | $MMCLS_DIR/configs/resnet/resnet18_b32x8_imagenet.py |
| | | top-5 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 | |
| ResNeXt-50 | Classification | top-1 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 | $MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py |
| | | top-5 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 | |
| SE-ResNet-50 | Classification | top-1 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 | $MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py |
| | | top-5 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 | |
| ShuffleNetV1 1.0x | Classification | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 | $MMCLS_DIR/configs/shufflenet_v1/shufflenet_v1_1x_b64x16_linearlr_bn_nowd_imagenet.py |
| | | top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 | |
| ShuffleNetV2 1.0x | Classification | top-1 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 | $MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py |
| | | top-5 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 | |
| MobileNet V2 | Classification | top-1 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 | $MMCLS_DIR/configs/mobilenet_v2/mobilenet_v2_b32x8_imagenet.py |
| | | top-5 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 | |

MMDet

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 | model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv3 | Object Detection | COCO2017 | box AP | 33.7 | - | 33.5 | 33.5 | 33.5 | - | - | $MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py |
| SSD | Object Detection | COCO2017 | box AP | 25.5 | - | 25.5 | 25.5 | - | - | - | $MMDET_DIR/configs/ssd/ssd300_coco.py |
| RetinaNet | Object Detection | COCO2017 | box AP | 36.5 | - | 36.4 | 36.4 | 36.3 | 36.5 | - | $MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py |
| FCOS | Object Detection | COCO2017 | box AP | 36.6 | - | 36.6 | 36.5 | - | - | - | $MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py |
| FSAF | Object Detection | COCO2017 | box AP | 37.4 | - | 37.4 | 37.4 | 37.2 | 37.4 | - | $MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py |
| YOLOX | Object Detection | COCO2017 | box AP | 40.5 | - | 40.3 | 40.3 | 29.3 | - | - | $MMDET_DIR/configs/yolox/yolox_s_8x8_300e_coco.py |
| Faster R-CNN | Object Detection | COCO2017 | box AP | 37.4 | - | 37.3 | 37.3 | 37.1 | 37.3 | - | $MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py |
| ATSS | Object Detection | COCO2017 | box AP | 39.4 | - | 39.4 | 39.4 | - | - | - | $MMDET_DIR/configs/atss/atss_r50_fpn_1x_coco.py |
| Cascade R-CNN | Object Detection | COCO2017 | box AP | 40.4 | - | 40.4 | 40.4 | - | 40.4 | - | $MMDET_DIR/configs/cascade_rcnn/cascade_rcnn_r50_caffe_fpn_1x_coco.py |
| Mask R-CNN | Instance Segmentation | COCO2017 | box AP | 38.2 | - | 38.1 | 38.1 | - | 38.0 | - | $MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py |
| | | | mask AP | 34.7 | - | 33.7 | 33.7 | - | - | - | |

MMEdit

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
|---|---|---|---|---|---|---|---|---|---|---|
| SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 | $MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py |
| | | | SSIM | 0.8099 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 | |
| ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2592 | 28.2592 | - | - | 28.2624 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_x4c64b23g32_g1_400k_div2k.py |
| | | | SSIM | 0.7778 | 0.7764 | 0.7774 | - | - | 0.7765 | |
| ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6444 | 30.6430 | - | - | 27.0426 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py |
| | | | SSIM | 0.8559 | 0.8558 | 0.8558 | - | - | 0.8557 | |
| SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9408 | 27.9408 | - | - | 27.9388 | $MMEDIT_DIR/configs/restorers/srresnet_srgan/srgan_x4c64b16_g1_1000k_div2k.py |
| | | | SSIM | 0.7846 | 0.7839 | 0.7839 | - | - | 0.7839 | |
| SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2300 | 30.2300 | - | - | 30.2294 | $MMEDIT_DIR/configs/restorers/srresnet_srgan/msrresnet_x4c64b16_g1_1000k_div2k.py |
| | | | SSIM | 0.8491 | 0.8488 | 0.8488 | - | - | 0.8488 | |
| Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | 27.7016 | 27.7016 | - | - | 27.7049 | $MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py |
| | | | SSIM | 0.8236 | 0.8122 | 0.8122 | - | - | 0.8123 | |
| EDSR | Super Resolution | Set5 | PSNR | 30.2223 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - | $MMEDIT_DIR/configs/restorers/edsr/edsr_x4c64b16_g1_300k_div2k.py |
| | | | SSIM | 0.8500 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - | |

MMOCR

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 | model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DBNet* | TextDetection | ICDAR2015 | recall | 0.7310 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 | $MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py |
| | | | precision | 0.8714 | 0.8718 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 | |
| | | | hmean | 0.7950 | 0.7949 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 | |
| CRNN | TextRecognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - | $MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py |
| SAR | TextRecognition | IIIT5K | acc | 0.9517 | 0.9287 | - | - | - | - | - | $MMOCR_DIR/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py |

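The hmean metric reported for DBNet is the harmonic mean of precision and recall. As a quick check of how the three DBNet metrics relate, the sketch below recomputes hmean from the PyTorch fp32 precision and recall. The function name `hmean` is chosen here for illustration and is not taken from MMOCR.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (also known as the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# PyTorch fp32 values from the DBNet row: precision 0.8714, recall 0.7310.
print(f"{hmean(0.8714, 0.7310):.4f}")  # prints 0.7950
```

Because the table values are already rounded to four decimals, recomputing hmean for other backends this way can differ from the reported figure in the last digit.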
MMSeg

| Model | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
|---|---|---|---|---|---|---|---|---|---|
| FCN | Cityscapes | mIoU | 72.25 | - | 72.36 | 72.35 | 74.19 | - | $MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py |
| PSPNet | Cityscapes | mIoU | 78.55 | - | 78.26 | 78.24 | 77.97 | - | $MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3 | Cityscapes | mIoU | 79.09 | - | 79.12 | 79.12 | 78.96 | - | $MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py |
| DeepLabV3+ | Cityscapes | mIoU | 79.61 | - | 79.6 | 79.6 | 79.43 | - | $MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py |
| Fast-SCNN | Cityscapes | mIoU | 70.96 | - | 70.93 | 70.92 | 66.0 | - | $MMSEG_DIR/configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py |

Notes

  • Some datasets in codebases such as MMDet contain images of varying resolution. The speed benchmark therefore uses the static configs in MMDeploy, while the performance benchmark uses the dynamic ones.

  • Some TensorRT int8 results require an NVIDIA GPU with Tensor Cores; without them, performance drops heavily.

  • DBNet uses nearest-neighbor interpolation in the neck of the model, and TensorRT 7 implements that mode quite differently from PyTorch. To keep the repository compatible with TensorRT 7, we rewrote the neck to use bilinear interpolation, which improves final detection performance. To match PyTorch's results, TensorRT 8 or later is recommended, since its interpolation methods are all the same as PyTorch's.

  • Mask AP of Mask R-CNN drops by about 1 AP point on the deployed backends (34.7 to 33.7). The main reason is that PyTorch interpolates the predicted masks directly to the original image, whereas the other backends first interpolate them to the preprocessed model input and then to the original image.
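The Mask R-CNN note rests on the fact that resizing in two steps is generally not equivalent to resizing once. The sketch below shows this on a 1-D signal with plain linear interpolation. It is only an illustration of the effect, not the MMDet or MMDeploy mask pipeline: `resize_linear`, the sample scores, and the sizes 4, 6, and 10 are all made up for the example.

```python
def resize_linear(values, new_len):
    """Linearly resample a 1-D list to new_len samples, keeping both endpoints."""
    if new_len == 1 or len(values) == 1:
        return [values[0]] * new_len
    scale = (len(values) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        lo = min(int(pos), len(values) - 2)  # left neighbour index
        frac = pos - lo                      # distance from the left neighbour
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


# Hypothetical mask scores along one row of a low-resolution mask.
scores = [0.0, 1.0, 0.0, 1.0]

# One step: straight to the final size.
direct = resize_linear(scores, 10)

# Two steps: to an intermediate size first, then to the final size.
two_step = resize_linear(resize_linear(scores, 6), 10)

max_gap = max(abs(a - b) for a, b in zip(direct, two_step))
print(f"largest difference between the two results: {max_gap:.3f}")
```

The intermediate resize discards some of the original sample positions, so the second resize interpolates from a different curve. Small differences of this kind near mask boundaries are one plausible way the two pipelines end up with slightly different mask AP.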