Benchmark

Backends

CPU: ncnn, ONNX Runtime, OpenVINO

GPU: ncnn, TensorRT, PPLNN

Latency benchmark

Platform

Ubuntu 18.04
ncnn 20211208
CUDA 11.3
TensorRT 7.2.3.4
Docker 20.10.8
NVIDIA Tesla T4 Tensor Core GPU for TensorRT

Other settings

Static graph
Batch size 1
Synchronize devices after each inference.
Reported numbers are the average inference performance over 100 images from the dataset.
Warm-up: ncnn runs 30 warm-up iterations for all codebases. The other backends run 1010 warm-up iterations for classification and 10 for the other codebases.
Input resolution varies across codebases and datasets. All inputs are real images except for MMEdit, whose dataset is not large enough.
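
The settings above (batch size 1, warm-up, device synchronization after each inference, averaging over the test images) can be sketched as a small timing loop. This is a minimal illustration, not the script MMDeploy uses: `measure_latency`, `infer` and `synchronize` are hypothetical names, and deriving FPS as 1000 divided by the mean latency is an assumption, so the tables may compute it differently.

```python
import time
from statistics import mean


def measure_latency(infer, inputs, warmup=10, synchronize=None):
    """Return (mean latency in ms, FPS) for `infer` run on one input at a time.

    `infer` and `synchronize` are hypothetical callables supplied by the caller;
    `synchronize` should block until the device has finished its work.
    """
    sync = synchronize or (lambda: None)

    # Warm-up iterations are executed but not timed.
    for i in range(warmup):
        infer(inputs[i % len(inputs)])
    sync()

    latencies_ms = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        sync()  # wait for the device before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latency = mean(latencies_ms)
    # Assumption: FPS is derived from the mean latency.
    return latency, 1000.0 / latency
```

For a GPU backend driven from PyTorch, `torch.cuda.synchronize` would be a natural choice for `synchronize`; for a CPU backend it can be left as `None`.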

Users can measure speed themselves by following how_to_measure_performance_of_models.md. The results below were obtained in our environment.

MMCls

MMCls			TensorRT						PPLNN		NCNN
Model	Dataset	Input	fp32		fp16		int8		fp16		SnapDragon888-fp32		Adreno660-fp32		model config file
Model	Dataset	Input	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	model config file
ResNet	ImageNet	1x3x224x224	2.97	336.90	1.26	791.89	1.21	829.66	1.30	768.28	33.91	29.49	25.93	38.57	$MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py
ResNeXt	ImageNet	1x3x224x224	4.31	231.93	1.42	703.42	1.37	727.42	1.36	737.67	133.44	7.49	69.38	14.41	$MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py
SE-ResNet	ImageNet	1x3x224x224	3.41	293.64	1.66	600.73	1.51	662.90	1.91	524.07	107.84	9.27	80.85	12.37	$MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py
ShuffleNetV2	ImageNet	1x3x224x224	1.37	727.94	1.19	841.36	1.13	883.47	4.69	213.33	9.55	104.71	10.66	93.81	$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py

MMDet

MMDet			TensorRT						PPLNN
Model	Dataset	Input	fp32		fp16		int8		fp16		model config file
Model	Dataset	Input	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	model config file
YOLOv3	COCO	1x3x320x320	14.76	67.76	24.92	40.13	24.92	40.13	18.07	55.35	$MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py
SSD-Lite	COCO	1x3x320x320	8.84	113.12	9.21	108.56	8.04	124.38	19.72	50.71	$MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py
RetinaNet	COCO	1x3x800x1344	97.09	10.30	25.79	38.78	16.88	59.23	38.34	26.08	$MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py
FCOS	COCO	1x3x800x1344	84.06	11.90	23.15	43.20	17.68	56.57	-	-	$MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py
FSAF	COCO	1x3x800x1344	82.96	12.05	21.02	47.58	13.50	74.08	30.41	32.89	$MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py
Faster-RCNN	COCO	1x3x800x1344	88.08	11.35	26.52	37.70	19.14	52.23	65.40	15.29	$MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
Mask-RCNN	COCO	1x3x800x1344	320.86	3.12	241.32	4.14	-	-	86.80	11.52	$MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py

MMDet			NCNN
Model	Dataset	Input	SnapDragon888-fp32		Adreno660-fp32		model config file
Model	Dataset	Input	latency (ms)	FPS	latency (ms)	FPS	model config file
MobileNetv2-YOLOv3	COCO	1x3x320x320	48.57	20.59	66.55	15.03	$MMDET_DIR/configs/yolo/yolov3_mobilenetv2_mstrain-416_300e_coco.py
SSD-Lite	COCO	1x3x320x320	44.91	22.27	66.19	15.11	$MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py

MMEdit

MMEdit		TensorRT						PPLNN
Model	Input	fp32		fp16		int8		fp16		model config file
Model	Input	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	model config file
ESRGAN	1x3x32x32	12.64	79.14	12.42	80.50	12.45	80.35	7.67	130.39	$MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py
SRCNN	1x3x32x32	0.70	1436.47	0.35	2836.62	0.26	3850.45	0.56	1775.11	$MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py

MMOCR

MMOCR			TensorRT						PPLNN		NCNN
Model	Dataset	Input	fp32		fp16		int8		fp16		SnapDragon888-fp32		Adreno660-fp32		model config file
Model	Dataset	Input	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	model config file
DBNet	ICDAR2015	1x3x640x640	10.70	93.43	5.62	177.78	5.00	199.85	34.84	28.70	-	-	-	-	$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py
CRNN	IIIT5K	1x1x32x32	1.93	518.28	1.40	713.88	1.36	736.79	-	-	10.57	94.64	20.00	50.00	$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py

MMSeg

MMSeg			TensorRT						PPLNN
Model	Dataset	Input	fp32		fp16		int8		fp16		model config file
Model	Dataset	Input	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	model config file
FCN	Cityscapes	1x3x512x1024	128.42	7.79	23.97	41.72	18.13	55.15	27.00	37.04	$MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py
PSPNet	Cityscapes	1x3x512x1024	119.77	8.35	24.10	41.49	16.33	61.23	27.26	36.69	$MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3	Cityscapes	1x3x512x1024	226.75	4.41	31.80	31.45	19.85	50.38	36.01	27.77	$MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3+	Cityscapes	1x3x512x1024	151.25	6.61	47.03	21.26	50.38	26.67	34.80	28.74	$MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py

Performance benchmark

Users can evaluate model performance themselves by following how_to_evaluate_a_model.md. The results below were obtained in our environment.

MMCls

MMCls			PyTorch	ONNX Runtime	TensorRT			PPLNN
Model	Task	Metrics	fp32	fp32	fp32	fp16	int8	fp16	model config file
ResNet-18	Classification	top-1	69.90	69.88	69.88	69.86	69.86	69.86	$MMCLS_DIR/configs/resnet/resnet18_b32x8_imagenet.py
ResNet-18	Classification	top-5	89.43	89.34	89.34	89.33	89.38	89.34	$MMCLS_DIR/configs/resnet/resnet18_b32x8_imagenet.py
ResNeXt-50	Classification	top-1	77.90	77.90	77.90	-	77.78	77.89	$MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py
ResNeXt-50	Classification	top-5	93.66	93.66	93.66	-	93.64	93.65
SE-ResNet-50	Classification	top-1	77.74	77.74	77.74	77.75	77.63	77.73	$MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py
SE-ResNet-50	Classification	top-5	93.84	93.84	93.84	93.83	93.72	93.84
ShuffleNetV1 1.0x	Classification	top-1	68.13	68.13	68.13	68.13	67.71	68.11	$MMCLS_DIR/configs/shufflenet_v1/shufflenet_v1_1x_b64x16_linearlr_bn_nowd_imagenet.py
ShuffleNetV1 1.0x	Classification	top-5	87.81	87.81	87.81	87.81	87.58	87.80
ShuffleNetV2 1.0x	Classification	top-1	69.55	69.55	69.55	69.54	69.10	69.54	$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py
ShuffleNetV2 1.0x	Classification	top-5	88.92	88.92	88.92	88.91	88.58	88.92
MobileNet V2	Classification	top-1	71.86	71.86	71.86	71.87	70.91	71.84	$MMCLS_DIR/configs/mobilenet_v2/mobilenet_v2_b32x8_imagenet.py
MobileNet V2	Classification	top-5	90.42	90.42	90.42	90.40	89.85	90.41

MMDet

MMDet				PyTorch	ONNX Runtime	TensorRT			PPLNN	OpenVINO
Model	Task	Dataset	Metrics	fp32	fp32	fp32	fp16	int8	fp16	fp32	model config file
YOLOv3	Object Detection	COCO2017	box AP	33.7	-	33.5	33.5	33.5	-	-	$MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py
SSD	Object Detection	COCO2017	box AP	25.5	-	25.5	25.5	-	-	-	$MMDET_DIR/configs/ssd/ssd300_coco.py
RetinaNet	Object Detection	COCO2017	box AP	36.5	-	36.4	36.4	36.3	36.5	-	$MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py
FCOS	Object Detection	COCO2017	box AP	36.6	-	36.6	36.5	-	-	-	$MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py
FSAF	Object Detection	COCO2017	box AP	37.4	-	37.4	37.4	37.2	37.4	-	$MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py
YOLOX	Object Detection	COCO2017	box AP	40.5	-	40.3	40.3	29.3	-	-	$MMDET_DIR/configs/yolox/yolox_s_8x8_300e_coco.py
Faster R-CNN	Object Detection	COCO2017	box AP	37.4	-	37.3	37.3	37.1	37.3	-	$MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
ATSS	Object Detection	COCO2017	box AP	39.4	-	39.4	39.4	-	-	-	$MMDET_DIR/configs/atss/atss_r50_fpn_1x_coco.py
Cascade R-CNN	Object Detection	COCO2017	box AP	40.4	-	40.4	40.4	-	40.4	-	$MMDET_DIR/configs/cascade_rcnn/cascade_rcnn_r50_caffe_fpn_1x_coco.py
Mask R-CNN	Instance Segmentation	COCO2017	box AP	38.2	-	38.1	38.1	-	38.0	-	$MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py
Mask R-CNN	Instance Segmentation	COCO2017	mask AP	34.7	-	33.7	33.7	-	-	-	$MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py

MMEdit

MMEdit				PyTorch	ONNX Runtime	TensorRT			PPLNN
Model	Task	Dataset	Metrics	fp32	fp32	fp32	fp16	int8	fp16	model config file
SRCNN	Super Resolution	Set5	PSNR	28.4316	28.4323	28.4323	28.4286	28.1995	28.4311	$MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py
SRCNN	Super Resolution	Set5	SSIM	0.8099	0.8097	0.8097	0.8096	0.7934	0.8096
ESRGAN	Super Resolution	Set5	PSNR	28.2700	28.2592	28.2592	-	-	28.2624	$MMEDIT_DIR/configs/restorers/esrgan/esrgan_x4c64b23g32_g1_400k_div2k.py
ESRGAN	Super Resolution	Set5	SSIM	0.7778	0.7764	0.7774	-	-	0.7765
ESRGAN-PSNR	Super Resolution	Set5	PSNR	30.6428	30.6444	30.6430	-	-	27.0426	$MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py
ESRGAN-PSNR	Super Resolution	Set5	SSIM	0.8559	0.8558	0.8558	-	-	0.8557
SRGAN	Super Resolution	Set5	PSNR	27.9499	27.9408	27.9408	-	-	27.9388	$MMEDIT_DIR/configs/restorers/srresnet_srgan/srgan_x4c64b16_g1_1000k_div2k.py
SRGAN	Super Resolution	Set5	SSIM	0.7846	0.7839	0.7839	-	-	0.7839
SRResNet	Super Resolution	Set5	PSNR	30.2252	30.2300	30.2300	-	-	30.2294	$MMEDIT_DIR/configs/restorers/srresnet_srgan/msrresnet_x4c64b16_g1_1000k_div2k.py
SRResNet	Super Resolution	Set5	SSIM	0.8491	0.8488	0.8488	-	-	0.8488
Real-ESRNet	Super Resolution	Set5	PSNR	28.0297	27.7016	27.7016	-	-	27.7049	$MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py
Real-ESRNet	Super Resolution	Set5	SSIM	0.8236	0.8122	0.8122	-	-	0.8123
EDSR	Super Resolution	Set5	PSNR	30.2223	30.2214	30.2214	30.2211	30.1383	-	$MMEDIT_DIR/configs/restorers/edsr/edsr_x4c64b16_g1_300k_div2k.py
EDSR	Super Resolution	Set5	SSIM	0.8500	0.8497	0.8497	0.8497	0.8469	-

MMOCR

MMOCR				PyTorch	ONNX Runtime	TensorRT			PPLNN	OpenVINO
Model	Task	Dataset	Metrics	fp32	fp32	fp32	fp16	int8	fp16	fp32	model config file
DBNet*	TextDetection	ICDAR2015	recall	0.7310	0.7304	0.7198	0.7179	0.7111	0.7304	0.7309	$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py
			precision	0.8714	0.8718	0.8677	0.8674	0.8688	0.8718	0.8714
			hmean	0.7950	0.7949	0.7868	0.7856	0.7821	0.7949	0.7950
CRNN	TextRecognition	IIIT5K	acc	0.8067	0.8067	0.8067	0.8063	0.8067	0.8067	-	$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py
SAR	TextRecognition	IIIT5K	acc	0.9517	0.9287	-	-	-	-	-	$MMOCR_DIR/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py

MMSeg

MMSeg			PyTorch	ONNX Runtime	TensorRT			PPLNN
Model	Dataset	Metrics	fp32	fp32	fp32	fp16	int8	fp16	model config file
FCN	Cityscapes	mIoU	72.25	-	72.36	72.35	74.19	-	$MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py
PSPNet	Cityscapes	mIoU	78.55	-	78.26	78.24	77.97	-	$MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3	Cityscapes	mIoU	79.09	-	79.12	79.12	78.96	-	$MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py
DeepLabV3+	Cityscapes	mIoU	79.61	-	79.6	79.6	79.43	-	$MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py
Fast-SCNN	Cityscapes	mIoU	70.96	-	70.93	70.92	66.0	-	$MMSEG_DIR/configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py

Notes

Because some datasets contain images of varying resolutions in codebases such as MMDet, the speed benchmark is obtained with static configs in MMDeploy, while the performance benchmark is obtained with dynamic ones.
Some TensorRT int8 benchmarks require NVIDIA cards with Tensor Cores; without them, performance drops heavily.
DBNet uses the nearest interpolation mode in the neck of the model, which TensorRT 7 handles quite differently from PyTorch. To make the repository compatible with TensorRT 7, we rewrote the neck to use bilinear interpolation, which improves final detection performance. To match the PyTorch results, TensorRT 8 or later is recommended, since its interpolation methods all behave the same as in PyTorch.
The mask AP of Mask R-CNN drops by 1% on the deployed backends (34.7 in PyTorch versus 33.7 in TensorRT in the MMDet table above). The main reason is that PyTorch interpolates the predicted masks directly to the original image, whereas the other backends first interpolate them to the model's preprocessed input image and then to the original image.
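
The last note can be illustrated with a toy example: resampling in two steps is not equivalent to resampling once, so the binarized masks can differ. The sketch below is a 1-D illustration under simplifying assumptions (linear interpolation with aligned endpoints, a 0.5 threshold, made-up sizes). It is not the Mask R-CNN post-processing code, and `resize_linear` and `binarize` are hypothetical helpers.

```python
def resize_linear(values, new_len):
    """Linearly resample a 1-D sequence to `new_len` samples, endpoints aligned.

    Assumes len(values) >= 2 and new_len >= 2.
    """
    scale = (len(values) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        lo = min(int(pos), len(values) - 2)  # left neighbour, clamped at the end
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out


def binarize(values, threshold=0.5):
    """Turn interpolated scores into a 0/1 mask."""
    return [1 if v > threshold else 0 for v in values]


scores = [0.0, 1.0, 0.0, 1.0, 0.0]  # toy 1-D mask scores

# One step: interpolate straight to the "original image" size.
direct = resize_linear(scores, 9)
# Two steps: interpolate to an intermediate size first, then to the final size.
two_step = resize_linear(resize_linear(scores, 4), 9)

print(binarize(direct))
print(binarize(two_step))
```

The two printed masks disagree at some positions, which is the kind of pixel-level mismatch that can lower mask AP even when the network outputs are identical.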
