[performance] Investigation of possibility to add map_buffer synchronization in other encoders #287

andreyor · 2018-06-09T14:11:30Z

The 17% 1N performance regression vs. MSS 2018 R1 was root-caused and fixed in this commit:
90fd745

For AVC FEI, the commit is still under testing:
#280

To reproduce the 1N AVCe regression:

Create a par-file par_1n_avce.txt with:

-o::sink -n 600 -i::CODEC IN_STREAM -async 1 -hw
-o::h264 out_1.h264 -i::source -b 6000 -u 1 -async 1 -hw
-o::h264 out_2.h264 -i::source -b 6000 -u 1 -async 1 -hw

where
IN_STREAM is an interlaced stream (resolutions: FHD, HD, SD) and CODEC is mpeg2 or h264.

Run ./sample_multi_transcode -par par_1n_avce.txt
You can see the performance regression between the current library and MSS 2018 R1.
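For sweeping over the codec and resolution combinations listed above, the par-file can be generated instead of edited by hand. This is only a minimal sketch: the flags are copied verbatim from the par-file in this issue, while the helper names (`build_par_lines`, `write_par_file`) and the `input.h264` stream name are made up for illustration.

```python
from pathlib import Path


def build_par_lines(codec, in_stream, n_encoders=2, bitrate=6000, frames=600):
    """Return par-file lines: one decode session feeding n_encoders AVC encoders."""
    # Decode session line, same flags as in the issue's par-file.
    lines = [f"-o::sink -n {frames} -i::{codec} {in_stream} -async 1 -hw"]
    # One encode session per output, all reading from the shared source.
    for i in range(1, n_encoders + 1):
        lines.append(
            f"-o::h264 out_{i}.h264 -i::source -b {bitrate} -u 1 -async 1 -hw"
        )
    return lines


def write_par_file(path, codec, in_stream, **kwargs):
    """Write the par-file to disk and return its path."""
    path = Path(path)
    path.write_text("\n".join(build_par_lines(codec, in_stream, **kwargs)) + "\n")
    return path


if __name__ == "__main__":
    # "input.h264" is a placeholder; substitute a real interlaced stream.
    for line in build_par_lines("h264", "input.h264"):
        print(line)
```

Calling `write_par_file("par_1n_avce.txt", "mpeg2", "<your stream>")` would produce the file that the sample_multi_transcode command above takes via `-par`.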


dvrogozh · 2018-06-09T14:36:32Z

@XinfengZhang, @xhaihao: guys, please pay attention to this issue. It highlights the following problem in VAAPI. Originally VAAPI was decoder-oriented; encoders were added later and unfortunately their usage scenarios were not fully considered. One of the key usage scenarios is ABR transcoding, which means a pipeline like:

<decoder of 1920x1080> -> <encoding of 1920x1080 w/ bitrate1>
                      -> <encoding of 1920x1080 w/ bitrate2>
                      -> <vp: downscale to 1280x720> -> encoding
                      -> <vp: downscale to 720x480> -> encoding

The problem with the current VAAPI is that it forces encoders to be synchronized on the input surface through vaSyncSurface. As a result, if multiple operations are scheduled for the same input surface, all encoders wait for the completion of the last submitted operation. We eventually run into a few issues with that:

Once the first operation completes, its encoder keeps waiting before it can start its next operation, i.e. we slow down overall processing.
Since all threads waiting on the same vaSyncSurface are woken at the same time, we run into a thundering herd: they all suddenly request CPU time.
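To make the two effects concrete, here is a toy model, not a measurement. It assumes the operations queued on the shared input surface finish in order, so "the last submitted operation" is also the one that finishes last; the completion times are invented and all function names are hypothetical.

```python
from collections import Counter


def release_times_shared_surface(op_finish_ms):
    """Sync on the shared input surface: every waiter is released only when
    the slowest (assumed last submitted) operation on that surface is done."""
    release = max(op_finish_ms)
    return [release] * len(op_finish_ms)


def release_times_per_output(op_finish_ms):
    """Sync per encoder output: each waiter is released when its own
    operation finishes."""
    return list(op_finish_ms)


def idle_ms(op_finish_ms):
    """Extra time each encoder thread spends blocked under shared-surface sync."""
    shared = release_times_shared_surface(op_finish_ms)
    return [s - own for s, own in zip(shared, op_finish_ms)]


def max_simultaneous_wakeups(release_times):
    """Largest number of threads released at the same instant."""
    return max(Counter(release_times).values())


if __name__ == "__main__":
    finish = [4, 6, 9, 12]  # invented completion times, one per encoder
    print("idle per encoder (ms):", idle_ms(finish))
    print("woken together, shared surface:",
          max_simultaneous_wakeups(release_times_shared_surface(finish)))
    print("woken together, per output:",
          max_simultaneous_wakeups(release_times_per_output(finish)))
```

In this model the encoder whose operation finishes first loses the most time, and all four threads are released at the same instant, which corresponds to the two issues listed above.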

Please change VAAPI to eliminate this issue. There are a few ways to fix it:

1. The one which https://github.com/intel/media-driver follows is to completely avoid vaSyncSurface for encoders and rely on vaMapBuffer instead. I believe we additionally need the following:
1.1. Synchronization via the input surface should still be possible for backward compatibility.
1.2. Synchronization by vaMapBuffer should be explicitly allowed and noted in the VAAPI documentation.
1.3. VAAPI should have a capability which can be queried: whether the encoder supports synchronization by vaMapBuffer or not.
2. Another variant is to have a separate call for encoder synchronization: the encoder waits for the bitstream to be ready, i.e. a vaSyncBuffer(va_bitstream) call should be provided.
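On the application side, the options above could combine roughly as follows. This is a hypothetical sketch only: the two boolean inputs stand in for whatever capability query VAAPI might expose, neither is an existing VAAPI attribute, and the preference order is an assumption rather than something settled in this issue.

```python
from enum import Enum


class EncoderSync(Enum):
    SYNC_BUFFER = "dedicated sync call on the bitstream buffer"
    MAP_BUFFER = "vaMapBuffer on the bitstream buffer"
    INPUT_SURFACE = "vaSyncSurface on the input surface"


def choose_encoder_sync(has_sync_buffer_call, map_buffer_sync_supported):
    """Pick an encoder synchronization method from (hypothetical) capabilities.

    Assumed preference: a dedicated bitstream sync call first, then sync by
    vaMapBuffer, and input-surface sync only as the backward-compatible fallback.
    """
    if has_sync_buffer_call:
        return EncoderSync.SYNC_BUFFER
    if map_buffer_sync_supported:
        return EncoderSync.MAP_BUFFER
    return EncoderSync.INPUT_SURFACE


if __name__ == "__main__":
    # A driver that only documents sync by vaMapBuffer:
    print(choose_encoder_sync(False, True).value)
    # A legacy driver with neither capability:
    print(choose_encoder_sync(False, False).value)
```

The fallback branch is what item 1.1 asks to keep working, so existing applications would not break.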

andreyor added the enhancement label Jun 9, 2018

dvrogozh mentioned this issue Jun 9, 2018

ABR transcoding workloads underperform from ill-defined synchronization by input surface intel/libva#219

Open

daleksan added the AVCe label Jun 9, 2021

kovakimy self-assigned this Jun 24, 2021
