This repository was archived by the owner on May 17, 2023. It is now read-only.

[performance] Investigation of possibility to add map_buffer synchronization in other encoders #287

Open
andreyor opened this issue Jun 9, 2018 · 1 comment
Contributor

andreyor commented Jun 9, 2018

The 17% 1N performance regression vs. MSS 2018 R1 was root-caused and fixed in this commit:
90fd745

For AVC FEI, the change is still under testing:
#280


To reproduce the 1N AVCe regression:

  1. Create the par-file par_1n_avce.txt with:

-o::sink -n 600 -i::CODEC IN_STREAM -async 1 -hw
-o::h264 out_1.h264 -i::source -b 6000 -u 1 -async 1 -hw
-o::h264 out_2.h264 -i::source -b 6000 -u 1 -async 1 -hw

where IN_STREAM is an interlaced stream (resolutions: FHD, HD, SD) and CODEC is mpeg2 or h264.

  2. Run ./sample_multi_transcode -par par_1n_avce.txt
  3. Compare the results: there is a performance regression between the current library and MSS 2018 R1.
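
Since the reproduction covers several CODEC / IN_STREAM combinations, the par-file can be generated rather than edited by hand. A minimal sketch, assuming Python is available on the test machine; `make_par` and the input file name `input_1080i.h264` are hypothetical, only the option lines themselves come from the par-file above:

```python
def make_par(codec, in_stream, encoders=2):
    """Build the par-file text: one decode session feeding N h264 encoders."""
    # Decoder line: CODEC and IN_STREAM are the placeholders from the issue.
    lines = [f"-o::sink -n 600 -i::{codec} {in_stream} -async 1 -hw"]
    # One encoder line per output, all reading from the shared source.
    for i in range(1, encoders + 1):
        lines.append(f"-o::h264 out_{i}.h264 -i::source -b 6000 -u 1 -async 1 -hw")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical input stream name; substitute a real interlaced clip.
    print(make_par("h264", "input_1080i.h264"), end="")
```

Redirecting the output to par_1n_avce.txt gives the file used in step 2.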
Contributor

dvrogozh commented Jun 9, 2018

@XinfengZhang, @xhaihao: guys, please pay attention to this issue. It highlights the following problem in VAAPI. Originally VAAPI was decoder-oriented; encoders were added later and, unfortunately, the usage scenarios were not fully considered. One of the key usage scenarios is ABR transcoding, which means a pipeline like:

<decoder of 1920x1080> -> <encoding of 1920x1080 w/ bitrate1>
                      -> <encoding of 1920x1080 w/ bitrate2>
                      -> <vp: downscale to 1280x720> -> encoding
                      -> <vp: downscale to 720x480> -> encoding

The problem with the current VAAPI is that it forces encoders to synchronize on the input surface through vaSyncSurface. As a result, if multiple operations are scheduled for the same input surface, all encoders wait for the completion of the last submitted operation. This leads to a few issues:

  1. Once the first operation completes, its encoder still has to wait before it can start its next operation, i.e. we slow down overall processing.
  2. Since all threads waiting on the same vaSyncSurface are woken at the same time, we hit a thundering herd: they all suddenly request CPU time.
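
The two issues above can be illustrated with a toy timing model. This is only a sketch: it does not call VAAPI, the function names are hypothetical, and the completion times are made-up numbers chosen for illustration, not measurements.

```python
def wake_times_surface_sync(done_at):
    """All waiters sync on the shared input surface, so each one returns
    only when the last operation scheduled on that surface has finished."""
    last = max(done_at.values())
    return {name: last for name in done_at}


def wake_times_output_sync(done_at):
    """Each waiter syncs on its own output, so it returns as soon as its
    own operation has finished."""
    return dict(done_at)


# Made-up completion times (ms) for the four consumers of one decoded frame.
done_at = {"enc_bitrate1": 4.0, "enc_bitrate2": 5.0, "vp_720p": 2.0, "vp_480p": 1.5}

if __name__ == "__main__":
    shared = wake_times_surface_sync(done_at)
    own = wake_times_output_sync(done_at)
    for name in done_at:
        print(f"{name}: surface sync wakes at {shared[name]} ms, "
              f"output sync wakes at {own[name]} ms")
```

In this model every waiter under surface sync is released at the same instant (the thundering herd), while under per-output sync the wake-ups are spread out and the faster operations can submit their next work earlier.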

Please change VAAPI to eliminate this issue. There are a few ways to fix it:

  1. The approach which https://github.com/intel/media-driver follows is to avoid vaSyncSurface for encoders entirely and rely on vaMapBuffer instead. I believe we additionally need the following:
    1.1. Synchronization via the input surface should still be possible for backward compatibility
    1.2. Synchronization by vaMapBuffer should be explicitly allowed and documented in the VAAPI documentation
    1.3. VAAPI should expose a capability which can be queried: whether the encoder supports synchronization by vaMapBuffer or not
  2. Another variant is to have a separate call for encoder synchronization, i.e. the encoder waits for the bitstream to be ready, so a call like vaSyncBuffer(va_bitstream) should be provided.
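
From the application side, points 1.1 to 1.3 amount to a query followed by a fallback. The sketch below models only that decision logic; the capability name `sync_by_map_buffer` and the `pick_sync_method` helper are hypothetical and do not correspond to any existing VAAPI attribute or call.

```python
from enum import Enum


class SyncMethod(Enum):
    INPUT_SURFACE = "vaSyncSurface on the input surface"
    MAP_BUFFER = "vaMapBuffer on the coded buffer"


def pick_sync_method(driver_caps):
    """Prefer per-output sync when the driver reports it (1.3),
    otherwise fall back to input-surface sync (1.1)."""
    # "sync_by_map_buffer" is a hypothetical capability name.
    if "sync_by_map_buffer" in driver_caps:
        return SyncMethod.MAP_BUFFER
    return SyncMethod.INPUT_SURFACE


if __name__ == "__main__":
    print(pick_sync_method({"sync_by_map_buffer"}).value)
    print(pick_sync_method(set()).value)
```

With such a query, existing applications keep working unchanged on drivers that only support input-surface sync, and new applications can opt in to the faster path where it is available.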
