This issue was discovered after warm reboot started failing because docker exec calls were timing out. (fast-reboot and warm-reboot both run a check_docker_exec function: https://github.com/sonic-net/sonic-utilities/blob/2f8508f182566dd75a7c66cfd8dcccaa21470485/scripts/fast-reboot#L469.) Previously opened issue: #21236
We see that docker exec calls take longer after the Debian upgrade from 202311 (Bullseye) to 202405/202411 (Bookworm). This affects warm reboot because the check enforces a 1-second timeout on each docker exec.
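For illustration, a 1-second check of this kind can be sketched with coreutils `timeout`. This is not the actual check_docker_exec from sonic-utilities; the function name `exec_within` and the `DOCKER` override variable are hypothetical and only serve the example.

```shell
# Hypothetical sketch of a timed docker exec check (not the sonic-utilities code).
# DOCKER can be set to substitute another binary for docker, e.g. in tests.

# exec_within CONTAINER [LIMIT_SECONDS]
# Returns 0 if the exec finishes within the limit.
# GNU timeout exits with status 124 when the command is killed for running too long.
exec_within() (
    container=$1
    limit=${2:-1}
    timeout "${limit}s" "${DOCKER:-docker}" exec "$container" echo "success" > /dev/null
)

# Example (requires a running container named swss):
# exec_within swss 1 || echo "docker exec on swss failed or exceeded 1 second"
```

With a hard limit like this, a single slow exec is enough to fail the check, so the maximum latency matters more than the average.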
Analysis:
docker exec times appear to have increased for almost all containers between 202311 and 202411. This was tested without any additional stress or load on the CPU, using a simple bash script that performs 1000 runs of a trivial docker exec (docker exec swss echo "success") and records the maximum, minimum, and average time. The containers tested are the same ones used for the reboot check in warm-reboot.
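The original script was not posted. A minimal sketch of such a measurement loop, assuming GNU date and awk are available, might look like the following. The function names (`time_exec`, `summarize`, `run_container`) and the `DOCKER` override variable are hypothetical.

```shell
# Hypothetical reconstruction of the timing loop; not the reporter's actual script.

# Print the wall-clock duration, in seconds, of one trivial docker exec.
time_exec() (
    container=$1
    start=$(date +%s.%N)    # %N (nanoseconds) is a GNU date extension
    "${DOCKER:-docker}" exec "$container" echo "success" > /dev/null
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.6f\n", e - s }'
)

# Read one duration per line on stdin; print average, minimum and maximum.
summarize() {
    awk '
        { v = $1 + 0; sum += v }
        NR == 1 { min = v; max = v }
        v < min { min = v }
        v > max { max = v }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            printf "Average: %g seconds\nMin: %g seconds\nMax: %g seconds\n", sum / NR, min, max
        }'
}

# run_container CONTAINER [RUNS]
run_container() (
    container=$1
    runs=${2:-1000}
    echo "Running tests for container: $container"
    echo "Results for container: $container"
    i=0
    while [ "$i" -lt "$runs" ]; do
        time_exec "$container"
        i=$((i + 1))
    done | summarize
)

# Example (requires the containers to be running):
# for c in bgp lldp swss syncd database; do run_container "$c" 1000; done
```

Each sample includes the cost of starting the docker client and the two date calls, so absolute numbers from this sketch are only comparable with other runs of the same script.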
These are the results from 202311:
Running tests for container: bgp
Results for container: bgp
Average: 0.136636 seconds
Min: 0.11147 seconds
Max: 0.324412 seconds
Running tests for container: lldp
Results for container: lldp
Average: 0.133063 seconds
Min: 0.109969 seconds
Max: 0.337552 seconds
Running tests for container: swss
Results for container: swss
Average: 0.100698 seconds
Min: 0.076642 seconds
Max: 0.269502 seconds
Running tests for container: syncd
Results for container: syncd
Average: 0.0983149 seconds
Min: 0.0771413 seconds
Max: 0.251317 seconds
Running tests for container: database
Results for container: database
Average: 0.099173 seconds
Min: 0.0770285 seconds
Max: 0.235249 seconds
And these are the results from 202411:
Running tests for container: bgp
Results for container: bgp
Average: 0.161145 seconds
Min: 0.119685 seconds
Max: 0.821305 seconds
Running tests for container: lldp
Results for container: lldp
Average: 0.168193 seconds
Min: 0.119636 seconds
Max: 0.539189 seconds
Running tests for container: swss
Results for container: swss
Average: 0.116055 seconds
Min: 0.0840077 seconds
Max: 0.254673 seconds
Running tests for container: syncd
Results for container: syncd
Average: 0.117382 seconds
Min: 0.0828154 seconds
Max: 0.404531 seconds
Running tests for container: database
Results for container: database
Average: 0.169357 seconds
Min: 0.120504 seconds
Max: 0.606531 seconds
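The maximum time is higher on 202411 for every container except swss (0.254673 seconds versus 0.269502 seconds on 202311), and the average time is higher for all five containers. This seems to affect warm reboot because of the timeout check for docker execs, but the underlying issue could be broader, such as a general regression in container performance after the Debian upgrade.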
Please use the following command for strace; that way, at least partial timing information is collected for all syscalls: strace -ttf --strings-in-hex=non-ascii-chars -s 128 ....
The strace output is needed only for one container; I'm fairly certain this is a container-independent issue.
Additionally, can you try the same experiment (1000 runs of docker exec database echo success) on the database container, but with all other containers stopped? If the max (and maybe average) time is lower, that would indicate that the time taken for docker exec to complete depends on the system load. If the times are the same, then the factor here is the time needed for system calls and/or the work that docker (and associated binaries) are doing.