Race Condition: setIntfIp Fails Due to PortChannel Initialization Delay #21998

Open
tshalvi opened this issue Mar 11, 2025 · 2 comments

@tshalvi
Contributor

tshalvi commented Mar 11, 2025

Description:
An error occurs when setIntfIp attempts to assign an IP address to a PortChannel before its initialization is complete. This indicates a race condition between intfmgrd in the swss container (which assigns the IP) and teammgrd in the teamd container (which creates the PortChannel).

Observed Behavior:
In the logs, we see that setIntfIp fails when trying to assign an IP to PortChannel108:

2025 Feb 25 04:45:11.722046 arc-switch1004 ERR swss#intfmgrd: :- setIntfIp: Command '/sbin/ip address "add" "10.0.0.70/31" dev "PortChannel108"' failed with rc 2

Subsequent log entries confirm that PortChannel108 had not been fully initialized when setIntfIp was executed:

2025 Feb 25 04:45:11.889335 arc-switch1004 INFO teamd#supervisord: teammgrd Using team device "PortChannel108".
2025 Feb 25 04:45:12.031833 arc-switch1004 WARNING teamd#tlm_teamd: :- try_add_lag: Can't connect to teamd LAG='PortChannel108', error='No such file or directory'. attempt=1

Both entries are timestamped after the setIntfIp failure (roughly 167 ms and 310 ms later), so PortChannel108 was still being created when the command ran.

PortChannel108 becomes fully initialized only at 04:45:12.166, about 444 ms after setIntfIp had already failed:

2025 Feb 25 04:45:12.166107 arc-switch1004 NOTICE teamd#teammgrd: :- addLag: Start port channel PortChannel108 with teamd
2025 Feb 25 04:45:12.166314 arc-switch1004 NOTICE teamd#teammgrd: :- setLagAdminStatus: Set port channel PortChannel108 admin status to up
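
To make the ordering easier to see, here is a small sketch that computes how long after the setIntfIp failure each teamd-side event was logged. The timestamps are copied from the log excerpts in this issue; the event labels and the helper names (`parse`, `offsets_ms`) are only illustrative and are not part of SONiC.

```python
from datetime import datetime

# Timestamps copied verbatim from the log excerpts in this issue.
# The short labels are illustrative summaries of each log line.
EVENTS = [
    ("intfmgrd setIntfIp failed with rc 2", "2025 Feb 25 04:45:11.722046"),
    ("teammgrd using team device", "2025 Feb 25 04:45:11.889335"),
    ("tlm_teamd try_add_lag attempt=1", "2025 Feb 25 04:45:12.031833"),
    ("teammgrd addLag", "2025 Feb 25 04:45:12.166107"),
    ("teammgrd setLagAdminStatus up", "2025 Feb 25 04:45:12.166314"),
]


def parse(ts):
    """Parse the syslog timestamp format used in the excerpts."""
    return datetime.strptime(ts, "%Y %b %d %H:%M:%S.%f")


def offsets_ms(events):
    """Return (label, milliseconds since the first event) for each event."""
    t0 = parse(events[0][1])
    return [(name, (parse(ts) - t0).total_seconds() * 1000.0)
            for name, ts in events]


if __name__ == "__main__":
    for name, ms in offsets_ms(EVENTS):
        print(f"{ms:9.3f} ms  {name}")
```

Every teamd-side event lands after the failed `ip address add`, with addLag roughly 444 ms behind it.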

Root Cause:
There is a timing issue where setIntfIp executes before teamd has fully initialized the PortChannel.

Reproduction Frequency:
This issue reproduces very rarely. It was observed only once, on an SN2700 system with a weak CPU, which suggests that timing variations due to system performance contribute to the issue.

Expected Behavior:
setIntfIp should only attempt to assign an IP address after the PortChannel has been fully initialized by teamd.
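
As a rough illustration of the expected behavior, the sketch below waits for a readiness check to pass before running the same `ip address add` command seen in the log, and retries a bounded number of times. This is not how intfmgrd is implemented. The helper `assign_ip_when_ready`, its parameters, and the choice of `/sys/class/net/<name>` as the readiness signal are all assumptions made for the example; a netdev merely existing is a weaker condition than "fully initialized by teamd", so a real fix would likely need a stronger signal.

```python
import os
import subprocess
import time


def netdev_exists(ifname):
    """Assumed readiness check: Linux lists network devices under /sys/class/net."""
    return os.path.exists(os.path.join("/sys/class/net", ifname))


def assign_ip_when_ready(ifname, cidr, is_ready=netdev_exists,
                         run=subprocess.run, attempts=10, delay_s=0.2,
                         sleep=time.sleep):
    """Hypothetical helper: run `ip address add` only once is_ready() passes.

    Returns the 1-based attempt number on which the command succeeded and
    raises TimeoutError if it never succeeds within `attempts` tries.
    """
    cmd = ["/sbin/ip", "address", "add", cidr, "dev", ifname]
    for attempt in range(1, attempts + 1):
        # Skip the command entirely while the interface is not ready.
        if is_ready(ifname) and run(cmd).returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(delay_s)
    raise TimeoutError(f"{ifname} not ready after {attempts} attempts")
```

The readiness check, command runner, and sleep function are injectable so the retry logic can be exercised without touching real interfaces. On a switch it could be called as `assign_ip_when_ready("PortChannel108", "10.0.0.70/31")`, which requires root privileges.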

@bingwang-ms
Contributor

@saiarcot895 Can you help take a look at this issue?

@saiarcot895
Contributor

Same as #10336
