[SONIC-SWSS][PORT] inconsistent behavior between combine and separate port configuration deployment #21959

Open
yuazhe opened this issue Mar 7, 2025 · 6 comments
Labels: Bug 🐛, Issue for 202411, Triaged

@yuazhe
Contributor

yuazhe commented Mar 7, 2025

The following sequence produces an invalid auto-negotiation configuration. When it does, the port sometimes never comes up again.

sudo config interface type Ethernet4 none
sudo config interface speed Ethernet4 1000
sudo config interface type Ethernet4 CR
sudo config interface advertised-speeds Ethernet4 all
sudo config interface advertised-types Ethernet4 CR,CR2,CR4
sudo config interface shutdown Ethernet4
sudo config interface startup Ethernet4

sudo config interface type Ethernet8 none
sudo config interface speed Ethernet8 1000
sudo config interface type Ethernet8 CR
sudo config interface advertised-speeds Ethernet8 1000
sudo config interface advertised-types Ethernet8 CR4
sudo config interface shutdown Ethernet8
sudo config interface startup Ethernet8

sudo config interface autoneg Ethernet8 enabled

swss.rec shows that the autoneg enable command can be deployed in two possible ways.
This is because https://github.com/sonic-net/sonic-swss/blob/4eb74f0082f0f8c4537fe58621ac902c870d217c/cfgmgr/portmgr.cpp#L206
can run either before or together with https://github.com/sonic-net/sonic-swss/blob/4eb74f0082f0f8c4537fe58621ac902c870d217c/cfgmgr/portmgr.cpp#L230
and the ordering cannot be controlled.

2025-03-07.08:42:55.688322|PORT_TABLE:Ethernet8|SET|alias:etp3|index:3|lanes:16,17,18,19|speed:1000|interface_type:CR|adv_speeds:1000|adv_interface_types:CR4|autoneg:on
2025-03-07.08:42:55.702949|PORT_TABLE:Ethernet8|SET|mtu:9100|admin_status:up

and

2025-03-01.06:58:48.579256|PORT_TABLE:Ethernet8|SET|alias:etp3|index:3|lanes:16,17,18,19|speed:1000|autoneg:on|interface_type:CR|adv_speeds:1000|adv_interface_types:CR4|mtu:9100|admin_status:up

The first form always leaves the port up, but the second always leaves the port down, because autoneg fails and the code simply continues without any fallback mechanism:
https://github.com/sonic-net/sonic-swss/blob/4eb74f0082f0f8c4537fe58621ac902c870d217c/orchagent/portsorch.cpp#L4085-L4100
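
To tell which of the two forms a given run produced, the swss.rec lines can be classified with a small script. This is a minimal sketch, assuming the pipe-separated record layout shown in the logs above (timestamp, table key, operation, then `field:value` pairs). The function names `parse_rec_line` and `autoneg_deployment` are made up for illustration and are not part of SONiC.

```python
from typing import Dict, Iterable, Tuple


def parse_rec_line(line: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Split one swss.rec line into (timestamp, key, op, fields)."""
    timestamp, key, op, *pairs = line.strip().split("|")
    # Split each pair on the first ':' only, so values that contain ':'
    # (for example a description) stay intact.
    fields = dict(pair.split(":", 1) for pair in pairs)
    return timestamp, key, op, fields


def autoneg_deployment(lines: Iterable[str], port: str) -> str:
    """Classify how autoneg:on was deployed for `port`.

    Returns "combined" if the SET that carries autoneg:on also carries
    admin_status, "separate" if it does not, and "not found" otherwise.
    """
    wanted_key = f"PORT_TABLE:{port}"
    for line in lines:
        if not line.strip():
            continue
        _, key, op, fields = parse_rec_line(line)
        if key != wanted_key or op != "SET":
            continue
        if fields.get("autoneg") == "on":
            return "combined" if "admin_status" in fields else "separate"
    return "not found"
```

Feeding it the two log excerpts above should report "separate" for the first and "combined" for the second.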

@dgsudharsan
Collaborator

@liuh-80 Can you please help investigate this issue? @qiluo-msft for visibility.

arlakshm added the Triaged label on Mar 12, 2025
@liuh-80
Contributor

liuh-80 commented Mar 13, 2025

@yuazhe, can you share the following information so that I can reproduce this issue?

  1. Image version
  2. Hardware SKU
  3. Reproduce steps
  4. How to verify that the issue happened

I tried the following steps multiple times with the latest 202411 image on a KVM testbed, but could not reproduce the issue:

admin@vlab-01:~$ show version

SONiC Software Version: SONiC.202411.796364-c371cd3d5
SONiC OS Version: 12
Distribution: Debian 12.9
Kernel: 6.1.0-22-2-amd64
Build commit: c371cd3d5
Build date: Wed Mar 12 14:43:22 UTC 2025
Built by: azureuser@ec7363b4c000000


admin@vlab-01:~$ sudo config interface type Ethernet4 none
admin@vlab-01:~$ sudo config interface speed Ethernet4 1000
admin@vlab-01:~$ sudo config interface type Ethernet4 CR
admin@vlab-01:~$ sudo config interface advertised-speeds Ethernet4 all
admin@vlab-01:~$ sudo config interface advertised-types Ethernet4 CR,CR2,CR4
admin@vlab-01:~$ sudo config interface shutdown Ethernet4
admin@vlab-01:~$ sudo config interface startup Ethernet4
admin@vlab-01:~$
admin@vlab-01:~$ sudo config interface type Ethernet8 none
admin@vlab-01:~$ sudo config interface speed Ethernet8 1000
admin@vlab-01:~$ sudo config interface type Ethernet8 CR
admin@vlab-01:~$ sudo config interface advertised-speeds Ethernet8 1000
admin@vlab-01:~$ sudo config interface advertised-types Ethernet8 CR4
admin@vlab-01:~$ sudo config interface shutdown Ethernet8
admin@vlab-01:~$ sudo config interface startup Ethernet8
admin@vlab-01:~$
admin@vlab-01:~$ sudo config interface autoneg Ethernet8 enabled
admin@vlab-01:~$
admin@vlab-01:~$ show interfaces status
     Interface            Lanes    Speed    MTU    FEC           Alias            Vlan    Oper    Admin    Type    Asym PFC
--------------  ---------------  -------  -----  -----  --------------  --------------  ------  -------  ------  ----------
     Ethernet0      25,26,27,28      40G   9100    N/A    fortyGigE0/0          routed    down     down     N/A         off
     Ethernet4      29,30,31,32       1G   9100    N/A    fortyGigE0/4           trunk      up       up     N/A         off
     Ethernet8      33,34,35,36       1G   9100    N/A    fortyGigE0/8           trunk      up       up     N/A         off

Also, the issue does not seem to be related to this change, because I still cannot reproduce it after reverting the change: sonic-net/sonic-swss#3304

@yuazhe
Contributor Author

yuazhe commented Mar 13, 2025

@qiluo-msft your reproduction steps are correct. When the issue happens, it looks like this:

  Interface            Lanes    Speed    MTU    FEC    Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -------  ------  ------  -------  ---------------  ----------
  Ethernet4        8,9,10,11       1G   9100    N/A     etp2  routed    down       up  QSFP28 or later         N/A
  Ethernet8      16,17,18,19       1G   9100    N/A     etp3  routed    down       up  QSFP28 or later         N/A

I was using a 2411 image with SKU ACS-MSN4600C, but I think this problem is general and not platform-specific.
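
One way to check for the failure state automatically is to scan the `show interfaces status` output for ports that are admin up but oper down. The following is a rough sketch, assuming the column layout shown in the table above, with columns separated by at least two spaces and no empty cells. The helper name `stuck_ports` is hypothetical.

```python
import re
from typing import List


def stuck_ports(show_output: str) -> List[str]:
    """Return interfaces whose Admin state is up but Oper state is down."""
    # Columns are separated by two or more spaces; single spaces occur
    # inside cells such as "QSFP28 or later" and "Asym PFC".
    rows = [
        re.split(r"\s{2,}", line.strip())
        for line in show_output.splitlines()
        if line.strip()
    ]
    header = rows[0]
    oper, admin = header.index("Oper"), header.index("Admin")

    stuck = []
    for row in rows[1:]:
        if len(row) != len(header):
            continue  # skip rows that do not match the header layout
        # The dashed separator row never equals "up"/"down", so it is ignored.
        if row[admin] == "up" and row[oper] == "down":
            stuck.append(row[0])
    return stuck
```

On the table above this would flag both Ethernet4 and Ethernet8. A port that is down for an unrelated reason (for example, no link partner) is flagged as well, so the result is only a hint.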

@liuh-80
Contributor

liuh-80 commented Mar 13, 2025

As I understand it, the root cause of this issue is not that the CONFIG_DB write operations are "merged"; it is that there is no retry when autoneg fails.
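
To make the missing behavior concrete, here is a purely illustrative sketch of a retry-then-fallback pattern. It is not the portsorch code and not a proposed patch. `apply_autoneg` and `fallback` are hypothetical callables standing in for whatever would program the port and whatever recovery action would be appropriate.

```python
from typing import Callable


def set_autoneg_with_retry(
    apply_autoneg: Callable[[], bool],
    fallback: Callable[[], None],
    attempts: int = 3,
) -> bool:
    """Try to apply autoneg up to `attempts` times, then fall back.

    `apply_autoneg` (hypothetical) returns True on success.
    `fallback` (hypothetical) runs only if every attempt failed.
    """
    for _ in range(attempts):
        if apply_autoneg():
            return True
    fallback()
    return False
```

The point is only that a failed attempt is either retried or handled explicitly, rather than the flow continuing as if it had succeeded.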

@liuh-80
Contributor

liuh-80 commented Mar 13, 2025

The issue can be easily reproduced on Mellanox 4600 hardware:

2025-03-13.06:25:49.281583|PORT_TABLE:Ethernet4|SET|alias:etp2|description:ARISTA01T2:Ethernet2|fec:rs|index:2|lanes:8,9,10,11|pfc_asym:off|speed:1000|subport:0|tpid:0x8100|interface_type:CR|adv_speeds:all|adv_interface_types:CR,CR2,CR4|mtu:9100|admin_status:up
2025-03-13.06:25:49.281601|PORT_TABLE:Ethernet8|SET|alias:etp3|description:etp3|fec:rs|index:3|lanes:16,17,18,19|pfc_asym:off|speed:1000|subport:0|tpid:0x8100|interface_type:CR|mtu:9100|adv_speeds:1000|adv_interface_types:CR4|admin_status:up|autoneg:on

$ show interface status
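  Interface            Lanes    Speed    MTU    FEC    Alias             Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -------  ---------------  ------  -------  ---------------  ----------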
  Ethernet0          0,1,2,3     100G   9100     rs     etp1   PortChannel102      up       up  QSFP28 or later         off
  Ethernet4        8,9,10,11       1G   9100     rs     etp2   PortChannel102    down       up  QSFP28 or later         off
  Ethernet8      16,17,18,19       1G   9100     rs     etp3           routed    down       up  QSFP28 or later         off

@liuh-80
Contributor

liuh-80 commented Mar 13, 2025

After reverting PR sonic-net/sonic-swss#3304, the issue still happens:

2025-03-13.06:47:40.946313|PORT_TABLE:Ethernet8|SET|alias:etp3|description:etp3|fec:rs|index:3|lanes:16,17,18,19|pfc_asym:off|speed:1000|subport:0|tpid:0x8100|interface_type:CR|adv_speeds:1000|adv_interface_types:CR4|mtu:9100|admin_status:up
2025-03-13.06:47:41.440647|PORT_TABLE:Ethernet8|SET|alias:etp3|description:etp3|fec:rs|index:3|lanes:16,17,18,19|pfc_asym:off|speed:1000|subport:0|tpid:0x8100|interface_type:CR|adv_speeds:1000|adv_interface_types:CR4|autoneg:on|mtu:9100|admin_status:up

admin@bjw2-can-4600c-3:~$ show interface status
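  Interface            Lanes    Speed    MTU    FEC    Alias             Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -------  ---------------  ------  -------  ---------------  ----------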
  Ethernet0          0,1,2,3     100G   9100     rs     etp1   PortChannel102      up       up  QSFP28 or later         off
  Ethernet4        8,9,10,11       1G   9100     rs     etp2   PortChannel102    down       up  QSFP28 or later         off
  Ethernet8      16,17,18,19       1G   9100     rs     etp3           routed    down       up  QSFP28 or later         off
