Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add syslog forwarding rules validation and thread exception handling #17572

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

ytzur1
Copy link

@ytzur1 ytzur1 commented Mar 18, 2025

Add validation of rsyslog message from conf file like telemetry, bgp, gnmi, etc.
Implement thread exception handler for sub thread error capture

Description of PR

  1. Adding validation of rsyslog messages from configuration files (telemetry, bgp, gnmi, etc.)
  2. Implementing a thread exception handler for capturing subthread errors

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

The current syslog testing framework lacks validation of rsyslog messages from various configuration files. Additionally, exception during subthreads were not being caught by the main thread.

How did you do it?

  1. Added rsyslog message validation:

    • Implemented verification of logger messages from telemetry, bgp, gnmi,
  2. Implemented thread-safe exception handling:

    • Created a global thread_exceptions list for capturing errors
    • Added ThreadWrapper class for safe exception capture
    • Implemented handle_thread_exceptions() for error reporting in main thread

How did you verify/test it?

Ran all test cases in test_syslog_source_ip.py

Any platform specific information?

No

Supported testbed topology if it's a new test case?

No

Documentation

Sorry, something went wrong.

Add validation of rsyslog message from conf file like telemetry, bgp, gnmi, etc.
Implement thread-safe exception handler for subthread error capture
Enhance logging message generation for comprehensive testing

Tested: Ran all test cases in test_syslog_source_ip.py
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft requested review from liuh-80 and zbud-msft March 18, 2025 19:25
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ytzur1 ytzur1 force-pushed the syslog_test_case branch from 537deea to ec5e1e8 Compare March 19, 2025 08:37
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bingwang-ms
Copy link
Collaborator

Related image change sonic-net/sonic-buildimage#21923

time.sleep(0.2)
for flag, msg in logging_data:
for i in range(syslogUtilsConst.PACKETS_NUM):
dut.shell(f"logger {default_priority} {flag} {msg} {i + 1}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command run in host.

However, the config causes this issue are:
'''
if re_match($programname, "bgp[0-9]*#(frr|zebra|staticd|watchfrr)") then {
/var/log/frr/zebra.log
stop
}
'''

Which means when reproduce the issue, the log send inside bgp docker container, then the 'programname' part can match.

I warry about this test case can't prevent regression. can you test with an image have this issue to make sure this test can catch issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good suggestion: test with an image have this issue to make sure this test can catch issue

Copy link
Author

@ytzur1 ytzur1 Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with an image that had the configuration issue, and the test failed and reporting that not all types of forwarded messages were successful.

@zbud-msft
Copy link
Contributor

Hi @ytzur1, I manually ran your test changes from our test server and seeing following issues when running the testcase. Can you please double check your logic in capture_syslog_packets() function in syslog/syslog_utils.py

25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_Vrf-data.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_Vrf-data.pcap"}}, "_ansible_no_log": false, "changed": false}
25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_default.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_default.pcap"}}, "_ansible_no_log": false, "changed": false}
25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_mgmt.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_mgmt.pcap"}}, "_ansible_no_log": false, "changed": false}

Copy link
Contributor

@zbud-msft zbud-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check logic for capture_syslog_packets, seems to run into "file not found: /tmp/X.pcap" issue

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ytzur1 ytzur1 force-pushed the syslog_test_case branch from 501acf4 to e1ed921 Compare March 26, 2025 10:58
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ytzur1
Copy link
Author

ytzur1 commented Mar 26, 2025

Hi, @zbud-msft
The error was caused by a missing tab in tcpdump cmd command after pre-commit formatting of long lines.
It's fixed and latest version was tested and it's working.

Hi @ytzur1, I manually ran your test changes from our test server and seeing following issues when running the testcase. Can you please double check your logic in capture_syslog_packets() function in syslog/syslog_utils.py

25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_Vrf-data.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_Vrf-data.pcap"}}, "_ansible_no_log": false, "changed": false} 25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_default.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_default.pcap"}}, "_ansible_no_log": false, "changed": false} 25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_mgmt.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_mgmt.pcap"}}, "_ansible_no_log": false, "changed": false}

@zbud-msft
Copy link
Contributor

@ytzur1 Would it be possible to post screenshot of testcases passing since kvm tests will skip?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants