
Commit 8ee3009

[202305] Fix intermittent build failure for test_SfpStateUpdateTask_task_run_stop (sonic-net#461) (sonic-net#463)
Cherry-pick for sonic-net#461
MSFT ADO - 24271303

Description

An intermittent build failure is seen for the test_SfpStateUpdateTask_task_run_stop unit test, with the backtrace below. More than one type of backtrace has been observed so far (details are in the issues listed under Motivation and Context).

[2023-06-21T12:27:31.213Z] =================================== FAILURES ===================================
[2023-06-21T12:27:31.213Z] ____________ TestXcvrdScript.test_SfpStateUpdateTask_task_run_stop _____________
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z] self = <tests.test_xcvrd.TestXcvrdScript object at 0x7ff1f404b610>
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z]     @patch('xcvrd.xcvrd_utilities.port_mapping.subscribe_port_config_change', MagicMock(return_value=(None, None)))
[2023-06-21T12:27:31.213Z]     def test_SfpStateUpdateTask_task_run_stop(self):
[2023-06-21T12:27:31.213Z]         port_mapping = PortMapping()
[2023-06-21T12:27:31.213Z]         stop_event = threading.Event()
[2023-06-21T12:27:31.213Z]         sfp_error_event = threading.Event()
[2023-06-21T12:27:31.213Z]         task = SfpStateUpdateTask(DEFAULT_NAMESPACE, port_mapping, stop_event, sfp_error_event)
[2023-06-21T12:27:31.213Z]         task.start()
[2023-06-21T12:27:31.213Z]         assert wait_until(5, 1, task.is_alive)
[2023-06-21T12:27:31.213Z]         task.raise_exception()
[2023-06-21T12:27:31.213Z] >       task.join()
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z] tests/test_xcvrd.py:1041:
[2023-06-21T12:27:31.214Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:2200: in join
[2023-06-21T12:27:31.214Z]     raise self.exc
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:2175: in run
[2023-06-21T12:27:31.214Z]     self.task_worker(self.task_stopping_event, self.sfp_error_event)
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1987: in task_worker
[2023-06-21T12:27:31.214Z]     self.init()
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1905: in init
[2023-06-21T12:27:31.214Z]     self.retry_eeprom_set = self._post_port_sfp_info_and_dom_thr_to_db_once(port_mapping_data, self.xcvr_table_helper, self.main_thread_stop_event)
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1845: in _post_port_sfp_info_and_dom_thr_to_db_once
[2023-06-21T12:27:31.214Z]     warmstart.initialize("xcvrd", "pmon")
[2023-06-21T12:27:31.214Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] app_name = 'xcvrd', docker_name = 'pmon', db_timeout = 0, isTcpConn = False
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z]     @staticmethod
[2023-06-21T12:27:31.214Z]     def initialize(app_name, docker_name, db_timeout=0, isTcpConn=False):
[2023-06-21T12:27:31.214Z] >       return _swsscommon.WarmStart_initialize(app_name, docker_name, db_timeout, isTcpConn)
[2023-06-21T12:27:31.214Z] E       RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] /usr/lib/python3/dist-packages/swsscommon/swsscommon.py:3369: RuntimeError
[2023-06-21T12:27:31.214Z] =============================== warnings summary ===============================

Consistent way to fail the test

Add a sleep of at least 1s between the two lines below.

sonic-platform-daemons/sonic-xcvrd/tests/test_xcvrd.py, lines 1966 to 1967 in d0fd1e1:
task.start()
assert wait_until(5, 1, task.is_alive)

Motivation and Context

fixes sonic-net#369, fixes sonic-net#370, fixes sonic-net#382

Intention of the testcase

The test case covers a scenario in which, while the SfpStateUpdateTask thread is busy polling for a few seconds, its parent process (xcvrd in an ideal case) raises an exception to stop the SfpStateUpdateTask thread in the middle of polling.

RCA for current failure

test_SfpStateUpdateTask_task_run_stop does not mock all the underlying functions of SfpStateUpdateTask.task_worker, so the SfpStateUpdateTask thread can exit early. Depending on where the thread crashes, this causes assert wait_until(5, 1, task.is_alive) to fail.

Fix details

The test now patches XcvrTableHelper with a function that polls forever, which ensures that the SfpStateUpdateTask thread does not quit before test_SfpStateUpdateTask_task_run_stop raises the exception. Because XcvrTableHelper is the first function called in SfpStateUpdateTask.task_worker, this approach also means that any new function calls added to SfpStateUpdateTask.task_worker later will not need to be mocked for this test.

How Has This Been Tested?

Ran the test multiple times to ensure that the test case passes. Also added a 3s sleep between the two lines below and confirmed that the test case still passes consistently.

sonic-platform-daemons/sonic-xcvrd/tests/test_xcvrd.py, lines 1966 to 1967 in d0fd1e1:
task.start()
assert wait_until(5, 1, task.is_alive)
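
The root cause and the fix described in this commit message can be reproduced in isolation. The following is a minimal sketch, not the xcvrd code: `Worker` and `Helper` are hypothetical stand-ins for SfpStateUpdateTask and XcvrTableHelper, and `raise_exception` is assumed to inject an exception into the thread through CPython's `PyThreadState_SetAsyncExc`. To stay self-contained it patches the module globals with `patch.dict`, whereas the commit patches the dotted path 'xcvrd.xcvrd.XcvrTableHelper'.

```python
import ctypes
import threading
import time
from unittest.mock import patch


class Helper:
    """Hypothetical stand-in for XcvrTableHelper; fails like an unmocked dependency."""

    def __init__(self):
        raise RuntimeError("Unable to connect to redis")


class Worker(threading.Thread):
    """Hypothetical stand-in for SfpStateUpdateTask."""

    def __init__(self):
        super().__init__(daemon=True)
        self.exc = None

    def run(self):
        try:
            Helper()  # first call made by the worker
        except BaseException as e:  # also catches the injected SystemExit
            self.exc = e

    def raise_exception(self):
        # Assumption: ask the interpreter to raise SystemExit in this thread.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(self.ident), ctypes.py_object(SystemExit))


def poll_forever(*args, **kwargs):
    # Replacement for Helper: keeps the worker busy until it is interrupted.
    while True:
        time.sleep(0.05)


def run_stop_scenario(patched):
    """Return (alive before stop, alive after join, exception seen by the worker)."""
    overrides = {"Helper": poll_forever} if patched else {}
    with patch.dict(globals(), overrides):
        worker = Worker()
        worker.start()
        time.sleep(0.3)  # plays the role of the extra sleep from the commit message
        alive_before = worker.is_alive()
        if alive_before:
            worker.raise_exception()
        worker.join(timeout=5)
        return alive_before, worker.is_alive(), worker.exc
```

Calling `run_stop_scenario(patched=False)` shows the worker dying on its own before the test can stop it, while `run_stop_scenario(patched=True)` keeps it alive until the injected exception ends it.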
1 parent 4fad078 commit 8ee3009

1 file changed, +17 -10 lines changed

sonic-xcvrd/tests/test_xcvrd.py

@@ -1275,17 +1275,24 @@ def test_SfpStateUpdateTask_handle_port_change_event(self, mock_update_status_hw
         assert not task.port_mapping.logical_to_asic
         assert mock_update_status_hw.call_count == 1
 
-    @patch('xcvrd.xcvrd_utilities.port_mapping.subscribe_port_config_change', MagicMock(return_value=(None, None)))
     def test_SfpStateUpdateTask_task_run_stop(self):
-        port_mapping = PortMapping()
-        stop_event = threading.Event()
-        sfp_error_event = threading.Event()
-        task = SfpStateUpdateTask(DEFAULT_NAMESPACE, port_mapping, stop_event, sfp_error_event)
-        task.start()
-        assert wait_until(5, 1, task.is_alive)
-        task.raise_exception()
-        task.join()
-        assert wait_until(5, 1, lambda: task.is_alive() is False)
+        def poll_forever(*args, **kwargs):
+            while True:
+                time.sleep(1)
+        # Redefine the XcvrTableHelper function to poll forever so that the task can be stopped by
+        # raising an exception in between. Also, XcvrTableHelper is the first function to be called after
+        # starting the task, so having the patch here will avoid the task crashing unexpectedly
+        # at a different location.
+        with patch('xcvrd.xcvrd.XcvrTableHelper', new=poll_forever):
+            port_mapping = PortMapping()
+            stop_event = threading.Event()
+            sfp_error_event = threading.Event()
+            task = SfpStateUpdateTask(DEFAULT_NAMESPACE, port_mapping, stop_event, sfp_error_event)
+            task.start()
+            assert wait_until(5, 1, task.is_alive)
+            task.raise_exception()
+            task.join()
+            assert wait_until(5, 1, lambda: task.is_alive() is False)
 
     @patch('xcvrd.xcvrd.XcvrTableHelper', MagicMock())
     @patch('xcvrd.xcvrd.post_port_sfp_info_to_db')
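
The test relies on wait_until(5, 1, task.is_alive), whose definition is not part of this diff. The sketch below is an assumption about how such a helper might behave, based only on its call sites: the arguments are read as total wait time in seconds, polling interval in seconds, and a callable to poll. Under that reading the first check happens immediately after task.start(), which would explain why a thread that crashes within a second can still pass the check, and why adding a sleep before the check makes the failure consistent.

```python
import time


def wait_until(total_wait_time, interval, call_back, *args, **kwargs):
    """Poll call_back until it returns a truthy value or the time budget runs out.

    Hypothetical sketch: the parameter names and exact semantics are assumptions.
    """
    waited = 0
    while waited <= total_wait_time:
        # The first evaluation happens right away, before any sleeping.
        if call_back(*args, **kwargs):
            return True
        time.sleep(interval)
        waited += interval
    return False
```

With this shape, wait_until(5, 1, lambda: task.is_alive() is False) would re-check roughly once per second until the thread has exited or the budget is spent.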
