Commit 728875e

v-nakayama7440-esol, n-hamaike-esol, ito-san, and pre-commit-ci[bot] authored
feat: add CRC error monitoring to net_monitor (#638)
* feat: add CRC error monitoring to net_monitor

Signed-off-by: noriyuki.h <n-hamaike@esol.co.jp>
Signed-off-by: v-nakayama7440-esol <v-nakayama7440@esol.co.jp>

* add CRC error monitoring information to README.md

Signed-off-by: v-nakayama7440-esol <v-nakayama7440@esol.co.jp>

* ci(pre-commit): autofix

Signed-off-by: noriyuki.h <n-hamaike@esol.co.jp>
Signed-off-by: v-nakayama7440-esol <v-nakayama7440@esol.co.jp>

Co-authored-by: noriyuki.h <n-hamaike@esol.co.jp>
Co-authored-by: ito-san <57388357+ito-san@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

1 parent 0c90293 commit 728875e

8 files changed, +350 -107 lines changed

launch/tier4_system_launch/config/system_monitor/net_monitor.param.yaml

+2
@@ -3,3 +3,5 @@
 devices: ["*"]
 traffic_reader_port: 7636
 monitor_program: "greengrass"
+crc_error_check_duration: 1
+crc_error_count_threshold: 1

system/system_error_monitor/config/diagnostic_aggregator/system.param.yaml

+6
@@ -122,6 +122,12 @@
 contains: [": Network Traffic"]
 timeout: 3.0

+network_crc_error:
+type: diagnostic_aggregator/GenericAnalyzer
+path: network_crc_error
+contains: [": Network CRC Error"]
+timeout: 3.0
+
 storage:
 type: diagnostic_aggregator/AnalyzerGroup
 path: storage

system/system_monitor/README.md

+23-22
@@ -53,28 +53,29 @@ Every topic is published in 1 minute interval.

 [Usage] ✓:Supported, -:Not supported

-| Node | Message | Intel | arm64(tegra) | arm64(raspi) | Notes |
-| --------------- | ---------------------- | :---: | :----------: | :----------: | ------------------------------------------------------------- |
-| CPU Monitor | CPU Temperature | ✓ | ✓ | ✓ | |
-| | CPU Usage | ✓ | ✓ | ✓ | |
-| | CPU Load Average | ✓ | ✓ | ✓ | |
-| | CPU Thermal Throttling | ✓ | - | ✓ | |
-| | CPU Frequency | ✓ | ✓ | ✓ | Notification of frequency only, normally error not generated. |
-| HDD Monitor | HDD Temperature | ✓ | ✓ | ✓ | |
-| | HDD PowerOnHours | ✓ | ✓ | ✓ | |
-| | HDD TotalDataWritten | ✓ | ✓ | ✓ | |
-| | HDD Usage | ✓ | ✓ | ✓ | |
-| Memory Monitor | Memory Usage | ✓ | ✓ | ✓ | |
-| Net Monitor | Network Usage | ✓ | ✓ | ✓ | |
-| NTP Monitor | NTP Offset | ✓ | ✓ | ✓ | |
-| Process Monitor | Tasks Summary | ✓ | ✓ | ✓ | |
-| | High-load Proc[0-9] | ✓ | ✓ | ✓ | |
-| | High-mem Proc[0-9] | ✓ | ✓ | ✓ | |
-| GPU Monitor | GPU Temperature | ✓ | ✓ | - | |
-| | GPU Usage | ✓ | ✓ | - | |
-| | GPU Memory Usage | ✓ | - | - | |
-| | GPU Thermal Throttling | ✓ | - | - | |
-| | GPU Frequency | - | ✓ | - | |
+| Node | Message | Intel | arm64(tegra) | arm64(raspi) | Notes |
+| --------------- | ---------------------- | :---: | :----------: | :----------: | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| CPU Monitor | CPU Temperature | ✓ | ✓ | ✓ | |
+| | CPU Usage | ✓ | ✓ | ✓ | |
+| | CPU Load Average | ✓ | ✓ | ✓ | |
+| | CPU Thermal Throttling | ✓ | - | ✓ | |
+| | CPU Frequency | ✓ | ✓ | ✓ | Notification of frequency only, normally error not generated. |
+| HDD Monitor | HDD Temperature | ✓ | ✓ | ✓ | |
+| | HDD PowerOnHours | ✓ | ✓ | ✓ | |
+| | HDD TotalDataWritten | ✓ | ✓ | ✓ | |
+| | HDD Usage | ✓ | ✓ | ✓ | |
+| Memory Monitor | Memory Usage | ✓ | ✓ | ✓ | |
+| Net Monitor | Network Usage | ✓ | ✓ | ✓ | |
+| | Network CRC Error | ✓ | ✓ | ✓ | Warning occurs when the number of CRC errors in the period reaches the threshold value. The number of CRC errors that occur is the same as the value that can be confirmed with the ip command. |
+| NTP Monitor | NTP Offset | ✓ | ✓ | ✓ | |
+| Process Monitor | Tasks Summary | ✓ | ✓ | ✓ | |
+| | High-load Proc[0-9] | ✓ | ✓ | ✓ | |
+| | High-mem Proc[0-9] | ✓ | ✓ | ✓ | |
+| GPU Monitor | GPU Temperature | ✓ | ✓ | - | |
+| | GPU Usage | ✓ | ✓ | - | |
+| | GPU Memory Usage | ✓ | - | - | |
+| | GPU Thermal Throttling | ✓ | - | - | |
+| | GPU Frequency | - | ✓ | - | |

 ## ROS parameters

system/system_monitor/config/net_monitor.param.yaml

+2
@@ -3,3 +3,5 @@
 devices: ["*"]
 traffic_reader_port: 7636
 monitor_program: "greengrass"
+crc_error_check_duration: 1
+crc_error_count_threshold: 1

system/system_monitor/docs/ros_parameters.md

+6-4
@@ -53,10 +53,12 @@ mem_monitor:

 net_monitor:

-| Name | Type | Unit | Default | Notes |
-| :--------- | :----------: | :-----: | :-----: | :----------------------------------------------------------------------------------- |
-| devices | list[string] | n/a | none | The name of network interface to monitor. (e.g. eth0, \* for all network interfaces) |
-| usage_warn | float | %(1e-2) | 0.95 | Generates warning when network usage reaches a specified value or higher. |
+| Name | Type | Unit | Default | Notes |
+| :------------------------ | :----------: | :-----: | :-----: | :-------------------------------------------------------------------------------------------------------------- |
+| devices | list[string] | n/a | none | The name of network interface to monitor. (e.g. eth0, \* for all network interfaces) |
+| usage_warn | float | %(1e-2) | 0.95 | Generates warning when network usage reaches a specified value or higher. |
+| crc_error_check_duration | int | sec | 1 | CRC error check duration. |
+| crc_error_count_threshold | int | n/a | 1 | Generates warning when count of CRC errors during CRC error check duration reaches a specified value or higher. |

 ## <u>NTP Monitor</u>

system/system_monitor/docs/topics_net_monitor.md

+18
@@ -63,3 +63,21 @@
 | key | value (example) |
 | ----- | ----------------------------------------------------- |
 | error | [nethogs -t] execve failed: No such file or directory |
+
+## <u>Network CRC Error</u>
+
+/diagnostics/net_monitor: Network CRC Error
+
+<b>[summary]</b>
+
+| level | message |
+| ----- | ------- |
+| OK | OK |
+
+<b>[values]</b>
+
+| key | value (example) |
+| ------------------------------------------ | --------------- |
+| Network [0-9]: interface name | wlp82s0 |
+| Network [0-9]: total rx_crc_errors | 0 |
+| Network [0-9]: rx_crc_errors per unit time | 0 |
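
The `total rx_crc_errors` value documented above is a cumulative per-interface counter, and the README states that it matches what the ip command reports. The source file that actually collects it is not shown in this view, so the following is only a sketch under an assumption: on Linux the same counter is exposed through sysfs at `/sys/class/net/<interface>/statistics/rx_crc_errors`. The names `parseCounter` and `readRxCrcErrors` are hypothetical and do not appear in the commit.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: parse the decimal counter text found in a sysfs statistics file.
std::optional<unsigned long long> parseCounter(const std::string & text)
{
  std::istringstream in(text);
  unsigned long long value = 0;
  if (in >> value) {
    return value;
  }
  return std::nullopt;  // empty or non-numeric content
}

// Hypothetical helper: read the cumulative rx_crc_errors counter of one interface.
// Assumption: the counter is read from sysfs; net_monitor itself may use another route.
std::optional<unsigned long long> readRxCrcErrors(const std::string & interface_name)
{
  // Linux interface names cannot contain '/', so this also guards the path we build.
  assert(interface_name.find('/') == std::string::npos);

  std::ifstream file("/sys/class/net/" + interface_name + "/statistics/rx_crc_errors");
  std::string line;
  if (!file || !std::getline(file, line)) {
    return std::nullopt;  // interface missing, or statistics not readable
  }
  return parseCounter(line);
}
```

Returning `std::optional` keeps "counter unavailable" distinct from "counter is zero", which matters because a zero count is the normal healthy reading in the example table.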

system/system_monitor/include/system_monitor/net_monitor/net_monitor.hpp

+101-9
@@ -25,6 +25,7 @@
 #include <diagnostic_updater/diagnostic_updater.hpp>

 #include <climits>
+#include <deque>
 #include <map>
 #include <string>
 #include <vector>
@@ -55,11 +56,6 @@ class NetMonitor : public rclcpp::Node
    */
   ~NetMonitor();

-  /**
-   * @brief Update the diagnostic state.
-   */
-  void update();
-
   /**
    * @brief Shutdown nl80211 object
    */
@@ -86,24 +82,120 @@ class NetMonitor : public rclcpp::Node
   void monitorTraffic(
     diagnostic_updater::DiagnosticStatusWrapper & stat);  // NOLINT(runtime/references)

+  /**
+   * @brief check CRC error
+   * @param [out] stat diagnostic message passed directly to diagnostic publish calls
+   * @note NOLINT syntax is needed since diagnostic_updater asks for a non-const reference
+   * to pass diagnostic message updated in this function to diagnostic publish calls.
+   */
+  void checkCrcError(
+    diagnostic_updater::DiagnosticStatusWrapper & stat);  // NOLINT(runtime/references)
+
   /**
    * @brief get wireless speed
    * @param [in] ifa_name interface name
    * @return wireless speed
    */
   float getWirelessSpeed(const char * ifa_name);

+  /**
+   * @brief timer callback
+   */
+  void onTimer();
+
+  /**
+   * @brief update Network information list
+   */
+  void updateNetworkInfoList();
+
+  /**
+   * @brief check NetMonitor General Information
+   * @param [out] stat diagnostic message passed directly to diagnostic publish calls
+   * @return check result
+   */
+  bool checkGeneralInfo(diagnostic_updater::DiagnosticStatusWrapper & stat);
+
+  /**
+   * @brief Network information
+   */
+  struct NetworkInfo
+  {
+    int mtu_errno; //!< @brief errno set by ioctl() with SIOCGIFMTU
+    int ethtool_errno; //!< @brief errno set by ioctl() with SIOCETHTOOL
+    bool is_running; //!< @brief resource allocated flag
+    std::string interface_name; //!< @brief interface name
+    float speed; //!< @brief network capacity
+    int mtu; //!< @brief MTU
+    float rx_traffic; //!< @brief traffic received
+    float tx_traffic; //!< @brief traffic transmitted
+    float rx_usage; //!< @brief network capacity usage rate received
+    float tx_usage; //!< @brief network capacity usage rate transmitted
+    unsigned int rx_bytes; //!< @brief total bytes received
+    unsigned int rx_errors; //!< @brief bad packets received
+    unsigned int tx_bytes; //!< @brief total bytes transmitted
+    unsigned int tx_errors; //!< @brief packet transmit problems
+    unsigned int collisions; //!< @brief number of collisions during packet transmissions
+
+    NetworkInfo()
+    : mtu_errno(0),
+      ethtool_errno(0),
+      is_running(false),
+      interface_name(""),
+      speed(0.0),
+      mtu(0),
+      rx_traffic(0.0),
+      tx_traffic(0.0),
+      rx_usage(0.0),
+      tx_usage(0.0),
+      rx_bytes(0),
+      rx_errors(0),
+      tx_bytes(0),
+      tx_errors(0),
+      collisions(0)
+    {
+    }
+  };
+
+  /**
+   * @brief determine if it is a supported network
+   * @param [in] net_info network information
+   * @param [in] index index of network information
+   * @param [out] stat diagnostic message passed directly to diagnostic publish calls
+   * @param [out] error_str error string
+   * @return result of determining whether it is a supported network
+   */
+  bool isSupportedNetwork(
+    const NetworkInfo & net_info, int index, diagnostic_updater::DiagnosticStatusWrapper & stat,
+    std::string & error_str);
+
   diagnostic_updater::Updater updater_; //!< @brief Updater class which advertises to /diagnostics
+  rclcpp::TimerBase::SharedPtr timer_; //!< @brief timer to get Network information

   char hostname_[HOST_NAME_MAX + 1]; //!< @brief host name
   std::map<std::string, bytes> bytes_; //!< @brief list of bytes
   rclcpp::Time last_update_time_; //!< @brief last update time
   std::vector<std::string> device_params_; //!< @brief list of devices
-  NL80211 nl80211_; // !< @brief 802.11 netlink-based interface
+  NL80211 nl80211_; //!< @brief 802.11 netlink-based interface
+  int getifaddrs_errno_; //!< @brief errno set by getifaddrs()
+  std::vector<NetworkInfo> net_info_list_; //!< @brief list of Network information

-  std::string monitor_program_; //!< @brief nethogs monitor program name
-  bool nethogs_all_; //!< @brief nethogs result all mode
-  int traffic_reader_port_; //!< @brief port number to connect to traffic_reader
+  /**
+   * @brief CRC errors information
+   */
+  typedef struct crc_errors
+  {
+    std::deque<unsigned int> errors_queue; //!< @brief queue that holds count of CRC errors
+    unsigned int last_rx_crc_errors; //!< @brief rx_crc_error at the time of the last monitoring
+
+    crc_errors() : last_rx_crc_errors(0) {}
+  } crc_errors;
+  std::map<std::string, crc_errors> crc_errors_; //!< @brief list of CRC errors
+
+  std::string monitor_program_; //!< @brief nethogs monitor program name
+  bool nethogs_all_; //!< @brief nethogs result all mode
+  int traffic_reader_port_; //!< @brief port number to connect to traffic_reader
+  unsigned int crc_error_check_duration_; //!< @brief CRC error check duration
+  unsigned int crc_error_count_threshold_; //!< @brief CRC error count threshold

   /**
    * @brief Network usage status messages
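
The `crc_errors` struct above pairs a `std::deque` of per-period counts with the last observed cumulative counter, which suggests a sliding-window check against `crc_error_count_threshold`. The implementation file is not part of this view, so the following is only a sketch of how such a check could work, not the commit's actual logic. `CrcErrorWindow` and its methods are hypothetical names, and the sketch assumes one sample per second so that the window length equals `crc_error_check_duration`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>

// Hypothetical helper (not from the commit): sliding-window CRC error check
// for a single network interface.
class CrcErrorWindow
{
public:
  CrcErrorWindow(std::size_t check_duration, unsigned int count_threshold)
  : check_duration_(check_duration), count_threshold_(count_threshold)
  {
    assert(check_duration_ > 0);
  }

  // Feed the cumulative rx_crc_errors counter, sampled once per period.
  // Returns true when the errors summed over the window reach the threshold.
  // Note: like the struct in the diff, the previous value starts at 0, so the
  // first sample counts everything accumulated before monitoring began.
  bool update(unsigned int total_rx_crc_errors)
  {
    // Unsigned subtraction stays correct even if the counter wraps around.
    const unsigned int delta = total_rx_crc_errors - last_rx_crc_errors_;
    last_rx_crc_errors_ = total_rx_crc_errors;

    errors_queue_.push_back(delta);
    while (errors_queue_.size() > check_duration_) {
      errors_queue_.pop_front();  // drop samples older than the window
    }
    return errorsInWindow() >= count_threshold_;
  }

  // Sum of the per-period error counts currently held in the window.
  unsigned int errorsInWindow() const
  {
    return std::accumulate(errors_queue_.begin(), errors_queue_.end(), 0u);
  }

private:
  std::size_t check_duration_;
  unsigned int count_threshold_;
  std::deque<unsigned int> errors_queue_;
  unsigned int last_rx_crc_errors_ = 0;
};
```

Storing per-period deltas rather than raw totals means the window sum only reflects recent errors, so a burst that has aged out of the window stops triggering the warning. Keeping one such object per interface name would correspond to the `std::map<std::string, crc_errors>` member in the diff.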
