Semantic conventions for system metrics

Status: Development

This document describes instruments and attributes for common system level metrics in OpenTelemetry. Consider the general metric semantic conventions when creating instruments not explicitly defined in the specification.

The system.* namespace SHOULD be used exclusively to report host metrics. The system.* namespace SHOULD only be used when the metrics are collected from within the target system (physical servers, virtual machines, etc.). Metrics collected from technology-specific, well-defined APIs (e.g. Kubelet's API or container runtimes) should be reported under their respective namespace (e.g. k8s.*, container.*). Resource attributes related to a host SHOULD be reported under the host.* namespace.

Warning: Existing instrumentations and collectors that use v1.21.0 of this document (or prior):

  • SHOULD NOT adopt any breaking changes from this document until the system semantic conventions are marked stable. Conventions include, but are not limited to, attributes, metric names, and units of measure.
  • SHOULD introduce a control mechanism to allow users to opt-in to the new conventions once the migration plan is finalized.

General Metrics

Metric: system.uptime

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.uptime Gauge s The time the system has been running [1] Development

[1]: Instrumentations SHOULD use a gauge with type double and measure uptime in seconds as a floating point number with the highest precision available. The actual accuracy would depend on the instrumentation and operating system.
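
As a minimal sketch of the note above, assuming a Linux host where /proc/uptime reports uptime as its first whitespace-separated field, the value can be read as a floating point number of seconds. The helper names are hypothetical and are not part of any OpenTelemetry API.

```python
def parse_uptime_seconds(proc_uptime_text: str) -> float:
    """Return uptime in seconds from the contents of /proc/uptime."""
    # The first field is the uptime; the second is aggregate idle time.
    return float(proc_uptime_text.split()[0])


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Read and parse the uptime file (Linux only; path is an assumption)."""
    with open(path, encoding="ascii") as f:
        return parse_uptime_seconds(f.read())
```

The parsed value would then be reported through a double-valued gauge; the precision is whatever the operating system exposes.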

Processor Metrics

Description: System level processor metrics captured under the namespace system.cpu.

Metric: system.cpu.physical.count

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.cpu.physical.count UpDownCounter {cpu} Reports the number of actual physical processor cores on the hardware [1] Development

[1]: Calculated by multiplying the number of sockets by the number of cores per socket

Metric: system.cpu.logical.count

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.cpu.logical.count UpDownCounter {cpu} Reports the number of logical (virtual) processor cores created by the operating system to manage multitasking [1] Development

[1]: Calculated by multiplying the number of sockets by the number of cores per socket, and then by the number of threads per core
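
The two calculations described for system.cpu.physical.count and system.cpu.logical.count can be sketched as follows. The CpuTopology type and its field names are hypothetical, introduced only for illustration; how the socket, core, and thread counts are discovered is left to the instrumentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTopology:
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    def physical_count(self) -> int:
        # sockets x cores per socket
        return self.sockets * self.cores_per_socket

    def logical_count(self) -> int:
        # sockets x cores per socket x threads per core
        return self.physical_count() * self.threads_per_core
```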

Memory Metrics

Description: System level memory metrics captured under the namespace system.memory. This does not include paging/swap memory.

Metric: system.memory.usage

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.memory.usage UpDownCounter By Reports memory in use by state. [1] Development

[1]: The sum over all system.memory.state values SHOULD equal the total memory available on the system, that is system.memory.limit.

Attribute Type Description Examples Requirement Level Stability
system.memory.state string The memory state free; cached Recommended Development

system.memory.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
buffers buffers Development
cached cached Development
free free Development
used used Development
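
One possible way to derive the per-state values on Linux is sketched below, assuming /proc/meminfo as the data source. Treating "used" as the remainder after free, buffers, and cached is an assumption made here so that the states sum to the total; the function names are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo contents into {key: value in bytes}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        amount = int(fields[0])
        # The kernel's "kB" suffix denotes units of 1024 bytes.
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def memory_usage_by_state(meminfo: dict[str, int]) -> dict[str, int]:
    """Map system.memory.state values to bytes for system.memory.usage."""
    free = meminfo["MemFree"]
    buffers = meminfo["Buffers"]
    cached = meminfo["Cached"]
    # Assumption: "used" is whatever is not free, buffers, or cached.
    used = meminfo["MemTotal"] - free - buffers - cached
    return {"free": free, "buffers": buffers, "cached": cached, "used": used}
```

With this derivation the sum over all states equals MemTotal, which matches the relationship to system.memory.limit described in the note.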

Metric: system.memory.limit

This metric is opt-in.

Name Instrument Type Unit (UCUM) Description Stability
system.memory.limit UpDownCounter By Total memory available in the system. [1] Development

[1]: Its value SHOULD equal the sum of system.memory.state over all states.

Metric: system.memory.shared

This metric is opt-in.

Name Instrument Type Unit (UCUM) Description Stability
system.memory.shared UpDownCounter By Shared memory used (mostly by tmpfs). [1] Development

[1]: Equivalent of shared from the free command or Shmem from /proc/meminfo.

Metric: system.memory.utilization

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.memory.utilization Gauge 1 Development

Attribute Type Description Examples Requirement Level Stability
system.memory.state string The memory state free; cached Recommended Development

system.memory.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
buffers buffers Development
cached cached Development
free free Development
used used Development

Paging/Swap Metrics

Description: System level paging/swap memory metrics captured under the namespace system.paging.

Metric: system.paging.usage

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.paging.usage UpDownCounter By Unix swap or Windows pagefile usage Development

Attribute Type Description Examples Requirement Level Stability
system.device string Unique identifier for the device responsible for managing paging operations. /dev/dm-0 Recommended Development
system.paging.state string The memory paging state free Recommended Development

system.paging.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
free free Development
used used Development

Metric: system.paging.utilization

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.paging.utilization Gauge 1 Development

Attribute Type Description Examples Requirement Level Stability
system.device string Unique identifier for the device responsible for managing paging operations. /dev/dm-0 Recommended Development
system.paging.state string The memory paging state free Recommended Development

system.paging.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
free free Development
used used Development

Metric: system.paging.faults

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.paging.faults Counter {fault} Development

Attribute Type Description Examples Requirement Level Stability
system.paging.type string The memory paging type minor Recommended Development

system.paging.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
major major Development
minor minor Development

Metric: system.paging.operations

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.paging.operations Counter {operation} Development

Attribute Type Description Examples Requirement Level Stability
system.paging.direction string The paging access direction in Recommended Development
system.paging.type string The memory paging type minor Recommended Development

system.paging.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
in in Development
out out Development

system.paging.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
major major Development
minor minor Development

Disk Controller Metrics

Description: System level disk performance metrics captured under the namespace system.disk.

Metric: system.disk.io

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.disk.io Counter By Development

Attribute Type Description Examples Requirement Level Stability
disk.io.direction string The disk IO operation direction. read Recommended Development
system.device string The device identifier (identifier) Recommended Development

disk.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
read read Development
write write Development

Metric: system.disk.operations

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.disk.operations Counter {operation} Development

Attribute Type Description Examples Requirement Level Stability
disk.io.direction string The disk IO operation direction. read Recommended Development
system.device string The device identifier (identifier) Recommended Development

disk.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
read read Development
write write Development

Metric: system.disk.io_time

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.disk.io_time Counter s Time disk spent activated [1] Development

[1]: The real elapsed time ("wall clock") used in the I/O path (time from operations running in parallel are not counted). Measured as:

Attribute Type Description Examples Requirement Level Stability
system.device string The device identifier (identifier) Recommended Development

Metric: system.disk.operation_time

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.disk.operation_time Counter s Sum of the time each operation took to complete [1] Development

[1]: Because it is the sum of time each request took, parallel-issued requests each contribute to make the count grow. Measured as:

  • Linux: Fields 7 & 11 from procfs-diskstats
  • Windows: "Avg. Disk sec/Read" perf counter multiplied by "Disk Reads/sec" perf counter (similar for Writes)

Attribute Type Description Examples Requirement Level Stability
disk.io.direction string The disk IO operation direction. read Recommended Development
system.device string The device identifier (identifier) Recommended Development

disk.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
read read Development
write write Development
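
For the Linux case, the following sketch extracts fields 7 and 11 from a single /proc/diskstats line and converts them to seconds. It assumes the field numbering of procfs-diskstats, where fields 1 to 3 are the major number, minor number, and device name, and fields 7 and 11 are the milliseconds spent reading and writing. The function name is hypothetical.

```python
def operation_time_seconds(diskstats_line: str) -> tuple[str, dict[str, float]]:
    """Return (device, {direction: seconds}) for one /proc/diskstats line."""
    fields = diskstats_line.split()
    device = fields[2]            # field 3: device name
    read_ms = int(fields[6])      # field 7: time spent reading (ms)
    write_ms = int(fields[10])    # field 11: time spent writing (ms)
    return device, {"read": read_ms / 1000.0, "write": write_ms / 1000.0}
```

The dictionary keys correspond to the disk.io.direction values, and the device name would be reported as system.device.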

Metric: system.disk.merged

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.disk.merged Counter {operation} Development

Attribute Type Description Examples Requirement Level Stability
disk.io.direction string The disk IO operation direction. read Recommended Development
system.device string The device identifier (identifier) Recommended Development

disk.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
read read Development
write write Development

Metric: system.disk.limit

This metric is opt-in.

Name Instrument Type Unit (UCUM) Description Stability
system.disk.limit UpDownCounter By The total storage capacity of the disk Development

Attribute Type Description Examples Requirement Level Stability
system.device string The device identifier (identifier) Recommended Development

Filesystem Metrics

Description: System level filesystem metrics captured under the namespace system.filesystem.

Metric: system.filesystem.usage

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.filesystem.usage UpDownCounter By Reports a filesystem's space usage across different states. [1] Development

[1]: The sum of all system.filesystem.usage values over the different system.filesystem.state attributes SHOULD equal the total storage capacity of the filesystem, that is system.filesystem.limit.

Attribute Type Description Examples Requirement Level Stability
system.device string Identifier for the device where the filesystem resides. /dev/sda; \network-drive Recommended Development
system.filesystem.mode string The filesystem mode rw, ro Recommended Development
system.filesystem.mountpoint string The filesystem mount path /mnt/data Recommended Development
system.filesystem.state string The filesystem state used Recommended Development
system.filesystem.type string The filesystem type ext4 Recommended Development

system.filesystem.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
free free Development
reserved reserved Development
used used Development

system.filesystem.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
exfat exfat Development
ext4 ext4 Development
fat32 fat32 Development
hfsplus hfsplus Development
ntfs ntfs Development
refs refs Development

Metric: system.filesystem.utilization

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.filesystem.utilization Gauge 1 Development

Attribute Type Description Examples Requirement Level Stability
system.device string Identifier for the device where the filesystem resides. /dev/sda; \network-drive Recommended Development
system.filesystem.mode string The filesystem mode rw, ro Recommended Development
system.filesystem.mountpoint string The filesystem mount path /mnt/data Recommended Development
system.filesystem.state string The filesystem state used Recommended Development
system.filesystem.type string The filesystem type ext4 Recommended Development

system.filesystem.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
free free Development
reserved reserved Development
used used Development

system.filesystem.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
exfat exfat Development
ext4 ext4 Development
fat32 fat32 Development
hfsplus hfsplus Development
ntfs ntfs Development
refs refs Development

Metric: system.filesystem.limit

This metric is opt-in.

Name Instrument Type Unit (UCUM) Description Stability
system.filesystem.limit UpDownCounter By The total storage capacity of the filesystem Development

Attribute Type Description Examples Requirement Level Stability
system.device string Identifier for the device where the filesystem resides. /dev/sda; \network-drive Recommended Development
system.filesystem.mode string The filesystem mode rw, ro Recommended Development
system.filesystem.mountpoint string The filesystem mount path /mnt/data Recommended Development
system.filesystem.type string The filesystem type ext4 Recommended Development

system.filesystem.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
exfat exfat Development
ext4 ext4 Development
fat32 fat32 Development
hfsplus hfsplus Development
ntfs ntfs Development
refs refs Development

Network Metrics

Description: System level network metrics captured under the namespace system.network.

Metric: system.network.dropped

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.network.dropped Counter {packet} Count of packets that are dropped or discarded even though there was no error [1] Development

[1]: Measured as:

Attribute Type Description Examples Requirement Level Stability
network.interface.name string The network interface name. lo; eth0 Recommended Development
network.io.direction string The network IO operation direction. transmit Recommended Development

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
receive receive Development
transmit transmit Development

Metric: system.network.packets

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.network.packets Counter {packet} Development

Attribute Type Description Examples Requirement Level Stability
network.io.direction string The network IO operation direction. transmit Recommended Development
system.device string The device identifier (identifier) Recommended Development

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
receive receive Development
transmit transmit Development

Metric: system.network.errors

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.network.errors Counter {error} Count of network errors detected [1] Development

[1]: Measured as:

Attribute Type Description Examples Requirement Level Stability
network.interface.name string The network interface name. lo; eth0 Recommended Development
network.io.direction string The network IO operation direction. transmit Recommended Development

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
receive receive Development
transmit transmit Development

Metric: system.network.io

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.network.io Counter By Development

Attribute Type Description Examples Requirement Level Stability
network.interface.name string The network interface name. lo; eth0 Recommended Development
network.io.direction string The network IO operation direction. transmit Recommended Development

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
receive receive Development
transmit transmit Development

Metric: system.network.connections

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.network.connections UpDownCounter {connection} Development

Attribute Type Description Examples Requirement Level Stability
network.connection.state string The state of network connection [1] close_wait Recommended Development
network.interface.name string The network interface name. lo; eth0 Recommended Development
network.transport string OSI transport layer or inter-process communication method. [2] tcp; udp Recommended Stable

[1] network.connection.state: Connection states are defined as part of RFC 9293.

[2] network.transport: The value SHOULD be normalized to lowercase.

Consider always setting the transport when setting a port number, since a port number is ambiguous without knowing the transport. For example different processes could be listening on TCP port 12345 and UDP port 12345.


network.connection.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
close_wait close_wait Development
closed closed Development
closing closing Development
established established Development
fin_wait_1 fin_wait_1 Development
fin_wait_2 fin_wait_2 Development
last_ack last_ack Development
listen listen Development
syn_received syn_received Development
syn_sent syn_sent Development
time_wait time_wait Development

network.transport has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
pipe Named or anonymous pipe. Stable
quic QUIC Development
tcp TCP Stable
udp UDP Stable
unix Unix domain socket Stable

Aggregate System Process Metrics

Description: System level aggregate process metrics captured under the namespace system.process. For metrics at the individual process level, see process metrics.

Metric: system.process.count

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.process.count UpDownCounter {process} Total number of processes in each state Development

Attribute Type Description Examples Requirement Level Stability
system.process.status string The process state, e.g., Linux Process State Codes running Recommended Development

system.process.status has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
defunct defunct Development
running running Development
sleeping sleeping Development
stopped stopped Development

Metric: system.process.created

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.process.created Counter {process} Total number of processes created over uptime of the host Development

system.{os}. - OS Specific System Metrics

Instrument names for system level metrics that have different and conflicting meaning across multiple OSes should be prefixed with system.{os}. and follow the hierarchies listed above for different entities like CPU, memory, and network.

For example, UNIX load average over a given interval is not well standardized, and its value may vary across different UNIX-like OSes even under similar load:

Without getting into the vagaries of every Unix-like operating system in existence, the load average more or less represents the average number of processes that are in the running (using the CPU) or runnable (waiting for the CPU) states. One notable exception exists: Linux includes processes in uninterruptible sleep states, typically waiting for some I/O activity to complete. This can markedly increase the load average on Linux systems.

(source of quote, linux source code)

An instrument for load average over 1 minute on Linux could be named system.linux.cpu.load_1m, reusing the cpu name proposed above and having an {os} prefix to split this metric across OSes.
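
As a small sketch of where such a value could come from, assuming a Linux host whose /proc/loadavg lists the 1, 5, and 15 minute load averages as its first three fields, the 1 minute figure for system.linux.cpu.load_1m can be parsed like this. The function name is hypothetical.

```python
def parse_load_1m(proc_loadavg_text: str) -> float:
    """Return the 1 minute load average from the contents of /proc/loadavg."""
    # Fields: load_1m load_5m load_15m runnable/total last_pid
    return float(proc_loadavg_text.split()[0])
```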

Metric: system.linux.memory.available

Name Instrument Type Unit (UCUM) Description Stability
system.linux.memory.available UpDownCounter By An estimate of how much memory is available for starting new applications, without causing swapping [1] Development

[1]: This is an alternative to the system.memory.usage metric with state=free. Linux has exported "available" memory since version 3.14. It takes "free" memory as a baseline, and then factors in kernel-specific values. This is supposed to be more accurate than just "free" memory. For reference, see the calculations here. See also MemAvailable in /proc/meminfo.
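
A minimal sketch of reading this value, assuming the MemAvailable line of /proc/meminfo is present and expressed in the kernel's "kB" units (1024 bytes). The function name is hypothetical; it returns None on kernels that do not export the field.

```python
from typing import Optional


def parse_mem_available_bytes(meminfo_text: str) -> Optional[int]:
    """Return MemAvailable in bytes, or None if the field is absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Example line: "MemAvailable:     600 kB"
            return int(line.split()[1]) * 1024
    return None
```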

Metric: system.linux.memory.slab.usage

This metric is recommended.

Name Instrument Type Unit (UCUM) Description Stability
system.linux.memory.slab.usage UpDownCounter By Reports the memory used by the Linux kernel for managing caches of frequently used objects. [1] Development

[1]: The sum over the reclaimable and unreclaimable state values in system.linux.memory.slab.usage SHOULD be equal to the total slab memory available on the system. Note that the total slab memory is not constant and may vary over time. See also the Slab allocator and Slab in /proc/meminfo.

Attribute Type Description Examples Requirement Level Stability
linux.memory.slab.state string The Linux Slab memory state reclaimable; unreclaimable Recommended Development

linux.memory.slab.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value Description Stability
reclaimable reclaimable Development
unreclaimable unreclaimable Development
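
The two state values can be sketched from /proc/meminfo as follows, assuming its SReclaimable and SUnreclaim entries correspond to the reclaimable and unreclaimable states and are expressed in "kB" units of 1024 bytes. The function name is hypothetical.

```python
def slab_usage_by_state(meminfo_text: str) -> dict[str, int]:
    """Return slab bytes keyed by linux.memory.slab.state."""
    # Assumed mapping from /proc/meminfo keys to state values.
    wanted = {"SReclaimable": "reclaimable", "SUnreclaim": "unreclaimable"}
    usage = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        state = wanted.get(key.strip())
        if state is not None:
            usage[state] = int(rest.split()[0]) * 1024
    return usage
```

On a typical kernel the two values sum to the Slab entry of the same file, which corresponds to the total described in the note.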