velox/docs/develop/debugging/tracing.rst

There are three types of writers: `TaskTraceMetadataWriter`, `OperatorTraceInputWriter`,
and `OperatorTraceSplitWriter`. They are used in the prod or shadow environment to record
the real execution data.
**TaskTraceMetadataWriter**

The `TaskTraceMetadataWriter` records the query metadata during task creation, serializes it,
and saves it into a file in JSON format. There are two types of metadata:
1. **Query Configurations and Connector Properties**: These are user-specified per query and can
   be serialized as JSON map objects (key-value pairs).
2. **Task Plan Fragment** (aka Plan Node Tree): This can be serialized as a JSON object, a feature
   already supported in Velox (see `#4614 <https://github.com/facebookincubator/velox/issues/4614>`_,
   `#4301 <https://github.com/facebookincubator/velox/issues/4301>`_, and
   `#4398 <https://github.com/facebookincubator/velox/issues/4398>`_).
The metadata is saved as a single JSON object string in the metadata file. It would look similar
to the following simplified, pretty-printed JSON string (with some content removed for brevity):

.. code-block:: JSON

   {
     "planNode":{
       "nullAware": false,
       "outputType":{...},
       "leftKeys":[...],
       "rightKeys":[...],
       "joinType":"INNER",
       "sources":[
         {
           "outputType":{...},
           "tableHandle":{...},
           "assignments":[...],
           "id":"0",
           "name":"TableScanNode"
         },
         {
           "outputType":{...},
           "tableHandle":{...},
           "assignments":[...],
           "id":"1",
           "name":"TableScanNode"
         }
       ],
       "id":"2",
       "name":"HashJoinNode"
     },
     "connectorProperties":{...},
     "queryConfig":{"query_trace_node_ids":"2", ...}
   }
**OperatorTraceInputWriter**

The `OperatorTraceInputWriter` records the input vectors from the target operator. It uses a Presto
serializer to serialize each vector batch and flushes immediately to ensure that replay is possible
even if a crash occurs during execution.

It is created during the target operator's initialization and writes data in the `Operator::addInput`
method during execution. It finishes when the target operator is closed. However, it can finish early
if the recorded data size exceeds the limit specified by the user.
**OperatorTraceSplitWriter**

The `OperatorTraceSplitWriter` captures the input splits from the target `TableScan` operator. It
serializes each split and immediately flushes it to ensure that replay is possible even if a crash
occurs during execution. Each split is serialized as follows: