studio usage agent: phase 1 #413

garypen · 2022-02-08T13:46:06Z

The first phase of building out usage reporting focuses purely on operation-level reporting. Nothing in this phase cares about the details of what happens inside the guts of executing an operation.

Specifically, this stage includes:

- Building out the infrastructure for a Rust reporting agent: something that collects data from other parts of the process and periodically sends it to Apollo’s servers using our protobuf report data format.
- Collecting some basic per-process header information (graph variant, schema ID, router version, etc.).
- For each executed operation, collecting this information:
  - The client name, version, and reference id (just taking it from the apollographql-client-* headers at first; eventually there can be configuration for this)
  - The operation’s signature (partially done, awaiting further developments in apollo-rs)
  - The number of times that the operation ran
  - The overall latency of the operation, bucketed using our duration histogram representation
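
The per-operation collection described above could be sketched roughly as follows. This is only an illustration, not the agent's actual code: `OperationKey`, `OperationStats`, `UsageAggregator`, and `duration_to_bucket` are hypothetical names, the headers are assumed to arrive as a plain map with lowercased names, and the bucketing constants (384 buckets growing by a factor of 1.1 from 1µs) are an assumption modeled on the exponential duration histogram used elsewhere in Apollo tooling. The client reference id is left out for brevity.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Assumed histogram size; the real representation may differ.
const BUCKET_COUNT: usize = 384;

/// Map a latency to a bucket index: bucket n covers latencies up to
/// roughly 1.1^n microseconds, clamped to the histogram's range.
fn duration_to_bucket(latency: Duration) -> usize {
    let micros = latency.as_nanos() as f64 / 1000.0;
    let unbounded = (micros.ln() / 1.1_f64.ln()).ceil();
    if unbounded.is_nan() || unbounded <= 0.0 {
        0
    } else if unbounded >= BUCKET_COUNT as f64 {
        BUCKET_COUNT - 1
    } else {
        unbounded as usize
    }
}

/// What one row of statistics is keyed on (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct OperationKey {
    client_name: String,
    client_version: String,
    signature: String,
}

/// Counters accumulated for one key between two reports.
#[derive(Debug, Default)]
struct OperationStats {
    request_count: u64,
    latency_count: Vec<u64>, // grown lazily up to the highest bucket seen
}

#[derive(Debug, Default)]
struct UsageAggregator {
    stats: HashMap<OperationKey, OperationStats>,
}

impl UsageAggregator {
    /// Record one executed operation. Missing client headers become
    /// empty strings rather than an error.
    fn record(&mut self, headers: &HashMap<String, String>, signature: &str, latency: Duration) {
        let header = |name: &str| headers.get(name).cloned().unwrap_or_default();
        let key = OperationKey {
            client_name: header("apollographql-client-name"),
            client_version: header("apollographql-client-version"),
            signature: signature.to_string(),
        };
        let entry = self.stats.entry(key).or_default();
        entry.request_count += 1;
        let bucket = duration_to_bucket(latency);
        if entry.latency_count.len() <= bucket {
            entry.latency_count.resize(bucket + 1, 0);
        }
        entry.latency_count[bucket] += 1;
    }
}
```

A periodic task would then drain `stats` into a report and reset it. Keying on client name, client version, and signature together keeps the aggregation small while still allowing both the per-operation and per-client breakdowns.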

This enables us to send Report messages to Apollo’s services where the following fields are set:

- Report.header
  - Various fields on ReportHeader
- Report.traces_per_query
- Report.end_time
- TracesAndStats.stats_with_context
- ContextualizedStats.context
  - All fields on StatsContext
- ContextualizedStats.query_latency_stats
  - QueryLatencyStats.latency_count
  - QueryLatencyStats.request_count
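
To make the shape of such a message concrete, here is a minimal sketch using hand-written Rust structs as stand-ins for the protobuf-generated types. Only the fields listed above are modeled. The Rust field types, the `ReportHeader` field names, the use of `SystemTime` for `end_time`, and the helper `single_operation_report` are all assumptions made for illustration; the real definitions come from the protobuf schema.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

// Stand-in for the per-process header; field names here are assumed.
#[derive(Debug, Default)]
struct ReportHeader {
    graph_ref: String,
    executable_schema_id: String,
    agent_version: String,
}

#[derive(Debug, Default)]
struct StatsContext {
    client_name: String,
    client_version: String,
    client_reference_id: String,
}

#[derive(Debug, Default)]
struct QueryLatencyStats {
    latency_count: Vec<i64>, // one counter per duration-histogram bucket
    request_count: u64,
}

#[derive(Debug, Default)]
struct ContextualizedStats {
    context: StatsContext,
    query_latency_stats: QueryLatencyStats,
}

// Phase 1 leaves detailed traces unset, so only the stats are modeled.
#[derive(Debug, Default)]
struct TracesAndStats {
    stats_with_context: Vec<ContextualizedStats>,
}

#[derive(Debug)]
struct Report {
    header: ReportHeader,
    // Keyed by operation signature.
    traces_per_query: HashMap<String, TracesAndStats>,
    end_time: SystemTime,
}

/// Build a report holding statistics for a single operation and client
/// (hypothetical helper, for illustration only).
fn single_operation_report(
    header: ReportHeader,
    signature: &str,
    context: StatsContext,
    latency_count: Vec<i64>,
    request_count: u64,
    end_time: SystemTime,
) -> Report {
    let stats = ContextualizedStats {
        context,
        query_latency_stats: QueryLatencyStats { latency_count, request_count },
    };
    let mut traces_per_query = HashMap::new();
    traces_per_query.insert(
        signature.to_string(),
        TracesAndStats { stats_with_context: vec![stats] },
    );
    Report { header, traces_per_query, end_time }
}
```

In practice a report would carry one `TracesAndStats` entry per distinct signature and one `ContextualizedStats` per client seen for that signature.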

Notably, we do not include TracesAndStats.trace (detailed traces), ContextualizedStats.per_type_stat (field usage), or the fields on QueryLatencyStats relating to errors, persisted queries, caching, or the operation registry.

This enables the following Studio features:

- “Operations → Performance”, which shows overall request rate and whole-operation latency by operation and client. (The “error rate” graph will not function yet; we may want to disable this graph somehow if we are publicly releasing Stargate at this point.)
- “Clients”, which shows similar data organized by client.
- “Checks”, which uses operation history to determine whether a proposed schema change would break actual observed operations. (Note that the only usage data “Checks” uses is the signatures from the operations, not field usage or traces, so it will work fine after Phase 1! That said, users may be confused by the fact that “Checks” can claim an operation uses a field while “Fields” claims no fields are used.)
- Datadog forwarding, experimental performance alerting, and daily Slack summaries (except for the error-related sub-features).

Studio features that will not yet function include “Fields”, “Operations → Errors”, “Operations → Traces”, and the feature of the (currently unmaintained) VSCode extension which shows per-field p95 performance data.

Note that the features that work will work with implementing services running any arbitrary GraphQL server. (This is actually equivalent to the behavior of running the existing Apollo Gateway — if you run Gateway in front of GraphQL servers that don’t implement federated tracing, the above features will function.) They will also continue to work if the federation algorithm is replaced by a more flexible Constellation consolidation algorithm that changes how execution works, because none of these features relate to the internals of execution.

garypen · 2022-02-08T14:05:39Z

resolved by: #309

garypen added the triage label Feb 8, 2022

garypen linked a pull request Feb 8, 2022 that will close this issue

initial studio agent, client awareness and operation reporting #309

Merged

garypen added the enhancement label and removed the triage label Feb 8, 2022

garypen mentioned this issue Feb 8, 2022

initial studio agent, client awareness and operation reporting #309

Merged

garypen closed this as completed Feb 8, 2022
