studio usage agent: phase 1 #413

Closed
garypen opened this issue Feb 8, 2022 · 1 comment · Fixed by #309
Labels
enhancement An enhancement to an existing feature

Comments

garypen commented Feb 8, 2022

The first phase of building out usage reporting focuses purely on operation-level reporting. Nothing in this phase cares about the details of what happens inside the guts of executing an operation.

Specifically, this stage includes:

  • Building out the infrastructure for a Rust reporting agent — something that collects data from other parts of the process and periodically sends it to Apollo’s servers using our protobuf report data format.
  • Collecting some basic per-process header information (graph variant, schema ID, router version, etc.)
  • For each executed operation, collecting this information:
    • The client name, version, and reference id (just taking it from the apollographql-client-* headers at first; eventually there can be configuration for this)
    • The operation’s signature (partially done, awaiting further developments in apollo-rs)
    • The number of times that the operation ran
    • The overall latency of the operation, bucketed using our duration histogram representation
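
The per-operation collection described above could be sketched roughly as follows. This is a minimal illustration, not the implementation from the linked pull request: `ClientInfo`, `OperationStats`, `UsageAgent`, and `duration_to_bucket` are hypothetical names, the exact `apollographql-client-*` header names are assumed, and the bucketing constants (384 buckets growing by a factor of 1.1 over microseconds) are an assumption about the duration histogram representation. A real agent would call `flush` on a timer and encode the result as protobuf.

```rust
use std::collections::{BTreeMap, HashMap};
use std::time::Duration;

/// Assumed histogram size; buckets are assumed to grow by a factor of 1.1.
const BUCKET_COUNT: usize = 384;

/// Maps a latency to an exponential bucket index (assumed representation).
fn duration_to_bucket(latency: Duration) -> usize {
    let micros = latency.as_nanos() as f64 / 1000.0;
    let unbounded = (micros.ln() / 1.1f64.ln()).ceil();
    if unbounded.is_nan() || unbounded <= 0.0 {
        0 // covers zero durations, where ln() yields negative infinity
    } else if unbounded >= BUCKET_COUNT as f64 {
        BUCKET_COUNT - 1
    } else {
        unbounded as usize
    }
}

/// Client identity taken from request headers (header names are assumed).
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
struct ClientInfo {
    name: String,
    version: String,
    reference_id: String,
}

impl ClientInfo {
    /// Expects lower-cased header names; missing headers become empty strings.
    fn from_headers(headers: &HashMap<String, String>) -> Self {
        let get = |key: &str| headers.get(key).cloned().unwrap_or_default();
        ClientInfo {
            name: get("apollographql-client-name"),
            version: get("apollographql-client-version"),
            reference_id: get("apollographql-client-reference-id"),
        }
    }
}

/// Aggregated stats for one (operation signature, client) pair.
#[derive(Debug, Default)]
struct OperationStats {
    request_count: u64,
    /// Sparse histogram: bucket index -> number of operations in that bucket.
    latency_buckets: BTreeMap<usize, u64>,
}

/// Accumulates operation-level stats between two report submissions.
#[derive(Debug, Default)]
struct UsageAgent {
    stats: HashMap<(String, ClientInfo), OperationStats>,
}

impl UsageAgent {
    /// Records one executed operation.
    fn record(&mut self, signature: &str, client: ClientInfo, latency: Duration) {
        let entry = self.stats.entry((signature.to_string(), client)).or_default();
        entry.request_count += 1;
        *entry
            .latency_buckets
            .entry(duration_to_bucket(latency))
            .or_insert(0) += 1;
    }

    /// Returns everything collected so far and resets the agent.
    fn flush(&mut self) -> HashMap<(String, ClientInfo), OperationStats> {
        std::mem::take(&mut self.stats)
    }
}
```

Keying the map on the signature plus client identity means repeated executions of the same operation from the same client collapse into one counter and one histogram, which keeps the periodic report small regardless of traffic volume.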

This enables us to send Report messages to Apollo’s services where the following fields are set:

  • Report.header
  • Various fields on ReportHeader
  • Report.traces_per_query
  • Report.end_time
  • TracesAndStats.stats_with_context
  • ContextualizedStats.context
  • All fields on StatsContext
  • ContextualizedStats.query_latency_stats
  • QueryLatencyStats.latency_count
  • QueryLatencyStats.request_count

Notably, we do not include TracesAndStats.trace (detailed traces), ContextualizedStats.per_type_stat (field usage), or the fields on QueryLatencyStats relating to errors, persisted queries, caching, or the operation registry.
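
To make the shape of such a phase 1 report concrete, here is a rough sketch using hand-written Rust structs as stand-ins for the protobuf messages. The message and field names come from the list above; the field types, the three `ReportHeader` fields shown, the use of a plain integer for `end_time`, and the `build_report` helper are assumptions made for illustration. In practice these types would be generated from the protobuf schema rather than written by hand.

```rust
use std::collections::HashMap;

/// Per-process header; only a few assumed fields are shown.
#[derive(Debug, Default, Clone, PartialEq)]
struct ReportHeader {
    graph_ref: String,            // assumed: graph ID plus variant
    executable_schema_id: String, // assumed: schema ID
    agent_version: String,        // assumed: router version
}

/// Client identity that the stats are grouped under.
#[derive(Debug, Default, Clone, PartialEq)]
struct StatsContext {
    client_reference_id: String,
    client_name: String,
    client_version: String,
}

/// Only the two fields populated in phase 1 are modeled here.
#[derive(Debug, Default, Clone, PartialEq)]
struct QueryLatencyStats {
    latency_count: Vec<i64>, // duration histogram buckets
    request_count: u64,
}

#[derive(Debug, Default, Clone, PartialEq)]
struct ContextualizedStats {
    context: StatsContext,
    query_latency_stats: QueryLatencyStats,
    // per_type_stat (field usage) is deliberately left out in phase 1.
}

#[derive(Debug, Default, Clone, PartialEq)]
struct TracesAndStats {
    stats_with_context: Vec<ContextualizedStats>,
    // trace (detailed traces) is deliberately left out in phase 1.
}

#[derive(Debug, Default, Clone, PartialEq)]
struct Report {
    header: ReportHeader,
    /// Keyed by operation signature.
    traces_per_query: HashMap<String, TracesAndStats>,
    /// Stand-in for a protobuf timestamp: seconds since the Unix epoch.
    end_time: u64,
}

/// Groups per-(signature, client) stats under their operation signature.
fn build_report(
    header: ReportHeader,
    operations: Vec<(String, StatsContext, QueryLatencyStats)>,
    end_time: u64,
) -> Report {
    let mut traces_per_query: HashMap<String, TracesAndStats> = HashMap::new();
    for (signature, context, query_latency_stats) in operations {
        traces_per_query
            .entry(signature)
            .or_default()
            .stats_with_context
            .push(ContextualizedStats { context, query_latency_stats });
    }
    Report { header, traces_per_query, end_time }
}
```

Every field that phase 1 leaves unset is simply absent from these structs, so the sketch cannot accidentally emit traces, field usage, or error counts.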

This enables the following Studio features:

  • “Operations → Performance” which shows overall request rate and whole-operation latency by operation and client. (The “error rate” graph will not function yet; we may want to disable this graph somehow if we are publicly releasing Stargate at this point.)
  • “Clients” shows similar data organized by client.
  • “Checks” uses operation history to determine whether a proposed schema change would break actual observed operations. (Note that the only usage data “Checks” uses is the signatures from the operations, not field usage or traces, so it will work fine after Phase 1! That said, users may be confused by the fact that “Checks” can claim an operation uses a field while “Fields” claims no fields are used.)
  • Datadog forwarding, experimental performance alerting, and daily Slack summaries (except for the error-related sub-features)

Studio features that will not yet function include “Fields”, “Operations → Errors”, “Operations → Traces”, and the feature of the (currently unmaintained) VSCode extension which shows per-field p95 performance data.

Note that the features that do work will work with implementing services running any GraphQL server. (This matches the behavior of the existing Apollo Gateway: if you run Gateway in front of GraphQL servers that don’t implement federated tracing, the above features will still function.) They will also continue to work if the federation algorithm is replaced by a more flexible Constellation consolidation algorithm that changes how execution works, because none of these features relate to the internals of execution.

@garypen garypen added the triage label Feb 8, 2022
@garypen garypen linked a pull request Feb 8, 2022 that will close this issue
@garypen garypen added enhancement An enhancement to an existing feature and removed triage labels Feb 8, 2022
garypen commented Feb 8, 2022

resolved by: #309

@garypen garypen closed this as completed Feb 8, 2022