
[Feature Request] Optimizing Resource Usage Header Performance for Search Requests #17407

Open
ansjcy opened this issue Feb 20, 2025 · 1 comment
Assignees: ansjcy
Labels: enhancement (Enhancement or improvement to existing feature or request), Search:Query Insights

@ansjcy
Member

ansjcy commented Feb 20, 2025

Is your feature request related to a problem? Please describe

Query-level resource usage tracking was introduced as part of #12399 and implemented in PR #13172. Following discussions with the community, we opted to piggyback shard-level resource usage data in the response headers and aggregate it on the coordinator node. Benchmarks we ran last year showed no significant resource usage impact for typical search workloads. Recently, however, we have started to see some challenges:

  • As mentioned in [BUG] TaskResourceTrackingService consuming more CPU than expected #16635, although the serialization/deserialization overhead is not noticeable in most normal workloads, profiling with minimal queries has shown an approximately 7% increase in CPU usage attributable to it.
  • When reindexing a large number of shards, a high volume of frequent, short-lived scroll requests is sent. In this case, the resource tracking updates accumulate quickly and can consume a non-trivial amount of resources.

We need to think about how to enhance the performance of the resource usage header injection.

Describe the solution you'd like

I propose we do the following to improve the performance of resource usage headers:

  1. Replace the JSON-based headers with a lightweight, delimited string format such as <action>,<taskid>,<parentid>,<nodeid>,<cpu>,<memory usage> (see the first sketch after this list). We should also refactor the transport protocol to allow writing binary header values, which would speed things up further.
  2. Introduce a configurable sampling mechanism that limits resource tracking to a percentage of shards once the number of shards exceeds a configurable threshold (see the second sketch below). This reduces overhead while keeping the insights representative.
  3. Only include resource usage in the response header when usage exceeds a configurable threshold.
  4. Move the logic to deserialize/parse the resource usage data into an async flow in Query Insights, instead of processing it within the search execution path, to reduce query latency (see the third sketch below).
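
For illustration, here is a minimal sketch of what the delimited header format in item 1 could look like. The field order and the class/method names are assumptions made for this example, not existing OpenSearch APIs:

```java
// Hypothetical sketch of the delimited resource usage header proposed in item 1.
// Assumed field order: action, taskId, parentTaskId, nodeId, cpuNanos, memoryBytes.
final class ResourceUsageHeader {
    private static final String DELIMITER = ",";

    /** Encodes one shard-level usage sample as a flat delimited string instead of JSON. */
    static String encode(String action, long taskId, long parentTaskId,
                         String nodeId, long cpuNanos, long memoryBytes) {
        return String.join(DELIMITER,
            action,
            Long.toString(taskId),
            Long.toString(parentTaskId),
            nodeId,
            Long.toString(cpuNanos),
            Long.toString(memoryBytes));
    }

    /** Parses a header produced by encode(); no JSON parser involved. */
    static ParsedUsage decode(String header) {
        String[] parts = header.split(DELIMITER, -1);
        if (parts.length != 6) {
            throw new IllegalArgumentException("Malformed resource usage header: " + header);
        }
        return new ParsedUsage(parts[0], Long.parseLong(parts[1]), Long.parseLong(parts[2]),
            parts[3], Long.parseLong(parts[4]), Long.parseLong(parts[5]));
    }

    /** Parsed view of a single shard-level usage sample. */
    record ParsedUsage(String action, long taskId, long parentTaskId,
                       String nodeId, long cpuNanos, long memoryBytes) {}
}
```

A binary header value (the second half of item 1) would avoid the string round-trip entirely, but that requires the transport protocol refactor first.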
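The sampling idea in item 2 could boil down to a small, deterministic per-shard decision. The threshold, ratio, and hash-based selection below are assumptions for illustration only:

```java
// Hypothetical sketch of item 2: track resource usage on only a configurable fraction of
// shards once the shard count of a request exceeds a threshold. Not an existing OpenSearch API.
final class ShardResourceTrackingSampler {
    private final int shardCountThreshold;  // e.g. 100 shards
    private final double samplingRatio;     // e.g. 0.1 -> track roughly 10% of shards

    ShardResourceTrackingSampler(int shardCountThreshold, double samplingRatio) {
        this.shardCountThreshold = shardCountThreshold;
        this.samplingRatio = samplingRatio;
    }

    /** Decides whether a given shard should piggyback resource usage for this request. */
    boolean shouldTrack(String shardId, int totalShardsInRequest) {
        if (totalShardsInRequest <= shardCountThreshold) {
            return true; // below the threshold, keep full tracking
        }
        // Deterministic per-shard selection keeps the sampled subset stable across requests.
        int bucket = Math.floorMod(shardId.hashCode(), 100);
        return bucket < (int) Math.round(samplingRatio * 100);
    }
}
```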
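Item 4 is essentially a hand-off from the search path to a background worker. A minimal sketch, assuming a bounded in-memory queue and reusing the decode helper from the first sketch above (class and method names again hypothetical):

```java
import java.io.Closeable;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of item 4: the search path only enqueues the raw header string;
// parsing and aggregation happen on a Query Insights worker thread.
final class AsyncUsageHeaderProcessor implements Closeable {
    private final BlockingQueue<String> pendingHeaders = new LinkedBlockingQueue<>(10_000);
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AsyncUsageHeaderProcessor() {
        worker.submit(this::drainLoop);
    }

    /** Called on the search path: O(1), never parses, drops on overflow instead of blocking. */
    void submit(String rawHeader) {
        pendingHeaders.offer(rawHeader);
    }

    private void drainLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String rawHeader = pendingHeaders.take();
                // Deserialization happens here, off the query latency path.
                ResourceUsageHeader.ParsedUsage usage = ResourceUsageHeader.decode(rawHeader);
                aggregate(usage);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void aggregate(ResourceUsageHeader.ParsedUsage usage) {
        // Placeholder: fold the sample into the per-query Query Insights record.
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```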

Related component

Search:Query Insights

Describe alternatives you've considered

N/A

Additional context

#11522
#12399

@ansjcy ansjcy added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 20, 2025
@ansjcy ansjcy self-assigned this Feb 21, 2025
@sgup432
Contributor

sgup432 commented Feb 21, 2025

Move the logic to deserialize/parse the resource usage data into an async flow in Query Insights, instead of processing it within the search execution path, to reduce query latency.

I believe this might be a better way in the long run.

I recall that one approach we considered was storing shard-level insights data on the data nodes, keyed by taskId. The coordinator node would then aggregate it on demand by retrieving the data for the relevant taskIds from those nodes, with the taskId parent-child relationship kept on the coordinator node.

This approach would make fetching insights data somewhat more expensive, since it wouldn't be precomputed. However, with the right optimizations it should be acceptable to users, as these cases are not latency-sensitive for them.
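
To make the shape of that alternative concrete, here is a rough interface-level sketch. Every name is hypothetical; no such API exists in OpenSearch today:

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical sketch of on-demand aggregation: data nodes keep shard-level usage keyed by
// task ID, and the coordinator fetches and aggregates it when insights are requested.
interface ShardUsageStore {

    /** Per-shard usage record; a stand-in for the existing per-task resource usage data. */
    record ShardUsage(long cpuTimeNanos, long memoryBytes) {}

    /** On the data node: record usage for a completed shard-level (child) task. */
    void record(long taskId, ShardUsage usage);

    /** On the coordinator: pull usage for the child task IDs of a query, then aggregate.
     *  The parent-child task ID mapping stays on the coordinator node. */
    Map<Long, ShardUsage> fetchForAggregation(Collection<Long> childTaskIds);
}
```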
