Is your feature request related to a problem? Please describe
Query-level resource usage tracking was introduced as part of #12399 and implemented in PR #13172. Following discussions with the community, we opted to piggyback shard-level resource usage data on the response headers and aggregate it on the coordinator node. Benchmarks we ran last year did not show a significant resource usage impact for typical search workloads. Recently, however, we have started to see some challenges:
As mentioned in [BUG] TaskResourceTrackingService consuming more CPU than expected #16635, the serialization/deserialization overhead is not noticeable in most normal workloads, but profiling with minimal queries has shown an approximately 7% increase in CPU usage from serializing and deserializing the resource usage headers.
When reindexing across a large number of shards, a large volume of frequent, short-lived scroll requests is sent. In this case, the resource tracking updates accumulate quickly and can consume a non-trivial amount of resources.
We need to think about how to enhance the performance of the resource usage header injection.
Describe the solution you'd like
I propose we do the following items to improve the performance of resource usage headers:
Replace JSON-based headers with a lightweight, delimited string format like: <action>,<taskid>,<parentid>,<nodeid>,<cpu>,<memory usage>. We should also refactor the transport protocol to allow writing binary header values to speed this up further (see the first sketch below).
We can also introduce a configurable sampling mechanism that limits resource tracking to a percentage of shards once the shard count exceeds a configurable threshold; this reduces overhead while maintaining representative insights.
Only include resource usage in the response header when usage exceeds a configurable threshold (see the second sketch below, which illustrates this check together with the sampling mechanism).
Move the logic to deserialize/parse the resource usage data into an async flow in Query Insights, instead of processing it within the search execution path, to reduce query latency.
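To make the first item concrete, here is a minimal sketch of what the delimited, non-JSON header value could look like; the `TaskResourceUsageHeader` record and its encode/decode helpers are hypothetical names for illustration, not existing OpenSearch classes:

```java
// Hypothetical sketch of a delimited (non-JSON) resource usage header value.
// Field order mirrors the proposal: action,taskId,parentTaskId,nodeId,cpu,memory.
public record TaskResourceUsageHeader(String action, long taskId, long parentTaskId,
                                      String nodeId, long cpuTimeNanos, long memoryBytes) {

    /** Encode as a single comma-delimited header value, avoiding JSON serialization. */
    public String encode() {
        return action + "," + taskId + "," + parentTaskId + "," + nodeId + ","
            + cpuTimeNanos + "," + memoryBytes;
    }

    /** Parse the delimited value back on the coordinator side. */
    public static TaskResourceUsageHeader decode(String value) {
        String[] parts = value.split(",", -1);
        if (parts.length != 6) {
            throw new IllegalArgumentException("Malformed resource usage header: " + value);
        }
        return new TaskResourceUsageHeader(parts[0], Long.parseLong(parts[1]), Long.parseLong(parts[2]),
            parts[3], Long.parseLong(parts[4]), Long.parseLong(parts[5]));
    }
}
```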
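Likewise, a rough sketch of how the sampling and threshold checks (the second and third items) could gate whether a shard response attaches the header at all; the class name, fields, and defaults are assumptions and would map to cluster settings in practice:

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical gate deciding whether a shard-level response should carry a resource usage header.
public final class ResourceUsageHeaderGate {
    private final int shardCountThreshold;      // assumed to come from a cluster setting (name TBD)
    private final double samplingRatio;         // fraction of shards tracked above the threshold
    private final long minCpuNanosToReport;     // skip the header for negligible usage
    private final long minMemoryBytesToReport;

    public ResourceUsageHeaderGate(int shardCountThreshold, double samplingRatio,
                                   long minCpuNanosToReport, long minMemoryBytesToReport) {
        this.shardCountThreshold = shardCountThreshold;
        this.samplingRatio = samplingRatio;
        this.minCpuNanosToReport = minCpuNanosToReport;
        this.minMemoryBytesToReport = minMemoryBytesToReport;
    }

    public boolean shouldAttachHeader(int totalShards, long cpuTimeNanos, long memoryBytes) {
        // Threshold check: only report usage that is actually worth aggregating.
        if (cpuTimeNanos < minCpuNanosToReport && memoryBytes < minMemoryBytesToReport) {
            return false;
        }
        // Sampling check: for very wide queries, only track a configurable fraction of shards.
        if (totalShards > shardCountThreshold) {
            return ThreadLocalRandom.current().nextDouble() < samplingRatio;
        }
        return true;
    }
}
```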
I believe the last item, moving the deserialization/parsing into an async flow in Query Insights, might be a better way in the long run.
I recall that one of the approaches we had considered was storing shard-level insights data on the data nodes, keyed by taskId. The coordinator node would then aggregate it on demand by retrieving the relevant taskId data from the nodes, while the taskId parent-child relationship could reside on the coordinator node.
This approach would make fetching insights data relatively more expensive since it wouldn’t be precomputed. However, with the right optimizations, it should be acceptable to users, as these cases are not latency-sensitive for them.
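As a very rough sketch of that alternative, assuming a hypothetical node-local store keyed by taskId that the coordinator queries on demand (none of these classes exist in OpenSearch today):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical node-local store: each data node keeps its shard-level usage keyed by task ID,
// instead of piggybacking it on every response header.
public final class NodeLocalResourceUsageStore {

    // Simplified per-task usage entry for this sketch.
    public record TaskUsage(long cpuTimeNanos, long memoryBytes) {}

    private final Map<Long, TaskUsage> usageByTaskId = new ConcurrentHashMap<>();

    public void record(long taskId, TaskUsage usage) {
        usageByTaskId.put(taskId, usage);
    }

    // Fetched on demand (e.g. via a transport action from the coordinator) when insights are requested.
    public TaskUsage get(long taskId) {
        return usageByTaskId.get(taskId);
    }

    // Coordinator-side aggregation once the per-shard entries have been pulled from the data nodes;
    // the parent/child task relationship the coordinator already tracks tells it which task IDs to request.
    public static TaskUsage aggregate(List<TaskUsage> perShardUsage) {
        long cpu = perShardUsage.stream().mapToLong(TaskUsage::cpuTimeNanos).sum();
        long mem = perShardUsage.stream().mapToLong(TaskUsage::memoryBytes).sum();
        return new TaskUsage(cpu, mem);
    }
}
```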
Related component
Search:Query Insights
Describe alternatives you've considered
N/A
Additional context
#11522
#12399