
Consider Adding Cache Layer to Reduce Requests to Trino and Improve Response Time #606

Open
digarcia78 opened this issue Jan 31, 2025 · 10 comments

Comments

@digarcia78

We’ve noticed that multiple requests are being made to Trino, which can sometimes result in slow response times due to the need to recalculate results repeatedly. To improve performance, we would like to suggest implementing a cache layer to store intermediate results and avoid unnecessary requests to Trino for the same data.

Would love to hear if this is something that could be implemented or if there are any existing solutions in place for this. Do you have any recommendations or suggestions on how we might approach this?

@xkrogen
Member

xkrogen commented Jan 31, 2025

Actually just two weeks ago I built something like this into the Gateway as a hackday project :) I think it's a great idea -- at LinkedIn we see the same situation as what you described, where there are many instances of duplicate queries doing redundant computation. Dashboarding solutions hooked up to Trino definitely cause this -- maybe a bunch of different people are loading the same dashboard, or you've edited one chart on a dashboard and when it refreshes it reloads all of the charts even though most of them haven't changed.

Given that Trino queries typically have very small output resultset sizes, the cache should be able to be fairly small. In my POC, I stored the results into the same DB that the GW uses for all of its other operational data. We could consider doing this only for queries with very small outputs, and use a different, more scalable storage for queries with larger result sets. One area I'm also super interested in is using the new spooling client protocol to implement this cache in a zero-copy manner, at least for clients which leverage the spooled protocol.
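
To make the storage tiering concrete, here's a very rough sketch of what that split could look like -- the interface, class names, and 1 MiB threshold are purely illustrative, not what's in my PoC:

```java
// Illustrative sketch only: send small result sets to the gateway's existing
// operational DB and larger ones to more scalable storage.
interface ResultStore
{
    void write(String cacheKey, byte[] serializedResultSet);

    byte[] read(String cacheKey);
}

public class TieredResultStore
{
    // Assumed cutoff for "very small outputs"; a real deployment would tune this
    private static final int SMALL_RESULT_BYTES = 1 << 20;

    private final ResultStore gatewayDb;     // the DB the GW already uses for operational data
    private final ResultStore objectStorage; // e.g. spooled/object storage for larger result sets

    public TieredResultStore(ResultStore gatewayDb, ResultStore objectStorage)
    {
        this.gatewayDb = gatewayDb;
        this.objectStorage = objectStorage;
    }

    public void write(String cacheKey, byte[] serializedResultSet)
    {
        ResultStore target = serializedResultSet.length <= SMALL_RESULT_BYTES ? gatewayDb : objectStorage;
        target.write(cacheKey, serializedResultSet);
    }
}
```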

I think the hardest part of this is probably determining how long to cache the results for and when to invalidate the cache. In my POC I just had a fixed TTL and used the exact query text as the cache key. But I think to make this work well, we'd have to add much smarter invalidation logic. We could look for nondeterministic functions (e.g. random) or time-based functions (e.g. current_time or current_date) and treat those differently. For data sources which support it, such as Iceberg, we could use a mechanism like snapshot IDs to determine if the input data sources have changed and then invalidate the cache. Lots of interesting stuff that can be explored here!
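
As a rough illustration of the cache-key and bypass shape (class name, function list, and hashing choice are all placeholders, not what the PoC actually does):

```java
// Rough sketch: per-user cache key plus a crude bypass for nondeterministic or
// time-based functions. A real implementation would inspect the parsed query
// rather than pattern-matching on the query text.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.regex.Pattern;

public final class ResultCacheKey
{
    // Illustrative list only; presence of any of these skips the cache entirely
    private static final Pattern NONDETERMINISTIC = Pattern.compile(
            "\\b(random|rand|uuid|now|current_timestamp|current_time|current_date|localtimestamp|localtime)\\b",
            Pattern.CASE_INSENSITIVE);

    private ResultCacheKey() {}

    public static boolean isCacheable(String sql)
    {
        return !NONDETERMINISTIC.matcher(sql).find();
    }

    public static String of(String user, String sql)
    {
        try {
            // Scope the key to the user so one user's results are never served to another
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(user.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0);
            digest.update(sql.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest.digest());
        }
        catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```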

As an even further extension, what really gets me excited is that once we're persisting the query results, we could use that to implement query-level retries in the Gateway, pulling the same functionality as Trino FTE's QUERY retry policy up to the Gateway layer to allow it to recover even from coordinator/cluster failures and do retries across clusters (including potentially retrying a query on a "larger" or differently tuned cluster).

@RoeyoOgen

@xkrogen just out of curiosity, could you possibly share the code?
@digarcia78 I too think this will be an awesome feature to implement. Would love to be a part of it if you need some help.

@xkrogen
Member

xkrogen commented Feb 14, 2025

I think I should be able to share a branch, though fair warning that it is veryyyy far from being production-ready :) I will try to share next week.

@RoeyoOgen

RoeyoOgen commented Feb 15, 2025

@xkrogen thanks!

I was thinking about implementing this as part of the routing rules engine (before stumbling upon this issue). Would it be acceptable to implement this as part of the actual gateway service?
Maybe @mosabua can give an opinion?

I was also leaning towards using Redis as the caching DB...

@willmostly
Contributor

I would prefer implementing caching at Trino and making the gateway cache-aware to having the gateway implement caching directly. This would have the advantage of supporting data caching (like https://trino.io/docs/current/object-storage/file-system-cache.html) and subquery caching as well as result set caching. This paper has some interesting ideas: https://storage.googleapis.com/gweb-research2023-media/pubtools/5018.pdf

Caching at the gateway layer introduces security concerns that we do not have today.

  • Minimally, the gateway would need to authenticate users instead of delegating this responsibility to Trino
  • A multi-user shared cache would require authorization, and authorization requires analyzing the query to implement view security. Analysis requires connectors - at this point we've almost turned the gateway into a Trino coordinator. If you were to do all this, you would also need to ensure consistency between the authorization rules at the gateway and all of its backend clusters. This would not be an issue if the gateway were cache-aware.

@xkrogen
Member

xkrogen commented Feb 21, 2025

Minimally, the gateway would need to authenticate users instead of delegating this responsibility to Trino

Agreed, though doing auth in the GW seems reasonable.

A multi-user shared cache would require authorization, ...

Yes, agreed that a shared cache introduces a lot of complexity. In my POC I have limited the cache in scope to be per-user to sidestep such issues; a user can only access their own cached results. I'm sure other environments will vary, but I found that in our environment this actually only hurts the cache efficacy/hit-rate by a few %.

I would prefer implementing caching at Trino and making the gateway cache-aware to having the gateway implement caching directly. ...

I would agree with you except for this point I called out in my earlier comment:

As an even further extension, what really gets me excited is that once we're persisting the query results, we could use that to implement query-level retries in the Gateway, pulling the same functionality as Trino FTE's QUERY retry policy up to the Gateway layer to allow it to recover even from coordinator/cluster failures and do retries across clusters (including potentially retrying a query on a "larger" or differently tuned cluster).

This one is really the motivating factor for me -- QUERY-level retries in the GW is my real goal, which requires persisting/spooling results, and that was what gave me the idea that if you're persisting these results anyway, you may as well build a cache on top of it as well. Caching (at least scoped to a user) is much simpler than the retries, so I started there.

@mosabua
Member

mosabua commented Feb 21, 2025

We discussed this in the dev sync as well. I strongly agree with @xkrogen .. a per-user cache will already solve a lot of dashboard and service-user refresh scenarios and is worth doing for that use case alone. Roey will write up more details and potentially work on a PoC .. but we should totally collaborate on that.

At the same time we do need to be aware of the security issues. We can probably grow the functionality over time.

And lastly ... we know from trino-lb that issuing the query ID in the LB, holding on to the user while clusters scale up or start up, and holding on to the request until they are ready, works .. we should eventually get there, and since we have the DB backend to store the queue, we should be able to move towards that as well.

@xkrogen
Member

xkrogen commented Feb 21, 2025

@RoeyoOgen here's my branch. Lots of room for improvement!! https://github.com/xkrogen/trino-gateway/tree/xkrogen/resultset-caching-poc

@mosabua
Member

mosabua commented Feb 21, 2025

Before I forget .. we should use an open source DB for caching - so probably Valkey rather than Redis
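
For illustration only, a Valkey-backed result cache could be as small as this (Jedis speaks the Redis protocol, which Valkey is compatible with; the TTL, host, and class name are placeholders, not a decided design):

```java
// Minimal sketch assuming a Valkey (Redis-protocol-compatible) server on
// localhost:6379 and the Jedis client library.
import redis.clients.jedis.JedisPooled;

public class ValkeyResultCache
{
    private static final long TTL_SECONDS = 300; // fixed TTL, as in the PoC discussion above

    private final JedisPooled client = new JedisPooled("localhost", 6379);

    public void put(String cacheKey, String serializedResults)
    {
        client.setex(cacheKey, TTL_SECONDS, serializedResults);
    }

    public String getIfPresent(String cacheKey)
    {
        return client.get(cacheKey); // null on a cache miss
    }
}
```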

@RoeyoOgen

@RoeyoOgen here's my branch. Lots of room for improvement!! https://github.com/xkrogen/trino-gateway/tree/xkrogen/resultset-caching-poc

Thanks!

Taking all your comments on board, and I will start tinkering with a design on my end :)
