> In that way, we only round off or approx results when the approx parameter is passed in the query. In other cases, we resolve accurately without using star-tree.
Just trying to understand in which cases, without star-tree, this would be useful, since with or without approx the query latency will be the same with BKD, for instance.
Is your feature request related to a problem? Please describe
Meta - #15257
We have a date dimension parameter as part of the star-tree index mapping. As part of the date dimension, users can specify up to 3 calendar intervals.
These calendar intervals are defined and behave the same way as the calendar intervals in the date histogram aggregator.
Since the cardinality of the date field is quite high, we use the calendar intervals to round each date down to the start of its calendar interval.
The defaults are half_hour and minute.
Because the original timestamps are rounded to the configured calendar intervals, some queries answered from the star-tree can return results that differ from those of the original query.
Sample Dataset
For example, let's say you have the following timestamps in your documents.
Star-Tree Data
Here is how the different intervals look in the star-tree:
Note: the DocIds mentioned below are not retained in the star-tree, since the data is aggregated before it is stored. They are noted only for clarification.
hour dimension interval
half_hour dimension interval
quarter_hour dimension interval
minute dimension interval
Describe the solution you'd like
Supported Query Shape
From the sample dataset and the way the star-tree aggregates the data, it is evident that half-open intervals can be supported natively with 100% accuracy, provided the relevant dimension interval exists.
For example, range queries that use:
gte (greater than or equal to) for the start time
lt (less than) for the end time
Why [gte, lt) instead of [gte, lte] or anything else?
It's because the buckets are constructed in exactly the same way. For example, DocId 4 (2023-09-15 15:00:00.000) above is not part of the hour star-tree bucket [14, 15).
Unsupported Query Shapes
The following query shapes cannot be supported with star-tree. For these, we will have to fall back to the existing search flow to resolve the query.
Exact match queries, since the original timestamps are not retained. For example:
"timestamp": "2023-09-15 14:23:45.789"
Same start/end time: this is the same as an exact match query.
Dimension-misaligned precision: for example, querying at minute precision when the star-tree only has hour as its most granular dimension. We will have to fall back to the existing non-star-tree flow.
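The rounding and the [gte, lt) rule above can be illustrated with a small Python sketch. It is not the OpenSearch implementation. It assumes UTC timestamps and models only the fixed-length intervals named in this issue (minute, quarter_hour, half_hour, hour). The helper names `parse`, `round_down` and `exact_star_tree_interval` are hypothetical. A range is treated as answerable from the star-tree only when both bounds fall on bucket boundaries of a configured interval. For everything else (exact matches, equal bounds, misaligned precision) the sketch returns None, meaning "fall back to the regular search flow".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

# Assumption: only fixed-length, sub-day intervals in UTC are modelled here.
# Calendar-aware intervals (day, month, ...) and time zones are out of scope.
INTERVALS = {
    "minute": timedelta(minutes=1),
    "quarter_hour": timedelta(minutes=15),
    "half_hour": timedelta(minutes=30),
    "hour": timedelta(hours=1),
}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse(ts: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS.fff' as a UTC timestamp."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)


def round_down(ts: datetime, interval: str) -> datetime:
    """Start of the bucket containing ts, i.e. the value kept in that star-tree dimension."""
    step = INTERVALS[interval]
    return ts - (ts - EPOCH) % step


def exact_star_tree_interval(gte: datetime, lt: datetime,
                             configured: Sequence[str]) -> Optional[str]:
    """Coarsest configured interval whose buckets exactly tile [gte, lt), else None.

    None covers empty or exact-match ranges (lt <= gte) and bounds that do not sit
    on a bucket boundary of any configured interval (misaligned precision).
    """
    if lt <= gte:
        return None
    # Try the widest interval first so the fewest buckets are needed.
    for name in sorted(configured, key=lambda n: INTERVALS[n], reverse=True):
        step = INTERVALS[name]
        if (gte - EPOCH) % step == timedelta(0) and (lt - EPOCH) % step == timedelta(0):
            return name
    return None
```

For example, with hour, half_hour and minute configured, [14:00, 15:00) maps to hour and [14:00, 14:30) maps to half_hour. A range starting at 14:23 can only use minute, and it falls back entirely if only hour is configured. The sketch picks a single interval for simplicity. Combining intervals (for example, hour buckets plus one half_hour bucket for a 90-minute range) would be a possible extension.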
Approximately Supported Case
Resolving relative time is tricky.
For example:
Query:
It gets resolved as:
With some approximation (this 'some approximation' is still very vague), we can potentially approximate the above resolution to:
Note: we have rounded 14:23:45.789 up to 14:24:00.000, because an open (exclusive) upper bound of 14:24:00.000 still captures all data points up to now.
We needed now-1h to now in the above query. We decided that rounding to the next finer unit after hour (the unit used in the query), which is minute, might be a good approximation. How to decide which approximation interval is more accurate is still undecided. We could potentially use the unit after that, second, instead, but that would certainly increase query time.
[Need ideas]: One open concern is deciding which granularity to use to approximate the results.
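Here is a minimal Python sketch of the approximation discussed above. It assumes UTC and derives the start bound as the rounded-up now minus the lookback. The surviving text only shows the end bound, 14:23:45.789, being rounded up to 14:24:00.000, so the start-bound rule is an assumption. The function names `ceil_to` and `approximate_lookback` are hypothetical. The sketch also counts how many buckets of the chosen granularity the window spans. This illustrates the trade-off mentioned above: rounding to second stays closer to now but covers 3600 buckets instead of 60.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ceil_to(ts: datetime, step: timedelta) -> datetime:
    """Round ts up to the next multiple of `step` since the Unix epoch (UTC)."""
    remainder = (ts - EPOCH) % step
    return ts if remainder == timedelta(0) else ts + (step - remainder)


def approximate_lookback(now: datetime, lookback: timedelta, granularity: timedelta):
    """Approximate [now - lookback, now) by rounding `now` up to `granularity`.

    Assumption: the start is the rounded end minus the lookback. When the lookback
    (e.g. 1h) is a whole multiple of the granularity, both bounds end up aligned.
    Returns (gte, lt, number of granularity-sized buckets in the window).
    """
    lt = ceil_to(now, granularity)
    gte = lt - lookback
    return gte, lt, lookback // granularity
```

With now = 2023-09-15 14:23:45.789 and a one-hour lookback, minute granularity gives [13:24:00, 14:24:00) over 60 buckets. Second granularity gives [13:23:46, 14:23:46) over 3600 buckets.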
Related component
Search:Aggregations
Describe alternatives you've considered
For the Relative Time with [gte, lt) case above, one idea is to pass an additional parameter that decides the approximation granularity. For example:
resolves to:
while
resolves to:
In that way, we only round off or approximate results when the approx parameter is passed in the query. In other cases, we resolve the range exactly, without using star-tree.
Since we are not tying the new parameter to star-tree, the approx parameter would behave the same in both cases: the resolution from relative time to absolute time would remain the same irrespective of whether star-tree is used or not.
Additional context
No response
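The approx-parameter alternative described above can be sketched as two independent steps. The first resolves relative time to absolute bounds, rounding only when approx is given. The second, separately, checks whether the resolved range can be answered from the star-tree. This is an illustrative Python sketch only. The accepted approx values, the function names `resolve_relative` and `star_tree_eligible`, and the alignment check are assumptions, not part of the proposal.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical values for the proposed `approx` parameter; the issue does not
# define its format, so these names are illustrative only.
APPROX_STEPS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
}


def resolve_relative(now: datetime, lookback: timedelta,
                     approx: Optional[str] = None) -> Tuple[datetime, datetime]:
    """Resolve [now - lookback, now) to absolute bounds.

    This step never consults the star-tree, so the same input always resolves
    to the same range whether or not a star-tree is used afterwards.
    """
    if approx is None:
        return now - lookback, now  # exact resolution, no rounding
    step = APPROX_STEPS[approx]
    remainder = (now - EPOCH) % step
    lt = now if remainder == timedelta(0) else now + (step - remainder)
    return lt - lookback, lt


def star_tree_eligible(gte: datetime, lt: datetime, finest: timedelta) -> bool:
    """True if [gte, lt) can be tiled by buckets of the finest configured interval."""
    return (gte < lt
            and (gte - EPOCH) % finest == timedelta(0)
            and (lt - EPOCH) % finest == timedelta(0))
```

Without approx, now-1h at 14:23:45.789 stays unaligned and would take the regular search path. With approx set to minute, it resolves to [13:24, 14:24), which a minute dimension can answer.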