[BUG] desc_sort_timestamp is slower with OS 3.0 index compared to OS 2.19 index on OS 3.0 server #17404
Comments
@prudhvigodithi Created an issue for the regression with
@expani Is the OS 3.0 server running the code in current main? Could you please share the commit as well?
Also, to determine whether the regression comes from OpenSearch's integration with Lucene or from specific Lucene changes, have you run similar benchmarks directly against Lucene?
Thanks for taking a look @mgodwan. Yes, it was a mainline commit after reta's PR for the Lucene 10 upgrade, along with the fix for another regression we found, #17329. This is the branch I used in my fork, which contains the same changes. The setup details are mentioned in the META #16934 (comment).
That's a good suggestion to try out. FYI @prudhvigodithi.
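For reference, a Lucene-only check of this sort could look roughly like the sketch below: it opens an existing shard's Lucene index directory and times a match-all search sorted descending on a numeric timestamp field. This is a sketch under assumptions, not the workload's actual harness: the field name `@timestamp`, its doc-values type (long), and the index path passed as the first argument all need to match how the data was really indexed.

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class DescSortTimestampCheck {
    public static void main(String[] args) throws Exception {
        // args[0]: path to the shard's Lucene index directory (assumption: single shard).
        try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]));
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Descending sort on a numeric doc-values field; "@timestamp"/LONG are assumptions.
            Sort sort = new Sort(new SortedNumericSortField("@timestamp", SortField.Type.LONG, true));
            // Warm up, then time one iteration.
            for (int i = 0; i < 5; i++) {
                searcher.search(new MatchAllDocsQuery(), 10, sort);
            }
            long start = System.nanoTime();
            TopDocs top = searcher.search(new MatchAllDocsQuery(), 10, sort);
            long tookMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("hits=" + top.totalHits + " took=" + tookMs + "ms");
        }
    }
}
```

Running this against the 2.19-built and 3.0-built index directories on the same Lucene version would help separate a Lucene-level index-layout effect from the OpenSearch search path.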
| Metric | Min | Mean | Median | Max | Unit |
|---|---|---|---|---|---|
| Throughput | 2.0065 | 2.0079 | 2.0078 | 2.0097 | ops/s |

| Metric | 50.0% | 90.0% | 99.0% | 100.0% | Mean | Unit |
|---|---|---|---|---|---|---|
| Latency | 7.7993 | 8.4818 | 9.3014 | 11.3231 | 7.8253 | ms |
| Service Time | 6.4602 | 6.8806 | 8.1644 | 10.0046 | 6.6034 | ms |
| Client Processing Time | 0.2553 | 0.2681 | 0.2870 | 0.3106 | 0.2582 | ms |
| Processing Time | 6.8037 | 7.2246 | 8.5087 | 10.3893 | 6.9520 | ms |
Raw JSON Data
```json
{
"benchmark-version": "1.11.0",
"benchmark-revision": null,
"environment": "local",
"test-execution-id": "52475486-32f5-44b0-b979-63b1a112334c",
"test-execution-timestamp": "20250226T054626Z",
"pipeline": "benchmark-only",
"user-tags": {},
"workload": "big5",
"provision-config-instance": [
"external"
],
"cluster": {
"revision": "21a924d450b567eaba0cd79159a15efbd6e24382",
"distribution-version": "3.0.0-SNAPSHOT",
"distribution-flavor": "oss",
"provision-config-revision": null
},
"results": {
"op_metrics": [
{
"task": "desc_sort_timestamp",
"operation": "desc_sort_timestamp",
"throughput": {
"min": 2.0064874647333277,
"mean": 2.007877608788571,
"median": 2.0077727333078834,
"max": 2.0096717486568525,
"unit": "ops/s"
},
"latency": {
"50_0": 7.799304999934975,
"90_0": 8.481775302061578,
"99_0": 9.301373549242282,
"100_0": 11.323106999043375,
"mean": 7.825337960712204,
"unit": "ms"
},
"service_time": {
"50_0": 6.460237498686183,
"90_0": 6.880583902238868,
"99_0": 8.16442375002226,
"100_0": 10.004610998294083,
"mean": 6.60335849996045,
"unit": "ms"
},
"client_processing_time": {
"50_0": 0.255280499914079,
"90_0": 0.26806040077644866,
"99_0": 0.2869502815156012,
"100_0": 0.31063899950822815,
"mean": 0.2581927903884207,
"unit": "ms"
},
"processing_time": {
"50_0": 6.8036605007364415,
"90_0": 7.224638901971048,
"99_0": 8.508656228332265,
"100_0": 10.389283001131844,
"mean": 6.952017930125294,
"unit": "ms"
},
"error_rate": 0.0,
"duration": 149508.94947399865
}
],
"correctness_metrics": [
{
"task": "desc_sort_timestamp",
"operation": "desc_sort_timestamp",
"recall@k": {},
"recall@1": {},
"error_rate": 0.0,
"duration": 149508.94947399865
}
],
"total_time": 8106590,
"total_time_per_shard": {
"min": 8106590,
"median": 8106590,
"max": 8106590,
"unit": "ms"
},
"indexing_throttle_time": 0,
"indexing_throttle_time_per_shard": {
"min": 0,
"median": 0,
"max": 0,
"unit": "ms"
},
"merge_time": 12739943,
"merge_time_per_shard": {
"min": 12739943,
"median": 12739943,
"max": 12739943,
"unit": "ms"
},
"merge_count": 998,
"refresh_time": 1118650,
"refresh_time_per_shard": {
"min": 1118650,
"median": 1118650,
"max": 1118650,
"unit": "ms"
},
"refresh_count": 8992,
"flush_time": 473326,
"flush_time_per_shard": {
"min": 473326,
"median": 473326,
"max": 473326,
"unit": "ms"
},
"flush_count": 18,
"merge_throttle_time": 6119921,
"merge_throttle_time_per_shard": {
"min": 6119921,
"median": 6119921,
"max": 6119921,
"unit": "ms"
},
"ml_processing_time": [],
"young_gc_time": 0,
"young_gc_count": 0,
"old_gc_time": 0,
"old_gc_count": 0,
"memory_segments": 0,
"memory_doc_values": 0,
"memory_terms": 0,
"memory_norms": 0,
"memory_points": 0,
"memory_stored_fields": 0,
"store_size": 25699222076,
"translog_size": 55,
"segment_count": 30,
"total_transform_search_times": [],
"total_transform_index_times": [],
"total_transform_processing_times": [],
"total_transform_throughput": []
},
"workload-revision": "e0831c4",
"test_procedure": "big5",
"workload-params": {
"number_of_shards": 1,
"bulk_indexing_clients": 1,
"number_of_replicas": 0
}
}
```
Run in progress for the single-node cluster.
This may have something to do with segment topology, i.e. how segments are ordered and traversed for queries. The other possible culprit is the merge policy. We can explore these two directions to start with and understand whether they are impacting performance here; a quick way to compare segment topology between the two indices is sketched below.
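A hedged sketch of such a check: it takes the path to a shard's Lucene index directory (typically something like data/nodes/0/indices/&lt;index-uuid&gt;/&lt;shard&gt;/index, but verify against your setup) and dumps the leaf traversal order and per-segment doc counts, so the 2.19-built and 3.0-built indices can be compared side by side.

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class DumpSegmentTopology {
    public static void main(String[] args) throws Exception {
        // args[0]: path to the shard's Lucene index directory (assumption).
        try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]));
             DirectoryReader reader = DirectoryReader.open(dir)) {
            System.out.println("segments=" + reader.leaves().size() + " maxDoc=" + reader.maxDoc());
            for (LeafReaderContext ctx : reader.leaves()) {
                // ord reflects the order in which leaves are traversed; maxDoc shows the segment size.
                System.out.printf("ord=%d docBase=%d maxDoc=%d%n",
                        ctx.ord, ctx.docBase, ctx.reader().maxDoc());
            }
        }
    }
}
```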
Raw JSON Data for OS 3.0.0 single node, before force merge of segments
Raw JSON Data for OS 3.0.0 single node, after force merge of segments
Coming from #17404 (comment), both 2.19 and 3.0 use
Summary of Observations:
Regression Results in 3.0.0:
Regression in 2.19.1:
2.19.1 Commit-Based Testing:
Results
Coming from my observations here (#17404 (comment)): while I debug further, @mgodwan can you please check whether there is any issue or change on the indexing side that's causing the regression? In multiple tests, using the index data directory that showed no regression, both 3.0.0 and 2.19.1 show no regression. I'm surprised to see this even for 2.19.1; can you please check and see whether you get the same results? Thanks.
Describe the bug
"OS 3.0 * OS 2.19" indicates that the OpenSearch server was running 3.0 whereas the index used was created in OS 2.19. This was done to rule out the bug arising from an indexing change in Lucene.
As we can see, something in the index layout is causing the OS 3.0 server to be slower with an OS 3.0 index.
Related component
Search:Performance
To Reproduce
Run desc_sort_timestamp with the Big5 workload using an OS 3.0 server and compare indices created in OS 2.19 vs. OS 3.0. A rough manual check is sketched below.
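For a quick manual sanity check outside the benchmark harness, a query of roughly the shape below can be timed against both indices. This is a sketch with assumptions: a local cluster on port 9200, an index named big5, and that the desc_sort_timestamp operation is essentially a match_all sorted descending on @timestamp; check the big5 workload definition for the exact request body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ManualDescSortTimestamp {
    public static void main(String[] args) throws Exception {
        // Assumed query body; the real workload operation may set additional sort options.
        String body = """
                {"query": {"match_all": {}},
                 "sort": [{"@timestamp": {"order": "desc"}}],
                 "size": 10}
                """;
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/big5/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        // Repeat a few times so warm-up effects are visible in the client-side timings.
        for (int i = 0; i < 10; i++) {
            long start = System.nanoTime();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            long tookMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("status=" + response.statusCode() + " clientSideMs=" + tookMs);
        }
    }
}
```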
Expected behavior
desc_sort_timestamp should be equally fast with an OS 3.0 index.
Additional Details
Meta