
[BUG] composite_terms-keyword has increased latency with Lucene 10 index #17388

expani opened this issue Feb 18, 2025 · 6 comments
Labels: bug, Search:Performance
expani commented Feb 18, 2025

Describe the bug

The latency of the composite_terms-keyword operation in Big5 has increased by 10-15% with the upgrade to Lucene 10. When an index created on OS 2.19 is searched by an OS 3.0 server, the latency is better. So we have narrowed the cause down to either something wrong with the index format, or something we are misconfiguring (or failing to configure) with the Lucene 10 upgrade.

In the table below, OS 3.0 * OS 2.19 indicates that the OpenSearch server was running 3.0 whereas the index it searched was created on OS 2.19. This was done to eliminate any suspicion of the bug arising from an indexing change in Lucene.

| Metric Name | Operation/Query name | OS 2.19 | OS 3.0 | OS 3.0 * OS 2.19 | Unit |
|---|---|---|---|---|---|
| 50th percentile latency | composite_terms-keyword | 385.519 | 437.748 | 363.538 | ms |
| 90th percentile latency | composite_terms-keyword | 393.628 | 441.061 | 372.952 | ms |
| 99th percentile latency | composite_terms-keyword | 425.658 | 451.375 | 382.752 | ms |
| 100th percentile latency | composite_terms-keyword | 447.978 | 457.927 | 383.355 | ms |
| 50th percentile service time | composite_terms-keyword | 384.585 | 436.741 | 362.469 | ms |
| 90th percentile service time | composite_terms-keyword | 392.626 | 440.274 | 372.171 | ms |
| 99th percentile service time | composite_terms-keyword | 424.494 | 450.533 | 381.558 | ms |
| 100th percentile service time | composite_terms-keyword | 447.026 | 456.523 | 382.67 | ms |

Related component

Search:Performance

To Reproduce

Run the composite_terms-keyword operation of the Big5 workload against OS 3.0 and compare with OS 2.19; an example invocation is shown below.
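
A minimal OpenSearch Benchmark invocation for just this task (the target host and client options below are placeholders from our setup; adapt as needed):

```sh
# Run only the composite_terms-keyword task from the Big5 workload
# against a local OpenSearch node.
opensearch-benchmark execute-test \
  --target-hosts http://127.0.0.1:9200 \
  --workload big5 \
  --client-options timeout:120 \
  --kill-running-processes \
  --include-tasks composite_terms-keyword
```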

Expected behavior

composite_terms-keyword should perform the same with an OS 3.0 index as it does with an OS 2.19 index.

Additional Details

Meta

mgodwan commented Feb 27, 2025

This may have the same cause as #17404, where we suspect the merge policy/segment topology to be a probable cause of the regression.

dhwanilpatel commented
I can see a different number of segments on 2.19 vs 3.0, which might be the suspect causing the regression. In a Big5 run, the 2.19 version had 18 segments while 3.0 had 22 segments (a quick way to check segment counts is sketched after the table below).

| Metric Name | Operation | OS 2.19 | OS 3.0 |
|---|---|---|---|
| 50th percentile latency | composite_terms-keyword | 367.625 | 407.282 |
| 90th percentile latency | composite_terms-keyword | 373.459 | 411.84 |
| 99th percentile latency | composite_terms-keyword | 376.171 | 417.157 |
| 100th percentile latency | composite_terms-keyword | 381.592 | 423.62 |
| Number of Segments | | 18 | 22 |
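
For reference, a quick way to count segments with the _cat API (the big5 index name is an assumption based on the workload defaults):

```sh
# Each line of _cat/segments describes one segment of one shard,
# so the line count is the total segment count for the index.
curl -s "http://127.0.0.1:9200/_cat/segments/big5" | wc -l
```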

After performing a force merge down to 10 segments, I see equal (p50) or improved (p90/p99) latencies; a sketch of the force-merge call follows the table below.

| Metric Name | Operation | OS 2.19 | OS 3.0 |
|---|---|---|---|
| 50th percentile latency | composite_terms-keyword | 354.385 | 351.592 |
| 90th percentile latency | composite_terms-keyword | 400.532 | 356.63 |
| 99th percentile latency | composite_terms-keyword | 424.343 | 367.392 |
| 100th percentile latency | composite_terms-keyword | 435.697 | 404.564 |
| Number of Segments | | 10 | 10 |
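
A sketch of the force merge used here, via the standard _forcemerge API (index name and host are assumptions):

```sh
# Force merge the big5 index down to at most 10 segments per shard.
# This call can take a long time on an index of ~24 GB.
curl -X POST "http://127.0.0.1:9200/big5/_forcemerge?max_num_segments=10"
```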

Will trigger multiple runs and update the results.

dhwanilpatel commented Mar 10, 2025

I have triggered multiple runs of the Big5 workload on both OpenSearch 3.0 and OpenSearch 2.19. Below are the results for the composite_terms-keyword operation on both versions.

Setup

Instance Type: c5.2xlarge
Allocated Heap: 1GB
Number of Nodes: 1
Shard Count: 1
Replica Count: 1

OpenSearch 3.0

| Metric Name | Run1 | Run2 | Run3 | Run4 |
|---|---|---|---|---|
| 50th percentile latency | 366.051 | 379.044 | 370.327 | 322.434 |
| 90th percentile latency | 368.973 | 383.112 | 373.968 | 325.9 |
| 99th percentile latency | 371.303 | 387.362 | 379.496 | 328.61 |
| 100th percentile latency | 372.353 | 387.975 | 389.81 | 332.489 |
| 50th percentile service time | 364.688 | 377.597 | 369.078 | 320.752 |
| 90th percentile service time | 367.698 | 381.847 | 372.858 | 324.44 |
| 99th percentile service time | 369.605 | 386.058 | 378.149 | 327.236 |
| 100th percentile service time | 371.574 | 386.517 | 388.734 | 331.249 |
| Number of Segments | 16 | 18 | 15 | 12 |

OpenSearch 2.19

| Metric Name | Run1 | Run2 | Run3 | Run4 |
|---|---|---|---|---|
| 50th percentile latency | 353.721 | 323.328 | 336.394 | 357.056 |
| 90th percentile latency | 358.518 | 326.532 | 340.155 | 388.582 |
| 99th percentile latency | 362.467 | 329.777 | 342.727 | 415.51 |
| 100th percentile latency | 364.568 | 330.178 | 342.888 | 424.402 |
| 50th percentile service time | 352.316 | 321.542 | 335.087 | 355.832 |
| 90th percentile service time | 357.086 | 324.94 | 339.169 | 386.827 |
| 99th percentile service time | 360.725 | 328.378 | 341.498 | 414.117 |
| 100th percentile service time | 363.381 | 328.608 | 341.628 | 423.306 |
| Number of Segments | 16 | 13 | 15 | 17 |

We can see that the query latency of composite_terms-keyword depends on the segment count. In the workload, the composite_terms-keyword query targets 10 hours of data (https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/a593f0ce7099550c2ccaa65ef8d45447877e36e5/big5/operations/default.json#L767-L779).

With the log_byte_size merge policy, the number of segments covering this date range can impact the results, and the segment count can differ across runs.
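
For context, the merge policy is an index-level setting; a quick way to confirm what an index is actually using (index name is an assumption, and the value falls back to the engine default when not set explicitly):

```sh
# Show the index.merge.policy setting, including the default value
# if it was never set explicitly on the index.
curl "http://127.0.0.1:9200/big5/_settings/index.merge.policy?include_defaults=true&pretty"
```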

We can see that results for the same version vary with segment count. Based on these results, I don't think there is a regression in 3.0; rather, the random nature of the workload's segment topology is causing this.

@mgodwan / @backslasht Please provide your thoughts on this, in case I am missing something.

mgodwan commented Mar 10, 2025

Thanks @dhwanilpatel. These are helpful insights.

@expani Could you confirm whether the test results you reported in the issue followed a force merge before running searches, to ensure a consistent benchmark setup for search?

expani commented Mar 10, 2025

Thanks for the thorough analysis @dhwanilpatel

I had a similar observation that OS 3.0 created more segments than OS 2.19 in the initial run.

> Could you confirm whether the test results you reported in the issue followed a force merge before running searches, to ensure a consistent benchmark setup for search?

I had force merged the OS 3.0 index to 19 segments (the same count OS 2.19 had without a force merge) and posted the initial numbers. Also, we had set bulk_indexing_clients to 1 to ensure there is no variance from concurrent indexing (see the sketch below).
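
For reference, a minimal sketch of how that knob is passed to OpenSearch Benchmark (the host is a placeholder; bulk_indexing_clients is a Big5 workload parameter):

```sh
# Ingest with a single bulk client so segment topology is not
# affected by indexing concurrency.
opensearch-benchmark execute-test \
  --target-hosts http://127.0.0.1:9200 \
  --workload big5 \
  --workload-params bulk_indexing_clients:1
```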

OS 3.0 and OS 2.19.0 both had 19 segments

Luckily, I still had those indices present on my r5.xlarge instance.
So, I ran the benchmark 3 more times for OS 3.0 with the OS 3.0 index and for OS 2.19.0 with the OS 2.19.0 index:

```sh
for i in `seq 1 3`; do opensearch-benchmark execute-test --target-hosts http://127.0.0.1:9200 --workload big5 --client-options timeout:120 --kill-running-processes --include-tasks composite_terms-keyword; done
```

The numbers below are from the 3rd run (OSB's warmup takes care of JIT C2 compiler optimisations, but the numbers sometimes vary, so I am recording the 3rd run, by which point all optimisations should have completed).

OS 3.0

|                                                     Store size |                         |     24.0489 |     GB |
|                                                  Translog size |                         | 5.12227e-08 |     GB |
|                                         Heap used for segments |                         |           0 |     MB |
|                                       Heap used for doc values |                         |           0 |     MB |
|                                            Heap used for terms |                         |           0 |     MB |
|                                            Heap used for norms |                         |           0 |     MB |
|                                           Heap used for points |                         |           0 |     MB |
|                                    Heap used for stored fields |                         |           0 |     MB |
|                                                  Segment count |                         |          19 |        |
|                                                 Min Throughput | composite_terms-keyword |           2 |  ops/s |
|                                                Mean Throughput | composite_terms-keyword |           2 |  ops/s |
|                                              Median Throughput | composite_terms-keyword |           2 |  ops/s |
|                                                 Max Throughput | composite_terms-keyword |           2 |  ops/s |
|                                        50th percentile latency | composite_terms-keyword |     334.184 |     ms |
|                                        90th percentile latency | composite_terms-keyword |      341.49 |     ms |
|                                        99th percentile latency | composite_terms-keyword |     364.008 |     ms |
|                                       100th percentile latency | composite_terms-keyword |     364.176 |     ms |
|                                   50th percentile service time | composite_terms-keyword |     333.226 |     ms |
|                                   90th percentile service time | composite_terms-keyword |     340.509 |     ms |
|                                   99th percentile service time | composite_terms-keyword |     362.583 |     ms |
|                                  100th percentile service time | composite_terms-keyword |     363.015 |     ms |
|                                                     error rate | composite_terms-keyword |           0 |      % |

OS 2.19.0

|                                                     Store size |                         |     24.0039 |     GB |
|                                                  Translog size |                         | 5.12227e-08 |     GB |
|                                         Heap used for segments |                         |           0 |     MB |
|                                       Heap used for doc values |                         |           0 |     MB |
|                                            Heap used for terms |                         |           0 |     MB |
|                                            Heap used for norms |                         |           0 |     MB |
|                                           Heap used for points |                         |           0 |     MB |
|                                    Heap used for stored fields |                         |           0 |     MB |
|                                                  Segment count |                         |          19 |        |
|                                                 Min Throughput | composite_terms-keyword |           2 |  ops/s |
|                                                Mean Throughput | composite_terms-keyword |           2 |  ops/s |
|                                              Median Throughput | composite_terms-keyword |           2 |  ops/s |
|                                                 Max Throughput | composite_terms-keyword |           2 |  ops/s |
|                                        50th percentile latency | composite_terms-keyword |     327.405 |     ms |
|                                        90th percentile latency | composite_terms-keyword |     336.342 |     ms |
|                                        99th percentile latency | composite_terms-keyword |     342.603 |     ms |
|                                       100th percentile latency | composite_terms-keyword |     343.608 |     ms |
|                                   50th percentile service time | composite_terms-keyword |     325.864 |     ms |
|                                   90th percentile service time | composite_terms-keyword |     334.355 |     ms |
|                                   99th percentile service time | composite_terms-keyword |     341.713 |     ms |
|                                  100th percentile service time | composite_terms-keyword |     342.731 |     ms |
|                                                     error rate | composite_terms-keyword |           0 |      % |

OS 2.19 seems to perform better than OS 3.0

OS 3.0 with OS 2.19.0 index is better

|                                                     Store size |                         |     24.0039 |     GB |
|                                                  Translog size |                         | 5.12227e-08 |     GB |
|                                         Heap used for segments |                         |           0 |     MB |
|                                       Heap used for doc values |                         |           0 |     MB |
|                                            Heap used for terms |                         |           0 |     MB |
|                                            Heap used for norms |                         |           0 |     MB |
|                                           Heap used for points |                         |           0 |     MB |
|                                    Heap used for stored fields |                         |           0 |     MB |
|                                                  Segment count |                         |          19 |        |
|                                                 Min Throughput | composite_terms-keyword |           2 |  ops/s |
|                                                Mean Throughput | composite_terms-keyword |           2 |  ops/s |
|                                              Median Throughput | composite_terms-keyword |           2 |  ops/s |
|                                                 Max Throughput | composite_terms-keyword |           2 |  ops/s |
|                                        50th percentile latency | composite_terms-keyword |     325.905 |     ms |
|                                        90th percentile latency | composite_terms-keyword |     333.594 |     ms |
|                                        99th percentile latency | composite_terms-keyword |      342.35 |     ms |
|                                       100th percentile latency | composite_terms-keyword |     347.512 |     ms |
|                                   50th percentile service time | composite_terms-keyword |      324.58 |     ms |
|                                   90th percentile service time | composite_terms-keyword |     332.423 |     ms |
|                                   99th percentile service time | composite_terms-keyword |     341.288 |     ms |
|                                  100th percentile service time | composite_terms-keyword |     345.907 |     ms |
|                                                     error rate | composite_terms-keyword |           0 |      % |

When we use the OS 2.19.0 index with OS 3.0, it performs better, which is the issue I had seen earlier.

The setup details are captured in the meta issue; I have also updated the segment count there. Thanks for bringing it up.

dhwanilpatel commented
We have observed some skewness in segment sizes when force merging to 5/10 segments on both versions; some segments become as huge as 23 GB, which might skew the results of the perf runs (per-segment sizes can be inspected as shown below).
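
A sketch of inspecting per-segment sizes with the _cat API to spot this skew (index name is an assumption):

```sh
# List segments of the big5 index, largest first, to spot size skew.
curl "http://127.0.0.1:9200/_cat/segments/big5?v&h=segment,docs.count,size&s=size:desc"
```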

We have triggered a force merge to 1 segment and compared the results of 2.19/3.0. 3.0 seems to perform better with a single segment compared to 2.19: 3.0 has latency around 240 ms while 2.19 has latency around 255 ms.

Below are the results of the runs with one segment for composite_terms-keyword.

OpenSearch 3.0

| Metric Name | Run1 | Run2 | Run3 | Run4 | Run5 |
|---|---|---|---|---|---|
| 50th percentile latency | 242.761 | 241.351 | 240.904 | 239.761 | 239.28 |
| 90th percentile latency | 260.406 | 257.365 | 260.409 | 253.7 | 252.469 |
| 99th percentile latency | 272.212 | 272.393 | 263.501 | 270.534 | 270.86 |
| 100th percentile latency | 277.806 | 273.662 | 273.142 | 271.934 | 270.962 |
| 50th percentile service time | 241.76 | 240.005 | 239.269 | 238.51 | 237.977 |
| 90th percentile service time | 259.013 | 255.846 | 258.993 | 252.852 | 251.001 |
| 99th percentile service time | 270.599 | 270.486 | 262.097 | 269.388 | 269.682 |
| 100th percentile service time | 276.174 | 272.351 | 271.382 | 270.676 | 269.775 |

OpenSearch 2.19

| Metric Name | Run1 | Run2 | Run3 | Run4 | Run5 |
|---|---|---|---|---|---|
| 50th percentile latency | 260.963 | 260.199 | 254.521 | 254.469 | 254.111 |
| 90th percentile latency | 276.657 | 278.762 | 270.078 | 268.891 | 269.231 |
| 99th percentile latency | 283.79 | 291.543 | 285.353 | 285.605 | 274.279 |
| 100th percentile latency | 288.058 | 292.433 | 286.576 | 286.533 | 285.54 |
| 50th percentile service time | 259.512 | 259.018 | 253.24 | 253.107 | 253.009 |
| 90th percentile service time | 275.836 | 277.27 | 268.195 | 267.496 | 268.0 |
| 99th percentile service time | 282.864 | 290.598 | 284.072 | 284.209 | 272.754 |
| 100th percentile service time | 286.442 | 290.666 | 285.099 | 285.475 | 283.859 |

cc: @backslasht / @mgodwan / @expani
