[BUG] Elastic 7.10 with LTR Plugin: Excessive Increase in P95 Latency When Running Multiple A/B Test Variants #12648

wli-chwy · 2024-03-13T21:29:23Z

Describe the bug

When running A/B tests on Elastic 7.10 with the LTR plugin, P95 latency rises sharply as the number of variants in the test increases. The test uses 5 slightly different queries, each with its own model; all models share the same feature set. Running 2 variants in interleaved calls at 100 RPS (requests per second) keeps P95 latency at around 400ms. Running all 5 variants together under the same conditions pushes P95 latency to 2 seconds.

Related component

Plugins

To Reproduce

1. Set up Elastic 7.10 with the LTR plugin.
2. Configure 5 variants with slightly different queries and models, ensuring all models use the same feature set.
3. Run a test with any 2 of these variants in interleaved calls at 100 RPS. Observe the P95 latency.
4. Run a test with all 5 variants together in interleaved calls at 100 RPS. Observe the P95 latency.
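
The steps above could be driven by a small harness along these lines. This is only a sketch, not the harness used for the numbers in this report: `p95`, `run_interleaved`, and the `send` callback are hypothetical names, the nearest-rank percentile is an assumption (the report does not say how P95 was computed), and the loop issues requests sequentially, so a real 100 RPS test would need concurrent clients.

```python
import itertools
import time
from typing import Callable, Dict, List, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (assumed definition of P95)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def run_interleaved(
    variants: Sequence[str],
    send: Callable[[str], None],
    total_requests: int,
    rps: float,
) -> Dict[str, List[float]]:
    """Call send(variant) round-robin, pacing toward `rps`.

    Returns per-variant latencies in milliseconds. Requests are issued
    one at a time, so the achieved rate drops below `rps` whenever a
    single call takes longer than 1/rps seconds.
    """
    interval = 1.0 / rps
    latencies: Dict[str, List[float]] = {v: [] for v in variants}
    order = itertools.cycle(variants)
    next_at = time.perf_counter()
    for _ in range(total_requests):
        variant = next(order)
        delay = next_at - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        start = time.perf_counter()
        send(variant)  # hypothetical: issues the search request for this variant
        latencies[variant].append((time.perf_counter() - start) * 1000.0)
        next_at += interval
    return latencies
```

Running it once with 2 variant names and once with all 5, then comparing `p95` over the pooled latencies, would reproduce the comparison described in the steps.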

Expected behavior

Increasing the number of variants from 2 to 5 should not drastically increase P95 latency. A slight increase might be expected from the higher computational demand, but a jump from 400ms to 2 seconds is far larger than that and suggests a performance issue.

Additional Details

Elastic 7.10 with LTR plugin 1.5.5-es7.10.2 https://github.com/o19s/elasticsearch-learning-to-rank

dblock · 2024-03-15T13:35:30Z

@wli-chwy Elasticsearch is https://github.com/elastic/elasticsearch, please open issues related to that product there.

OpenSearch has a fork of LTR, see https://github.com/opensearch-project/opensearch-learning-to-rank-base. This may also be of interest.

wli-chwy added the bug and untriaged labels Mar 13, 2024

github-actions bot added the Plugins label Mar 13, 2024

dblock closed this as completed Mar 15, 2024
