
Update performance test with nested field #1488

Merged: 1 commit merged on Feb 23, 2024

@heemin32 (Collaborator) commented Feb 20, 2024

Description

The nested field holds a large number of items per document (1,000 to 10,000). When it is included in `_source`, query latency rises to around 1 s. For a fair comparison with the other test cases, this change excludes the nested field from being stored in `_source`.
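As a minimal sketch of what excluding the nested field from `_source` can look like, the Python below builds an index body that uses OpenSearch's `_source.excludes` mapping option together with a `knn_vector` field inside a `nested` field. The field names `nested_field` and `target_field` are hypothetical and may not match the actual `index.json` files touched by this PR; the shard and replica counts come from the cluster configuration table.

```python
import json

# Hypothetical field names; the real index.json under release-configs/
# may use different ones. Only the overall shape is illustrated.
NESTED_FIELD = "nested_field"
VECTOR_FIELD = "target_field"


def build_index_body(engine: str) -> dict:
    """Build a k-NN index body whose nested vectors are kept out of _source."""
    return {
        "settings": {
            "index": {
                "knn": True,
                "number_of_shards": 24,
                "number_of_replicas": 1,
            }
        },
        "mappings": {
            # The nested field is still indexed and searchable; it is simply not
            # stored in (or returned from) _source with every hit.
            "_source": {"excludes": [NESTED_FIELD]},
            "properties": {
                NESTED_FIELD: {
                    "type": "nested",
                    "properties": {
                        VECTOR_FIELD: {
                            "type": "knn_vector",
                            "dimension": 128,
                            "method": {
                                "name": "hnsw",
                                "space_type": "l2",
                                "engine": engine,  # "faiss" or "lucene"
                            },
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body("faiss"), indent=2))
```

The design point is that `_source` filtering only changes what is stored and returned, so hits no longer carry thousands of 128-dimensional vectors each, while the HNSW graph used for search is unaffected.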

Configurations

Data Sets

| Dataset | Train size | Dimensions | Distance | Format | K |
|---|---|---|---|---|---|
| sift | 1,000,000 (total nested items across all documents, not the number of documents) | 128 | L2 | hdf5 | 100 |
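Because the train size counts nested vectors rather than documents, the number of parent documents depends on how many items each document holds. A small sketch of that arithmetic, assuming every vector belongs to exactly one document: with 10 items per document there are 100,000 documents, and with 1,000 to 10,000 items per document there are between 100 and 1,000.

```python
from typing import Tuple

TOTAL_NESTED_VECTORS = 1_000_000  # "Train size" from the data set table


def document_count(items_per_doc: int) -> int:
    """Parent documents when every document holds the same number of nested vectors."""
    return TOTAL_NESTED_VECTORS // items_per_doc


def document_count_range(min_items: int, max_items: int) -> Tuple[int, int]:
    """Bounds on the document count when the per-document item count varies."""
    # More items per document means fewer documents, so the bounds swap.
    return document_count(max_items), document_count(min_items)


print(document_count(10), document_count_range(1_000, 10_000))
```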

Total Queries: 10,000

Cluster Configuration

| Config item | Value |
|---|---|
| Leader Nodes | 3 |
| Leader Node Type | c5.xlarge |
| Leader Node Disk Space | 20 |
| Data Nodes | 3 |
| Data Node Type | r5.4xlarge |
| Data Node Disk Space | 500 |
| Primary Shard Count | 24 |
| Replica Count | 1 |
| AZ | us-east-1 |
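With 24 primaries and 1 replica, the cluster holds 24 × (1 + 1) = 48 shard copies. Assuming OpenSearch allocates them evenly across the 3 data nodes (an assumption; the allocation was not reported), each data node hosts 16:

```python
def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies: every primary plus `replicas` copies of it."""
    return primaries * (1 + replicas)


def shards_per_data_node(primaries: int, replicas: int, data_nodes: int) -> float:
    """Average shard copies per data node, assuming an even allocation."""
    return shard_copies(primaries, replicas) / data_nodes


print(shard_copies(24, 1), shards_per_data_node(24, 1, 3))
```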

Client Configuration

| Config item | Value |
|---|---|
| OS | Amazon Linux |
| Instance Type | c5.9xlarge |
| Count | 1 |
| EBS Size | 800G |
| AZ | us-east-1 |

Faiss (1,000 to 10,000 items in nested field per doc)

{
  "metadata": {
    "test_name": "Faiss HNSW Nested Field Test",
    "test_id": "Faiss HNSW Nested Field Test",
    "date": "02/21/2024 00:06:13",
    "python_version": "3.9.18 (main, Sep 11 2023, 13:41:44) \n[GCC 11.2.0]",
    "os_version": "Linux-4.14.326-245.539.amzn2.x86_64-x86_64-with-glibc2.26",
    "processor": "x86_64, 36 cores",
    "memory": "564981760 (used) / 72506503168 (available) / 73743400960 (total)"
  },
  "results": {
    "test_took": 295325.73138901166,
    "delete_index_took_total": 204.6960236718102,
    "create_index_took_total": 238.805988667688,
    "ingest_nested_field_took_total": 130149.40594166303,
    "refresh_index_store_kb_total": 3009740.120768229,
    "refresh_index_took_total": 28108.197264674043,
    "force_merge_took_total": 67769.72170666461,
    "warmup_operation_took_total": 256.5711303371548,
    "query_nested_field_took_total": 68598.33333333333,
    "query_nested_field_took_p50": 7.0,
    "query_nested_field_took_p90": 8.0,
    "query_nested_field_took_p99": 9.0,
    "query_nested_field_took_p99.9": 13.333333333333334,
    "query_nested_field_took_p100": 886.6666666666666,
    "query_nested_field_client_time_total": 104420.33333333333,
    "query_nested_field_client_time_p50": 10.0,
    "query_nested_field_client_time_p90": 11.0,
    "query_nested_field_client_time_p99": 13.0,
    "query_nested_field_client_time_p99.9": 54.333333333333336,
    "query_nested_field_client_time_p100": 892.0,
    "query_nested_field_memory_kb_total": 1297446.0,
    "query_nested_field_recall@K_total": 0.9989340000000001,
    "query_nested_field_recall@1_total": 1.0
  }
}
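Assuming each `_total` query metric is the sum over all 10,000 queries in milliseconds (an assumption; the report does not define it), the mean per-query latency follows by division. For the Faiss run above this gives about 6.86 ms server-side and 10.44 ms client-side, close to the reported p50 values of 7.0 and 10.0:

```python
TOTAL_QUERIES = 10_000  # "Total Queries" from the configuration above


def mean_latency_ms(total_ms: float, queries: int = TOTAL_QUERIES) -> float:
    """Mean per-query latency, assuming total_ms sums every query's latency."""
    return total_ms / queries


# Values copied verbatim from the Faiss (1,000 to 10,000 items) results.
faiss_took_total = 68598.33333333333
faiss_client_time_total = 104420.33333333333

print(f"took mean: {mean_latency_ms(faiss_took_total):.2f} ms")
print(f"client_time mean: {mean_latency_ms(faiss_client_time_total):.2f} ms")
```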

Lucene (1,000 to 10,000 items in nested field per doc)

{
  "metadata": {
    "test_name": "Lucene HNSW Nested Field Test",
    "test_id": "Lucene HNSW Nested Field Test",
    "date": "02/21/2024 04:55:09",
    "python_version": "3.9.18 (main, Sep 11 2023, 13:41:44) \n[GCC 11.2.0]",
    "os_version": "Linux-4.14.326-245.539.amzn2.x86_64-x86_64-with-glibc2.26",
    "processor": "x86_64, 36 cores",
    "memory": "581652480 (used) / 72480284672 (available) / 73743400960 (total)"
  },
  "results": {
    "test_took": 3167641.372945001,
    "delete_index_took_total": 171.82520866723885,
    "create_index_took_total": 194.31308500255304,
    "ingest_nested_field_took_total": 1016622.4843550008,
    "refresh_index_store_kb_total": 2798702.498046875,
    "refresh_index_took_total": 838.5021863359725,
    "force_merge_took_total": 146532.57332899375,
    "warmup_operation_took_total": 8.00811433388541,
    "query_nested_field_took_total": 2003273.6666666667,
    "query_nested_field_took_p50": 198.66666666666666,
    "query_nested_field_took_p90": 222.0,
    "query_nested_field_took_p99": 240.66666666666666,
    "query_nested_field_took_p99.9": 254.66666666666666,
    "query_nested_field_took_p100": 984.6666666666666,
    "query_nested_field_client_time_total": 2039660.0,
    "query_nested_field_client_time_p50": 202.0,
    "query_nested_field_client_time_p90": 225.66666666666666,
    "query_nested_field_client_time_p99": 245.0,
    "query_nested_field_client_time_p99.9": 271.3333333333333,
    "query_nested_field_client_time_p100": 990.0,
    "query_nested_field_memory_kb_total": 0.0,
    "query_nested_field_recall@K_total": 0.999981,
    "query_nested_field_recall@1_total": 1.0
  }
}
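To put the two large nested-field runs side by side, the sketch below copies a few values verbatim from the Faiss and Lucene `results` objects above and computes how many times larger each Lucene figure is. It only does arithmetic on the reported numbers: Lucene's p50 and p99 `took` come out roughly 28x and 27x the Faiss values, and its ingest total roughly 7.8x.

```python
import json

# Trimmed copies of the two "results" objects above (values copied verbatim).
FAISS_LARGE = json.loads("""{
  "query_nested_field_took_p50": 7.0,
  "query_nested_field_took_p99": 9.0,
  "ingest_nested_field_took_total": 130149.40594166303
}""")
LUCENE_LARGE = json.loads("""{
  "query_nested_field_took_p50": 198.66666666666666,
  "query_nested_field_took_p99": 240.66666666666666,
  "ingest_nested_field_took_total": 1016622.4843550008
}""")


def ratios(slow: dict, fast: dict) -> dict:
    """For every metric present in both runs, how many times larger `slow` is than `fast`."""
    return {key: slow[key] / fast[key] for key in fast if key in slow}


for metric, ratio in ratios(LUCENE_LARGE, FAISS_LARGE).items():
    print(f"{metric}: Lucene / Faiss = {ratio:.1f}x")
```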

Faiss (10 items in nested field per doc)

{
  "metadata": {
    "test_name": "Faiss HNSW Nested Field Test",
    "test_id": "Faiss HNSW Nested Field Test",
    "date": "02/22/2024 04:45:03",
    "python_version": "3.9.18 (main, Sep 11 2023, 13:41:44) \n[GCC 11.2.0]",
    "os_version": "Linux-4.14.326-245.539.amzn2.x86_64-x86_64-with-glibc2.26",
    "processor": "x86_64, 36 cores",
    "memory": "548995072 (used) / 72500396032 (available) / 73743400960 (total)"
  },
  "results": {
    "test_took": 179865.70236468027,
    "delete_index_took_total": 294.5222413206163,
    "create_index_took_total": 184.65970567194745,
    "ingest_nested_field_took_total": 69229.51351267209,
    "refresh_index_store_kb_total": 3031487.720703125,
    "refresh_index_took_total": 14662.646037681649,
    "force_merge_took_total": 61662.6585573346,
    "warmup_operation_took_total": 351.7023099993821,
    "query_nested_field_took_total": 33480.0,
    "query_nested_field_took_p50": 3.0,
    "query_nested_field_took_p90": 4.0,
    "query_nested_field_took_p99": 4.0,
    "query_nested_field_took_p99.9": 7.0,
    "query_nested_field_took_p100": 671.0,
    "query_nested_field_client_time_total": 63240.666666666664,
    "query_nested_field_client_time_p50": 6.0,
    "query_nested_field_client_time_p90": 7.0,
    "query_nested_field_client_time_p99": 7.0,
    "query_nested_field_client_time_p99.9": 40.333333333333336,
    "query_nested_field_client_time_p100": 677.3333333333334,
    "query_nested_field_memory_kb_total": 1297462.0,
    "query_nested_field_recall@K_total": 0.9992070000000001,
    "query_nested_field_recall@1_total": 1.0
  }
}

Lucene (10 items in nested field per doc)

{
  "metadata": {
    "test_name": "Lucene HNSW Nested Field Test",
    "test_id": "Lucene HNSW Nested Field Test",
    "date": "02/22/2024 04:33:25",
    "python_version": "3.9.18 (main, Sep 11 2023, 13:41:44) \n[GCC 11.2.0]",
    "os_version": "Linux-4.14.326-245.539.amzn2.x86_64-x86_64-with-glibc2.26",
    "processor": "x86_64, 36 cores",
    "memory": "11139792896 (used) / 61909598208 (available) / 73743400960 (total)"
  },
  "results": {
    "test_took": 275221.65689366125,
    "delete_index_took_total": 159.45150900127678,
    "create_index_took_total": 192.05309099440151,
    "ingest_nested_field_took_total": 122905.74228100013,
    "refresh_index_store_kb_total": 1845289.30078125,
    "refresh_index_took_total": 416.56529167084955,
    "force_merge_took_total": 65381.89473200085,
    "warmup_operation_took_total": 34.61665566040514,
    "query_nested_field_took_total": 86131.33333333333,
    "query_nested_field_took_p50": 8.0,
    "query_nested_field_took_p90": 10.0,
    "query_nested_field_took_p99": 11.0,
    "query_nested_field_took_p99.9": 15.666666666666666,
    "query_nested_field_took_p100": 1014.3333333333334,
    "query_nested_field_client_time_total": 386889.0,
    "query_nested_field_client_time_p50": 31.333333333333332,
    "query_nested_field_client_time_p90": 34.333333333333336,
    "query_nested_field_client_time_p99": 37.0,
    "query_nested_field_client_time_p99.9": 1372.0,
    "query_nested_field_client_time_p100": 14920.333333333334,
    "query_nested_field_memory_kb_total": 2.0,
    "query_nested_field_recall@K_total": 0.9982586666666666,
    "query_nested_field_recall@1_total": 1.0
  }
}
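In the 10-item runs, Lucene's `client_time` is much larger than its `took`. OpenSearch reports `took` as time spent on the server; assuming `client_time` is measured by the benchmark client around the whole request and both `_total` fields sum over the 10,000 queries, the difference of the totals divided by the query count gives the mean per-query overhead outside the server: about 3.0 ms for Faiss and about 30.1 ms for Lucene. Totals are used instead of percentiles because the difference of two p50s is not the p50 of the per-query differences, whereas the difference of two means is the mean of the differences.

```python
TOTAL_QUERIES = 10_000


def mean_overhead_ms(client_total_ms: float, took_total_ms: float,
                     queries: int = TOTAL_QUERIES) -> float:
    """Mean per-query time spent outside the server (client_time minus took)."""
    return (client_total_ms - took_total_ms) / queries


# Values copied verbatim from the two 10-item results objects.
faiss_small = mean_overhead_ms(63240.666666666664, 33480.0)
lucene_small = mean_overhead_ms(386889.0, 86131.33333333333)

print(f"Faiss overhead:  {faiss_small:.2f} ms/query")
print(f"Lucene overhead: {lucene_small:.2f} ms/query")
```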

Issues Resolved

N/A

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v (Collaborator)

@heemin32 Please add details of the cluster and client configuration used to run the benchmarks.

@heemin32 (Collaborator, Author)

Updated the description with the cluster and client configuration.

@heemin32 force-pushed the perf branch 2 times, most recently from 1cbd327 to e6a8a06, on February 23, 2024 02:16

codecov bot commented Feb 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.02%. Comparing base (c6ac3db) to head (71c88f0).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1488   +/-   ##
=========================================
  Coverage     85.02%   85.02%           
+ Complexity     1278     1277    -1     
=========================================
  Files           167      167           
  Lines          5207     5207           
  Branches        493      493           
=========================================
  Hits           4427     4427           
+ Misses          573      572    -1     
- Partials        207      208    +1     


Signed-off-by: Heemin Kim <heemin@amazon.com>
@heemin32 (Collaborator, Author)

This PR only changes files under benchmark/perf-test, so the build failure is unrelated to it.

@@ -9,7 +9,7 @@ steps:
 index_name: target_index
 - name: create_index
 index_name: target_index
-index_spec: release-configs/faiss-hnsw/nested/simple/index.json
+index_spec: release-configs/lucene-hnsw/nested/simple/index.json
Review comment (Member):

Why are we changing faiss to lucene here?

@heemin32 (Collaborator, Author):

It was a bug introduced earlier. This is the test spec for Lucene.
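A spec mix-up like the one fixed here could be caught by a small consistency check. The sketch below is hypothetical: the `mismatched_specs` helper, the step dictionaries, and the `release-configs/<engine>-hnsw/` naming rule are inferred from the two paths in the diff and the `test_name` values in the results, not taken from the perf-test tool itself.

```python
from typing import List


def mismatched_specs(test_name: str, steps: List[dict]) -> List[str]:
    """Return create_index spec paths whose engine directory does not match the test's engine."""
    engine = test_name.split()[0].lower()  # "Lucene HNSW ..." -> "lucene"
    expected_prefix = f"release-configs/{engine}-hnsw/"
    return [
        step["index_spec"]
        for step in steps
        if step.get("name") == "create_index"
        and not step["index_spec"].startswith(expected_prefix)
    ]


# Hypothetical steps mirroring the diff before the fix.
before_fix = [
    {"name": "delete_index", "index_name": "target_index"},
    {"name": "create_index", "index_name": "target_index",
     "index_spec": "release-configs/faiss-hnsw/nested/simple/index.json"},
]
print(mismatched_specs("Lucene HNSW Nested Field Test", before_fix))
```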

@heemin32 heemin32 merged commit fe774f7 into opensearch-project:main Feb 23, 2024
49 of 54 checks passed
@opensearch-trigger-bot (Contributor)

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1488-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fe774f751b5eeeea8a0163dfcf7a905e2eb7db81
# Push it to GitHub
git push --set-upstream origin backport/backport-1488-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1488-to-2.x.

@heemin32 heemin32 deleted the perf branch July 18, 2024 17:19

4 participants