diff --git a/_about/index.md b/_about/index.md index 041197eeba..404a5a4d6f 100644 --- a/_about/index.md +++ b/_about/index.md @@ -10,6 +10,51 @@ redirect_from: - /docs/opensearch/ - /opensearch/ - /opensearch/index/ +why_use: + - heading: "Vector database" + description: "Use OpenSearch as a vector database to combine the power of traditional search, analytics, and vector search" + link: "/vector-search/" + - heading: "Fast, scalable full-text search" + description: "Help users find the right information in your application, website, or data lake catalog" + link: "/search-plugins/" + - heading: "Application and infrastructure monitoring" + description: "Use observability logs, metrics, and traces to monitor your applications in real time" + link: "/observing-your-data/" + - heading: "Security and event information management" + description: "Centralize logs to enable real-time security monitoring and forensic analysis" + link: "/security/" +features: + - heading: "Vector search" + description: "Build AI/ML-powered vector search applications" + link: "/vector-search/" + - heading: "Machine learning" + description: "Integrate machine learning models into your workloads" + link: "/ml-commons-plugin/" + - heading: "Customizing your search" + description: "From optimizing performance to improving relevance, customize your search experience" + link: "/search-plugins/" + - heading: "Workflow automation" + description: "Automate complex OpenSearch setup and preprocessing tasks" + link: "/automating-configurations/" + - heading: "Anomaly detection" + description: "Identify atypical data and receive automatic notifications" + link: "/monitoring-plugins/ad/" + - heading: "Building visualizations" + description: "Visualize your data in OpenSearch Dashboards" + link: "/dashboards/" +getting_started: + - heading: "Get started with OpenSearch" + description: "Learn about OpenSearch and start ingesting and searching data" + link: "/getting-started/" + - heading: "Get started with OpenSearch Dashboards" + description: "Learn about OpenSearch Dashboards applications and tools used to visualize data" + link: "/dashboards/quickstart/" + - heading: "Get started with vector search" + description: "Learn about vector search options and build your first vector search application" + link: "/search-plugins/" + - heading: "Get started with OpenSearch security" + description: "Learn about security in OpenSearch" + link: "/getting-started/security/" --- {%- comment -%}The `/docs/opensearch/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%} @@ -22,70 +67,20 @@ This section contains documentation for OpenSearch and OpenSearch Dashboards. 
## Getting started -To get started, explore the following documentation: - -- [Getting started guide]({{site.url}}{{site.baseurl}}/getting-started/): - - [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/intro/) - - [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/) - - [Communicate with OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/communicate/) - - [Ingest data]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/) - - [Search data]({{site.url}}{{site.baseurl}}/getting-started/search-data/) - - [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/) -- [Install OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) -- [Install OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/) -- [FAQ](https://opensearch.org/faq) +{% include cards.html cards=page.getting_started %} ## Why use OpenSearch? - - - - - - - - - - - - - - - - - - - - - -
Fast, scalable full-text searchApplication and infrastructure monitoringSecurity and event information managementOperational health tracking
Fast, scalable full-text searchApplication and infrastructure monitoringSecurity and event information managementOperational health tracking
Help users find the right information within your application, website, or data lake catalog. Easily store and analyze log data, and set automated alerts for performance issues.Centralize logs to enable real-time security monitoring and forensic analysis.Use observability logs, metrics, and traces to monitor your applications in real time.
+{% include cards.html cards=page.why_use documentation_link=true %} ## Key features -OpenSearch provides several features to help index, secure, monitor, and analyze your data: - -- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) -- Identify atypical data and receive automatic notifications. -- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) -- Use SQL or a Piped Processing Language (PPL) to query your data. -- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) -- Automate index operations. -- [Search methods]({{site.url}}{{site.baseurl}}/search-plugins/knn/) -- From traditional lexical search to advanced vector and hybrid search, discover the optimal search method for your use case. -- [Machine learning]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) -- Integrate machine learning models into your workloads. -- [Workflow automation]({{site.url}}{{site.baseurl}}/automating-configurations/index/) -- Automate complex OpenSearch setup and preprocessing tasks. -- [Performance evaluation]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) -- Monitor and optimize your cluster. -- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) -- Run search requests in the background. -- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) -- Replicate your data across multiple OpenSearch clusters. - - -## The secure path forward - -OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. To get started, see [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/). - -## Looking for the Javadoc? +{% include cards.html cards=page.features%} -See [opensearch.org/javadocs/](https://opensearch.org/javadocs/). ## Get involved -[OpenSearch](https://opensearch.org) is supported by Amazon Web Services. All components are available under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.html) on [GitHub](https://github.com/opensearch-project/). +[OpenSearch](https://opensearch.org) is supported by the OpenSearch Software Foundation. All components are available under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.html) on [GitHub](https://github.com/opensearch-project/). The project welcomes GitHub issues, bug fixes, features, plugins, documentation---anything at all. To get involved, see [Contributing](https://opensearch.org/source.html) on the OpenSearch website. 
--- diff --git a/_config.yml b/_config.yml index 49f2b52e6e..8392ae004e 100644 --- a/_config.yml +++ b/_config.yml @@ -124,6 +124,9 @@ collections: workspace: permalink: /:collection/:path/ output: true + vector-search: + permalink: /:collection/:path/ + output: true opensearch_collection: # Define the collections used in the theme @@ -173,6 +176,9 @@ opensearch_collection: search-plugins: name: Search features nav_fold: true + vector-search: + name: Vector search + nav_fold: true ml-commons-plugin: name: Machine learning nav_fold: true diff --git a/_field-types/supported-field-types/knn-memory-optimized.md b/_field-types/supported-field-types/knn-memory-optimized.md new file mode 100644 index 0000000000..25fadceb1e --- /dev/null +++ b/_field-types/supported-field-types/knn-memory-optimized.md @@ -0,0 +1,926 @@ +--- +layout: default +title: Memory-optimized vectors +parent: k-NN vector +grand_parent: Supported field types +nav_order: 30 +--- + +# Memory-optimized vectors + +Vector search operations can be memory intensive, particularly when dealing with large-scale deployments. OpenSearch provides several strategies for optimizing memory usage while maintaining search performance. You can choose between different workload modes that prioritize either low latency or low cost, apply various compression levels to reduce memory footprint, or use alternative vector representations like byte or binary vectors. These optimization techniques allow you to balance memory consumption, search performance, and cost based on your specific use case requirements. + +## Vector workload modes + +Vector search requires balancing search performance and operational costs. While in-memory search provides the lowest latency, [disk-based search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/) offers a more cost-effective approach by reducing memory usage, though it results in slightly higher search latency. To choose between these approaches, use the `mode` mapping parameter in your `knn_vector` field configuration. This parameter sets appropriate default values for k-NN parameters based on your priority: either low latency or low cost. For additional optimization, you can override these default parameter values in your k-NN field mapping. + +OpenSearch supports the following vector workload modes. + +| Mode | Default engine | Description | +|:---|:---|:---| +| `in_memory` (Default) | `faiss` | Prioritizes low-latency search. This mode uses the `faiss` engine without any quantization applied. It is configured with the default parameter values for vector search in OpenSearch. | +| `on_disk` | `faiss` | Prioritizes low-cost vector search while maintaining strong recall. By default, the `on_disk` mode uses quantization and rescoring to execute a two-phase approach in order to retrieve the top neighbors. The `on_disk` mode supports only `float` vector types. | + +To create a vector index that uses the `on_disk` mode for low-cost search, send the following request: + +```json +PUT test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 3, + "space_type": "l2", + "mode": "on_disk" + } + } + } +} +``` +{% include copy-curl.html %} + +### Compression levels + +The `compression_level` mapping parameter selects a quantization encoder that reduces vector memory consumption by the given factor. The following table lists the available `compression_level` values. 
+ +| Compression level | Supported engines | +|:------------------|:---------------------------------------------| +| `1x` | `faiss`, `lucene`, and `nmslib` (deprecated) | +| `2x` | `faiss` | +| `4x` | `lucene` | +| `8x` | `faiss` | +| `16x` | `faiss` | +| `32x` | `faiss` | + +For example, if a `compression_level` of `32x` is passed for a `float32` index of 768-dimensional vectors, the per-vector memory is reduced from `4 * 768 = 3072` bytes to `3072 / 32 = 96` bytes. Internally, binary quantization (which maps a `float` to a `bit`) may be used to achieve this compression. + +If you set the `compression_level` parameter, then you cannot specify an `encoder` in the `method` mapping. Compression levels greater than `1x` are only supported for `float` vector types. +{: .note} + +The following table lists the default `compression_level` values for the available workload modes. + +| Mode | Default compression level | +|:------------------|:-------------------------------| +| `in_memory` | `1x` | +| `on_disk` | `32x` | + + +To create a vector field with a `compression_level` of `16x`, specify the `compression_level` parameter in the mappings. This parameter overrides the default compression level for the `on_disk` mode from `32x` to `16x`, producing higher recall and accuracy at the expense of a larger memory footprint: + +```json +PUT test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 3, + "space_type": "l2", + "mode": "on_disk", + "compression_level": "16x" + } + } + } +} +``` +{% include copy-curl.html %} + +## Rescoring quantized results to full precision + +To improve recall while maintaining the memory savings of quantization, you can use a two-phase search approach. In the first phase, `oversample_factor * k` results are retrieved from an index using quantized vectors and the scores are approximated. In the second phase, the full-precision vectors of those `oversample_factor * k` results are loaded into memory from disk, and scores are recomputed against the full-precision query vector. The results are then reduced to the top k. + +The default rescoring behavior is determined by the `mode` and `compression_level` of the backing k-NN vector field: + +- For `in_memory` mode, no rescoring is applied by default. +- For `on_disk` mode, default rescoring is based on the configured `compression_level`. Each `compression_level` provides a default `oversample_factor`, specified in the following table. 
+ +| Compression level | Default rescore `oversample_factor` | +|:------------------|:----------------------------------| +| `32x` (default) | 3.0 | +| `16x` | 2.0 | +| `8x` | 2.0 | +| `4x` | No default rescoring | +| `2x` | No default rescoring | + +To explicitly apply rescoring, provide the `rescore` parameter in a query on a quantized index and specify the `oversample_factor`: + +```json +GET /my-vector-index/_search +{ + "size": 2, + "query": { + "knn": { + "target-field": { + "vector": [2, 3, 5, 6], + "k": 2, + "rescore" : { + "oversample_factor": 1.2 + } + } + } + } +} +``` +{% include copy-curl.html %} + +Alternatively, set the `rescore` parameter to `true` to use the default `oversample_factor` of `1.0`: + +```json +GET /my-vector-index/_search +{ + "size": 2, + "query": { + "knn": { + "target-field": { + "vector": [2, 3, 5, 6], + "k": 2, + "rescore" : true + } + } + } +} +``` +{% include copy-curl.html %} + +The `oversample_factor` is a floating-point number between 1.0 and 100.0, inclusive. The number of results in the first pass is calculated as `oversample_factor * k` and is guaranteed to be between 100 and 10,000, inclusive. If the calculated number of results is smaller than 100, then the number of results is set to 100. If the calculated number of results is greater than 10,000, then the number of results is set to 10,000. + +Rescoring is only supported for the `faiss` engine. + +Rescoring is not needed if quantization is not used because the scores returned are already fully precise. +{: .note} + + +## Byte vectors + +By default, k-NN vectors are `float` vectors, in which each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `faiss` or `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range. + +Byte vectors are supported only for the `lucene` and `faiss` engines. They are not supported for the `nmslib` engine. +{: .note} + +In [k-NN benchmarking tests](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch), the use of `byte` rather than `float` vectors resulted in a significant reduction in storage and memory usage as well as improved indexing throughput and reduced query latency. Additionally, recall precision was not greatly affected (note that recall can depend on various factors, such as the [quantization technique](#quantization-techniques) used and the data distribution). + +When using `byte` vectors, expect some loss of recall precision compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall. +{: .important} + +When using `byte` vectors with the `faiss` engine, we recommend using [Single Instruction Multiple Data (SIMD) optimization]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#simd-optimization), which helps to significantly reduce search latencies and improve indexing throughput. +{: .important} + +Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`. + +To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index. 
+ +### Example: HNSW + +The following example creates a byte vector index with the `lucene` engine and `hnsw` algorithm: + +```json +PUT test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 3, + "data_type": "byte", + "space_type": "l2", + "method": { + "name": "hnsw", + "engine": "lucene", + "parameters": { + "ef_construction": 100, + "m": 16 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +After creating the index, ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range: + +```json +PUT test-index/_doc/1 +{ + "my_vector": [-126, 28, 127] +} +``` +{% include copy-curl.html %} + +```json +PUT test-index/_doc/2 +{ + "my_vector": [100, -128, 0] +} +``` +{% include copy-curl.html %} + +When querying, be sure to use a `byte` vector: + +```json +GET test-index/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector": { + "vector": [26, -120, 99], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +### Example: IVF + +The `ivf` method requires a training step that creates a model and trains it to initialize the native library index during segment creation. For more information, see [Building a vector index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-vector-index-from-a-model). + +First, create an index that will contain byte vector training data. Specify the `faiss` engine and `ivf` algorithm and make sure that the `dimension` matches the dimension of the model you want to create: + +```json +PUT train-index +{ + "mappings": { + "properties": { + "train-field": { + "type": "knn_vector", + "dimension": 4, + "data_type": "byte" + } + } + } +} +``` +{% include copy-curl.html %} + +First, ingest training data containing byte vectors into the training index: + +```json +PUT _bulk +{ "index": { "_index": "train-index", "_id": "1" } } +{ "train-field": [127, 100, 0, -120] } +{ "index": { "_index": "train-index", "_id": "2" } } +{ "train-field": [2, -128, -10, 50] } +{ "index": { "_index": "train-index", "_id": "3" } } +{ "train-field": [13, -100, 5, 126] } +{ "index": { "_index": "train-index", "_id": "4" } } +{ "train-field": [5, 100, -6, -125] } +``` +{% include copy-curl.html %} + +Then, create and train the model named `byte-vector-model`. The model will be trained using the training data from the `train-field` in the `train-index`. Specify the `byte` data type: + +```json +POST _plugins/_knn/models/byte-vector-model/_train +{ + "training_index": "train-index", + "training_field": "train-field", + "dimension": 4, + "description": "model with byte data", + "data_type": "byte", + "method": { + "name": "ivf", + "engine": "faiss", + "space_type": "l2", + "parameters": { + "nlist": 1, + "nprobes": 1 + } + } +} +``` +{% include copy-curl.html %} + +To check the model training status, call the Get Model API: + +```json +GET _plugins/_knn/models/byte-vector-model?filter_path=state +``` +{% include copy-curl.html %} + +Once the training is complete, the `state` changes to `created`. 
+ +Next, create an index that will initialize its native library indexes using the trained model: + +```json +PUT test-byte-ivf +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "model_id": "byte-vector-model" + } + } + } +} +``` +{% include copy-curl.html %} + +Ingest the data containing the byte vectors that you want to search into the created index: + +```json +PUT _bulk?refresh=true +{"index": {"_index": "test-byte-ivf", "_id": "1"}} +{"my_vector": [7, 10, 15, -120]} +{"index": {"_index": "test-byte-ivf", "_id": "2"}} +{"my_vector": [10, -100, 120, -108]} +{"index": {"_index": "test-byte-ivf", "_id": "3"}} +{"my_vector": [1, -2, 5, -50]} +{"index": {"_index": "test-byte-ivf", "_id": "4"}} +{"my_vector": [9, -7, 45, -78]} +{"index": {"_index": "test-byte-ivf", "_id": "5"}} +{"my_vector": [80, -70, 127, -128]} +``` +{% include copy-curl.html %} + +Finally, search the data. Be sure to provide a byte vector in the k-NN vector field: + +```json +GET test-byte-ivf/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector": { + "vector": [100, -120, 50, -45], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +### Memory estimation + +In the best-case scenario, byte vectors require 25% of the memory required by 32-bit vectors. + +#### HNSW memory estimation + +The memory required for Hierarchical Navigable Small World (HNSW) is estimated to be `1.1 * (dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph. + +As an example, assume that you have 1 million vectors with a `dimension` of `256` and an `m` of `16`. The memory requirement can be estimated as follows: + +```r +1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.39 GB +``` + +#### IVF memory estimation + +The memory required for Inverted File Index (IVF) is estimated to be `1.1 * ((dimension * num_vectors) + (4 * nlist * dimension))` bytes/vector, where `nlist` is the number of buckets into which to partition vectors. + +As an example, assume that you have 1 million vectors with a `dimension` of `256` and an `nlist` of `128`. The memory requirement can be estimated as follows: + +```r +1.1 * ((256 * 1,000,000) + (4 * 128 * 256)) ~= 0.27 GB +``` + + +### Quantization techniques + +If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. The Faiss engine supports several quantization techniques, such as scalar quantization (SQ) and product quantization (PQ). The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only. + +#### Scalar quantization for the L2 space type + +The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. 
If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$). + +```python +import numpy as np + +# Random dataset (Example to create a random dataset) +dataset = np.random.uniform(-300, 300, (100, 10)) +# Random query set (Example to create a random queryset) +queryset = np.random.uniform(-350, 350, (100, 10)) +# Number of values +B = 256 + +# INDEXING: +# Get min and max +dataset_min = np.min(dataset) +dataset_max = np.max(dataset) +# Shift coordinates to be non-negative +dataset -= dataset_min +# Normalize into [0, 1] +dataset *= 1. / (dataset_max - dataset_min) +# Bucket into 256 values +dataset = np.floor(dataset * (B - 1)) - int(B / 2) + +# QUERYING: +# Clip (if queryset range is out of dataset range) +queryset = queryset.clip(dataset_min, dataset_max) +# Shift coordinates to be non-negative +queryset -= dataset_min +# Normalize +queryset *= 1. / (dataset_max - dataset_min) +# Bucket into 256 values +queryset = np.floor(queryset * (B - 1)) - int(B / 2) +``` +{% include copy.html %} + +#### Scalar quantization for the cosine similarity space type + +The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$). + +The following pseudocode is for positive numbers: + +```python +# For Positive Numbers + +# INDEXING and QUERYING: + +# Get Max of train dataset +max = np.max(dataset) +min = 0 +B = 127 + +# Normalize into [0,1] +val = (val - min) / (max - min) +val = (val * B) + +# Get int and fraction values +int_part = floor(val) +frac_part = val - int_part + +if 0.5 < frac_part: + bval = int_part + 1 +else: + bval = int_part + +return Byte(bval) +``` +{% include copy.html %} + +The following pseudocode is for negative numbers: + +```python +# For Negative Numbers + +# INDEXING and QUERYING: + +# Get Min of train dataset +min = 0 +max = -np.min(dataset) +B = 128 + +# Normalize into [0,1] +val = (val - min) / (max - min) +val = (val * B) + +# Get int and fraction values +int_part = floor(val) +frac_part = val - int_part + +if 0.5 < frac_part: + bval = int_part + 1 +else: + bval = int_part + +return Byte(bval) +``` +{% include copy.html %} + +## Binary vectors + +You can reduce memory costs by a factor of 32 by switching from float to binary vectors. Using binary vector indexes can lower operational costs while maintaining high recall performance, making large-scale deployment more economical and efficient. + +Binary format is available for the following k-NN search types: + +- [Approximate k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/): Supports binary vectors only for the Faiss engine with the HNSW and IVF algorithms. +- [Script score k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/): Enables the use of binary vectors in script scoring. +- [Painless extensions]({{site.url}}{{site.baseurl}}/search-plugins/knn/painless-functions/): Allows the use of binary vectors with Painless scripting extensions. + +### Requirements + +There are several requirements for using binary vectors in the OpenSearch k-NN plugin: + +- The `data_type` of the binary vector index must be `binary`. +- The `space_type` of the binary vector index must be `hamming`. +- The `dimension` of the binary vector index must be a multiple of 8. +- You must convert your binary data into 8-bit signed integers (`int8`) in the [-128, 127] range. 
For example, the binary sequence of 8 bits `0, 1, 1, 0, 0, 0, 1, 1` must be converted into its equivalent byte value of `99` in order to be used as a binary vector input. + +### Example: HNSW + +To create a binary vector index with the Faiss engine and HNSW algorithm, send the following request: + +```json +PUT /test-binary-hnsw +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 8, + "data_type": "binary", + "space_type": "hamming", + "method": { + "name": "hnsw", + "engine": "faiss" + } + } + } + } +} +``` +{% include copy-curl.html %} + +Then ingest some documents containing binary vectors: + +```json +PUT _bulk +{"index": {"_index": "test-binary-hnsw", "_id": "1"}} +{"my_vector": [7], "price": 4.4} +{"index": {"_index": "test-binary-hnsw", "_id": "2"}} +{"my_vector": [10], "price": 14.2} +{"index": {"_index": "test-binary-hnsw", "_id": "3"}} +{"my_vector": [15], "price": 19.1} +{"index": {"_index": "test-binary-hnsw", "_id": "4"}} +{"my_vector": [99], "price": 1.2} +{"index": {"_index": "test-binary-hnsw", "_id": "5"}} +{"my_vector": [80], "price": 16.5} +``` +{% include copy-curl.html %} + +When querying, be sure to use a binary vector: + +```json +GET /test-binary-hnsw/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector": { + "vector": [9], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the two vectors closest to the query vector: + +
+ + Response + + {: .text-delta} + +```json +{ + "took": 8, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "test-binary-hnsw", + "_id": "2", + "_score": 0.5, + "_source": { + "my_vector": [ + 10 + ], + "price": 14.2 + } + }, + { + "_index": "test-binary-hnsw", + "_id": "5", + "_score": 0.25, + "_source": { + "my_vector": [ + 80 + ], + "price": 16.5 + } + } + ] + } +} +``` +
+ +### Example: IVF + +The IVF method requires a training step that creates a model and trains it to initialize the native library index during segment creation. For more information, see [Building a vector index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-vector-index-from-a-model). + +First, create an index that will contain binary vector training data. Specify the Faiss engine and IVF algorithm and make sure that the `dimension` matches the dimension of the model you want to create: + +```json +PUT train-index +{ + "mappings": { + "properties": { + "train-field": { + "type": "knn_vector", + "dimension": 8, + "data_type": "binary" + } + } + } +} +``` +{% include copy-curl.html %} + +Ingest training data containing binary vectors into the training index: + +
+ + Bulk ingest request + + {: .text-delta} + +```json +PUT _bulk +{ "index": { "_index": "train-index", "_id": "1" } } +{ "train-field": [1] } +{ "index": { "_index": "train-index", "_id": "2" } } +{ "train-field": [2] } +{ "index": { "_index": "train-index", "_id": "3" } } +{ "train-field": [3] } +{ "index": { "_index": "train-index", "_id": "4" } } +{ "train-field": [4] } +{ "index": { "_index": "train-index", "_id": "5" } } +{ "train-field": [5] } +{ "index": { "_index": "train-index", "_id": "6" } } +{ "train-field": [6] } +{ "index": { "_index": "train-index", "_id": "7" } } +{ "train-field": [7] } +{ "index": { "_index": "train-index", "_id": "8" } } +{ "train-field": [8] } +{ "index": { "_index": "train-index", "_id": "9" } } +{ "train-field": [9] } +{ "index": { "_index": "train-index", "_id": "10" } } +{ "train-field": [10] } +{ "index": { "_index": "train-index", "_id": "11" } } +{ "train-field": [11] } +{ "index": { "_index": "train-index", "_id": "12" } } +{ "train-field": [12] } +{ "index": { "_index": "train-index", "_id": "13" } } +{ "train-field": [13] } +{ "index": { "_index": "train-index", "_id": "14" } } +{ "train-field": [14] } +{ "index": { "_index": "train-index", "_id": "15" } } +{ "train-field": [15] } +{ "index": { "_index": "train-index", "_id": "16" } } +{ "train-field": [16] } +{ "index": { "_index": "train-index", "_id": "17" } } +{ "train-field": [17] } +{ "index": { "_index": "train-index", "_id": "18" } } +{ "train-field": [18] } +{ "index": { "_index": "train-index", "_id": "19" } } +{ "train-field": [19] } +{ "index": { "_index": "train-index", "_id": "20" } } +{ "train-field": [20] } +{ "index": { "_index": "train-index", "_id": "21" } } +{ "train-field": [21] } +{ "index": { "_index": "train-index", "_id": "22" } } +{ "train-field": [22] } +{ "index": { "_index": "train-index", "_id": "23" } } +{ "train-field": [23] } +{ "index": { "_index": "train-index", "_id": "24" } } +{ "train-field": [24] } +{ "index": { "_index": "train-index", "_id": "25" } } +{ "train-field": [25] } +{ "index": { "_index": "train-index", "_id": "26" } } +{ "train-field": [26] } +{ "index": { "_index": "train-index", "_id": "27" } } +{ "train-field": [27] } +{ "index": { "_index": "train-index", "_id": "28" } } +{ "train-field": [28] } +{ "index": { "_index": "train-index", "_id": "29" } } +{ "train-field": [29] } +{ "index": { "_index": "train-index", "_id": "30" } } +{ "train-field": [30] } +{ "index": { "_index": "train-index", "_id": "31" } } +{ "train-field": [31] } +{ "index": { "_index": "train-index", "_id": "32" } } +{ "train-field": [32] } +{ "index": { "_index": "train-index", "_id": "33" } } +{ "train-field": [33] } +{ "index": { "_index": "train-index", "_id": "34" } } +{ "train-field": [34] } +{ "index": { "_index": "train-index", "_id": "35" } } +{ "train-field": [35] } +{ "index": { "_index": "train-index", "_id": "36" } } +{ "train-field": [36] } +{ "index": { "_index": "train-index", "_id": "37" } } +{ "train-field": [37] } +{ "index": { "_index": "train-index", "_id": "38" } } +{ "train-field": [38] } +{ "index": { "_index": "train-index", "_id": "39" } } +{ "train-field": [39] } +{ "index": { "_index": "train-index", "_id": "40" } } +{ "train-field": [40] } +``` +{% include copy-curl.html %} +
+ +Then, create and train the model named `test-binary-model`. The model will be trained using the training data from the `train_field` in the `train-index`. Specify the `binary` data type and `hamming` space type: + +```json +POST _plugins/_knn/models/test-binary-model/_train +{ + "training_index": "train-index", + "training_field": "train-field", + "dimension": 8, + "description": "model with binary data", + "data_type": "binary", + "space_type": "hamming", + "method": { + "name": "ivf", + "engine": "faiss", + "parameters": { + "nlist": 16, + "nprobes": 1 + } + } +} +``` +{% include copy-curl.html %} + +To check the model training status, call the Get Model API: + +```json +GET _plugins/_knn/models/test-binary-model?filter_path=state +``` +{% include copy-curl.html %} + +Once the training is complete, the `state` changes to `created`. + +Next, create an index that will initialize its native library indexes using the trained model: + +```json +PUT test-binary-ivf +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "model_id": "test-binary-model" + } + } + } +} +``` +{% include copy-curl.html %} + +Ingest the data containing the binary vectors that you want to search into the created index: + +```json +PUT _bulk?refresh=true +{"index": {"_index": "test-binary-ivf", "_id": "1"}} +{"my_vector": [7], "price": 4.4} +{"index": {"_index": "test-binary-ivf", "_id": "2"}} +{"my_vector": [10], "price": 14.2} +{"index": {"_index": "test-binary-ivf", "_id": "3"}} +{"my_vector": [15], "price": 19.1} +{"index": {"_index": "test-binary-ivf", "_id": "4"}} +{"my_vector": [99], "price": 1.2} +{"index": {"_index": "test-binary-ivf", "_id": "5"}} +{"my_vector": [80], "price": 16.5} +``` +{% include copy-curl.html %} + +Finally, search the data. Be sure to provide a binary vector in the k-NN vector field: + +```json +GET test-binary-ivf/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector": { + "vector": [8], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the two vectors closest to the query vector: + +
+ + Response + + {: .text-delta} + +```json +{ + "took": 7, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "test-binary-ivf", + "_id": "2", + "_score": 0.5, + "_source": { + "my_vector": [ + 10 + ], + "price": 14.2 + } + }, + { + "_index": "test-binary-ivf", + "_id": "3", + "_score": 0.25, + "_source": { + "my_vector": [ + 15 + ], + "price": 19.1 + } + } + ] + } +} +``` +
+ +### Memory estimation + +Use the following formulas to estimate the amount of memory required for binary vectors. + +#### HNSW memory estimation + +The memory required for HNSW can be estimated using the following formula, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph: + +```r +1.1 * (dimension / 8 + 8 * m) bytes/vector +``` + +#### IVF memory estimation + +The memory required for IVF can be estimated using the following formula, where `nlist` is the number of buckets into which to partition vectors: + +```r +1.1 * (((dimension / 8) * num_vectors) + (nlist * dimension / 8)) +``` + +## Next steps + +- [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) +- [Disk-based vector search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/) +- [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/) \ No newline at end of file diff --git a/_field-types/supported-field-types/knn-methods-engines.md b/_field-types/supported-field-types/knn-methods-engines.md new file mode 100644 index 0000000000..4ac160533d --- /dev/null +++ b/_field-types/supported-field-types/knn-methods-engines.md @@ -0,0 +1,422 @@ +--- +layout: default +title: Methods and engines +parent: k-NN vector +grand_parent: Supported field types +nav_order: 20 +--- + +# Methods and engines + +A _method_ defines the algorithm used for organizing vector data at indexing time and searching it at search time in [approximate k-NN search]({{site.url}}{{site.baseurl}}/vector-search/vector-search-techniques/approximate-knn/). + +OpenSearch supports the following methods: + +- **Hierarchical Navigable Small World (HNSW)** creates a hierarchical graph structure of connections between vectors. For more information about the algorithm, see [Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs](https://arxiv.org/abs/1603.09320). +- **Inverted File Index (IVF)** organizes vectors into buckets based on clustering and, during search, searches only a subset of the buckets. + +An _engine_ is the library that implements these methods. Different engines can implement the same method, sometimes with varying optimizations or characteristics. For example, HNSW is implemented by all supported engines, each with its own advantages. 
+ +OpenSearch supports the following engines: +- [**Lucene**](#lucene-engine): The native search library, offering an HNSW implementation with efficient filtering capabilities +- [**Faiss**](#faiss-engine) (Facebook AI Similarity Search): A comprehensive library implementing both the HNSW and IVF methods, with additional vector compression options +- [**NMSLIB**](#nmslib-engine-deprecated) (Non-Metric Space Library): A legacy implementation of HNSW (now deprecated) + +## Method definition example + +A method definition contains the following components: + +- The `name` of the method (for example, `hnsw` or `ivf`) +- The `space_type` for which the method is built (for example, `l2` or `cosinesimil`) +- The `engine` that will implement the method (for example, `faiss` or `lucene`) +- A map of `parameters` specific to that implementation + +The following example configures an `hnsw` method with the `l2` space type, the `faiss` engine, and the method-specific parameters: + +```json +PUT test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 1024, + "method": { + "name": "hnsw", + "space_type": "l2", + "engine": "faiss", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +Not every method/engine combination supports each of the spaces. For a list of supported spaces, see the section for a specific engine. +{: .note} + +## Common parameters + +The following parameters are common to all method definitions. + +Mapping parameter | Required | Default | Updatable | Description +:--- | :--- | :--- | :--- | :--- +`name` | Yes | N/A | No | The nearest neighbor method. Valid values are `hnsw` and `ivf`. Not every engine combination supports each of the methods. For a list of supported methods, see the section for a specific engine. +`space_type` | No | `l2` | No | The vector space used to calculate the distance between vectors. Valid values are `l1`, `l2`, `linf`, `cosinesimil`, `innerproduct`, `hamming`, and `hammingbit`. Not every method/engine combination supports each of the spaces. For a list of supported spaces, see the section for a specific engine. Note: This value can also be specified at the top level of the mapping. For more information, see [Spaces]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-spaces/). +`engine` | No | `faiss` | No | The approximate k-NN library to use for indexing and search. Valid values are `faiss`, `lucene`, and `nmslib` (deprecated). +`parameters` | No | `null` | No | The parameters used for the nearest neighbor method. For more information, see the section for a specific engine. + +## Lucene engine + +The Lucene engine provides a native implementation of vector search directly within Lucene. It offers efficient filtering capabilities and is well suited for smaller deployments. + +### Supported methods + +The Lucene engine supports the following method. + +Method name | Requires training | Supported spaces +:--- | :--- |:--- +[`hnsw`](#hnsw-parameters) | No | `l2`, `cosinesimil`, `innerproduct` (supported in OpenSearch 2.13 and later) + +#### HNSW parameters + +The HNSW method supports the following parameters. + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :--- | :--- | :--- +`ef_construction` | No | 100 | No | The size of the dynamic list used during k-NN graph creation. 
Higher values result in a more accurate graph but slower indexing speed.
Note: Lucene uses the term `beam_width` internally, but the OpenSearch documentation uses `ef_construction` for consistency. +`m` | No | 16 | No | The number of bidirectional links created for each new element. Impacts memory consumption significantly. Keep between `2` and `100`.
Note: Lucene uses the term `max_connections` internally, but the OpenSearch documentation uses `m` for consistency. + +The Lucene HNSW implementation ignores `ef_search` and dynamically sets it to the value of "k" in the search request. There is therefore no need to configure settings for `ef_search` when using the Lucene engine. +{: .note} + +An index created in OpenSearch version 2.11 or earlier will still use the previous `ef_construction` value (`512`). +{: .note} + +### Example configuration + +```json +"method": { + "name": "hnsw", + "engine": "lucene", + "parameters": { + "m": 2048, + "ef_construction": 245 + } +} +``` + +## Faiss engine + +The Faiss engine provides advanced vector indexing capabilities with support for multiple methods and encoding options to optimize memory usage and search performance. + +### Supported methods + +The Faiss engine supports the following methods. + +Method name | Requires training | Supported spaces +:--- | :--- |:--- +[`hnsw`](#hnsw-parameters-1) | No | `l2`, `innerproduct` (not available when [PQ](#pq-parameters) is used), `hamming` +[`ivf`](#ivf-parameters) | Yes | `l2`, `innerproduct`, `hamming` (supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized#binary-vectors). + + +#### HNSW parameters + +The `hnsw` method supports the following parameters. + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :--- | :--- | :--- +`ef_search` | No | 100 | No | The size of the dynamic list used during k-NN searches. Higher values result in more accurate but slower searches. +`ef_construction` | No | 100 | No | The size of the dynamic list used during k-NN graph creation. Higher values result in a more accurate graph but slower indexing speed. +`m` | No | 16 | No | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between `2` and `100`. +`encoder` | No | flat | No | An encoder definition for encoding vectors. Encoders can reduce the memory footprint of your index at the expense of search accuracy. + +An index created in OpenSearch version 2.11 or earlier will still use the previous `ef_construction` value (`512`). +{: .note} + +#### IVF parameters + +The IVF method supports the following parameters. + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :--- | :--- | :--- +`nlist` | No | 4 | No | The number of buckets into which to partition vectors. Higher values may increase accuracy but also increase memory and training latency. +`nprobes` | No | 1 | No | The number of buckets to search during a query. Higher values result in more accurate but slower searches. +`encoder` | No | flat | No | An encoder definition for encoding vectors. + +For more information about these parameters, see the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes). + +### IVF training requirements + +The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model), passing the IVF method definition. 
IVF requires, at a minimum, that there be `nlist` training data points, but we recommend [that you use more than this](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset). Training data can be the same as the data you plan to index or come from a separate dataset. + +### Supported encoders + +You can use encoders to reduce the memory footprint of a vector index at the expense of search accuracy. + +OpenSearch currently supports the following encoders in the Faiss library. + +Encoder name | Requires training | Description +:--- | :--- | :--- +`flat` (Default) | No | Encode vectors as floating-point arrays. This encoding does not reduce memory footprint. +[`pq`](#pq-parameters) | Yes | An abbreviation for _product quantization_, PQ is a lossy compression technique that uses clustering to encode a vector into a fixed byte size, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are separated into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). +[`sq`](#sq-parameters) | No | An abbreviation for _scalar quantization_. Starting with OpenSearch version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/faiss-16-bit-quantization/). + +#### PQ parameters + +The `pq` encoder supports the following parameters. + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :--- | :--- | :--- +`m` | No | `1` | No | Determines the number of subvectors into which to separate the vector. Subvectors are encoded independently of each other. This vector dimension must be divisible by `m`. Maximum value is 1,024. +`code_size` | No | `8` | No | Determines the number of bits into which to encode a subvector. Maximum value is `8`. For `ivf`, this value must be less than or equal to `8`. For `hnsw`, this value must be `8`. + +The `hnsw` method supports the `pq` encoder for OpenSearch version 2.10 and later. The `code_size` parameter of a `pq` encoder with the `hnsw` method must be **8**. +{: .important} + +#### SQ parameters + +The `sq` encoder supports the following parameters. + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :-- | :--- | :--- +`type` | No | `fp16` | No | The type of scalar quantization to be used to encode 32-bit float vectors into the corresponding type. As of OpenSearch 2.13, only the `fp16` encoder type is supported. For the `fp16` encoder, vector values must be in the [-65504.0, 65504.0] range. +`clip` | No | `false` | No | If `true`, then any vector values outside of the supported range for the specified vector type are rounded so that they are within the range. If `false`, then the request is rejected if any vector values are outside of the supported range. Setting `clip` to `true` may decrease recall. 
+ +For more information and examples, see [Using Faiss scalar quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/faiss-16-bit-quantization/). + +### SIMD optimization + +Starting with version 2.13, OpenSearch supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD is supported by default on Linux machines only for the Faiss engine. SIMD architecture helps boost overall performance by improving indexing throughput and reducing search latency. Starting with version 2.18, OpenSearch supports AVX-512 SIMD instructions on x64 architecture. Starting with version 2.19, OpenSearch supports advanced AVX-512 SIMD instructions on x64 architecture for Intel Sapphire Rapids or a newer-generation processor, improving the performance of Hamming distance computation. + +SIMD optimization is applicable only if the vector dimension is a multiple of 8. +{: .note} + + +#### x64 architecture + + +For x64 architecture, the following versions of the Faiss library are built and shipped with the artifact: + +- `libopensearchknn_faiss_avx512_spr.so`: The Faiss library containing advanced AVX-512 SIMD instructions for newer-generation processors, available on public clouds such as AWS for c/m/r 7i or newer instances. +- `libopensearchknn_faiss_avx512.so`: The Faiss library containing AVX-512 SIMD instructions. +- `libopensearchknn_faiss_avx2.so`: The Faiss library containing AVX2 SIMD instructions. +- `libopensearchknn_faiss.so`: The non-optimized Faiss library without SIMD instructions. + +When using the Faiss library, the performance ranking is as follows: advanced AVX-512 > AVX-512 > AVX2 > no optimization. +{: .note } + +If your hardware supports advanced AVX-512(spr), OpenSearch loads the `libopensearchknn_faiss_avx512_spr.so` library at runtime. + +If your hardware supports AVX-512, OpenSearch loads the `libopensearchknn_faiss_avx512.so` library at runtime. + +If your hardware supports AVX2 but doesn't support AVX-512, OpenSearch loads the `libopensearchknn_faiss_avx2.so` library at runtime. + +To disable the advanced AVX-512 (for Sapphire Rapids or newer-generation processors), AVX-512, and AVX2 SIMD instructions and load the non-optimized Faiss library (`libopensearchknn_faiss.so`), specify the `knn.faiss.avx512_spr.disabled`, `knn.faiss.avx512.disabled`, and `knn.faiss.avx2.disabled` static settings as `true` in `opensearch.yml` (by default, all of these are set to `false`). + +Note that to update a static setting, you must stop the cluster, change the setting, and restart the cluster. For more information, see [Static settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). + +#### ARM64 architecture + +For the ARM64 architecture, only one performance-boosting Faiss library (`libopensearchknn_faiss.so`) is built and shipped. The library contains Neon SIMD instructions and cannot be disabled. 
+ +### Example configurations + +The following example uses the `ivf` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder): + +```json +"method": { + "name":"ivf", + "engine":"faiss", + "parameters":{ + "nlist": 4, + "nprobes": 2 + } +} +``` + +The following example uses the `ivf` method with a `pq` encoder: + +```json +"method": { + "name":"ivf", + "engine":"faiss", + "parameters":{ + "encoder":{ + "name":"pq", + "parameters":{ + "code_size": 8, + "m": 8 + } + } + } +} +``` + +The following example uses the `hnsw` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder): + +```json +"method": { + "name":"hnsw", + "engine":"faiss", + "parameters":{ + "ef_construction": 256, + "m": 8 + } +} +``` + +The following example uses the `ivf` method with an `sq` encoder of type `fp16`: + +```json +"method": { + "name":"ivf", + "engine":"faiss", + "parameters":{ + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16", + "clip": false + } + }, + "nprobes": 2 + } +} +``` + +The following example uses the `hnsw` method with an `sq` encoder of type `fp16` with `clip` enabled: + +```json +"method": { + "name":"hnsw", + "engine":"faiss", + "parameters":{ + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16", + "clip": true + } + }, + "ef_construction": 256, + "m": 8 + } +} +``` + +## NMSLIB engine (deprecated) + +The Non-Metric Space Library (NMSLIB) engine was one of the first vector search implementations in OpenSearch. While still supported, it has been deprecated in favor of the Faiss and Lucene engines. + +### Supported methods + +The NMSLIB engine supports the following method. + +Method name | Requires training | Supported spaces +:--- | :--- | :--- +[`hnsw`](#hnsw-parameters-2) | No | `l2`, `innerproduct`, `cosinesimil`, `l1`, `linf` + +#### HNSW parameters + +The HNSW method supports the following parameters. + +Parameter name | Required | Default | Updatable | Description +:--- | :--- | :--- | :--- | :--- +`ef_construction` | No | 100 | No | The size of the dynamic list used during k-NN graph creation. Higher values result in a more accurate graph but slower indexing speed. +`m` | No | 16 | No | The number of bidirectional links created for each new element. Impacts memory consumption significantly. Keep between `2` and `100`. + +For NMSLIB (deprecated), *ef_search* is set in the [index settings]({{site.url}}{{site.baseurl}}/vector-search/settings/#index-settings). +{: .note} + +An index created in OpenSearch version 2.11 or earlier will still use the previous `ef_construction` value (`512`). +{: .note} + +### Example configuration + +```json +"method": { + "name": "hnsw", + "engine": "nmslib", + "space_type": "l2", + "parameters": { + "ef_construction": 100, + "m": 16 + } +} +``` + +## Choosing the right method + +There are several options to choose from when building your `knn_vector` field. To select the correct method and parameters, you should first understand the requirements of your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, and (4) indexing latency. + +If memory is not a concern, HNSW offers a strong query latency/query quality trade-off. + +If you want to use less memory and increase indexing speed as compared to HNSW while maintaining similar query quality, you should evaluate IVF. + +If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. 
Because PQ is a lossy encoding, query quality will drop. + +You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/faiss-16-bit-quantization/). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#byte-vectors) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/). + +## Engine recommendations + +In general, select Faiss for large-scale use cases. Lucene is a good option for smaller deployments and offers benefits like smart filtering, where the optimal filtering strategy—pre-filtering, post-filtering, or exact k-NN—is automatically applied depending on the situation. The following table summarizes the differences between each option. + +| | Faiss/HNSW | Faiss/IVF | Lucene/HNSW | +|:---|:---|:---|:---| +| Max dimensions | 16,000 | 16,000 | 16,000 | +| Filter | Post-filter | Post-filter | Filter during search | +| Training required | No (Yes for PQ) | Yes | No | +| Similarity metrics | `l2`, `innerproduct` | `l2`, `innerproduct` | `l2`, `cosinesimil` | +| Number of vectors | Tens of billions | Tens of billions | Less than 10 million | +| Indexing latency | Low | Lowest | Low | +| Query latency and quality | Low latency and high quality | Low latency and low quality | High latency and high quality | +| Vector compression | Flat
<br><br>PQ | Flat<br><br>PQ | Flat |
+| Memory consumption | High<br><br>Low with PQ | Medium<br><br>
Low with PQ | High | + +## Memory estimation + +In a typical OpenSearch cluster, a certain portion of RAM is reserved for the JVM heap. OpenSearch allocates native library indexes to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set to 50%. + +Using a replica doubles the total number of vectors. +{: .note } + +For information about using memory estimation with vector quantization, see [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/). +{: .note } + +### HNSW memory estimation + +The memory required for HNSW is estimated to be `1.1 * (4 * dimension + 8 * m)` bytes/vector. + +As an example, assume you have 1 million vectors with a `dimension` of 256 and an `m` of 16. The memory requirement can be estimated as follows: + +```r +1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.267 GB +``` + +### IVF memory estimation + +The memory required for IVF is estimated to be `1.1 * (((4 * dimension) * num_vectors) + (4 * nlist * d))` bytes. + +As an example, assume you have 1 million vectors with a `dimension` of `256` and an `nlist` of `128`. The memory requirement can be estimated as follows: + +```r +1.1 * (((4 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 1.126 GB +``` + +## Next steps + +- [Performance tuning]({{site.url}}{{site.baseurl}}/vector-search/performance-tuning/) +- [Optimizing vector storage]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/) +- [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/) \ No newline at end of file diff --git a/_field-types/supported-field-types/knn-spaces.md b/_field-types/supported-field-types/knn-spaces.md new file mode 100644 index 0000000000..7b0ce09aab --- /dev/null +++ b/_field-types/supported-field-types/knn-spaces.md @@ -0,0 +1,98 @@ +--- +layout: default +title: Spaces +parent: k-NN vector +grand_parent: Supported field types +nav_order: 10 +has_math: true +--- + +# Spaces + +In vector search, a _space_ defines how the distance (or similarity) between two vectors is calculated. The choice of space affects how nearest neighbors are determined during search operations. + +## Distance calculation + +A space defines the function used to measure the distance between two points in order to determine the k-nearest neighbors. In k-NN search, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a higher score equates to a better result. OpenSearch supports the following spaces. + +Not every method/engine combination supports each of the spaces. For a list of supported spaces, see the section for a specific engine in the [method documentation]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/). 
+{: .note} + +| Space type | Search type | Distance function ($$d$$ ) | OpenSearch score | +| :--- | :--- | :--- | +| `l1` | Approximate, exact | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n \lvert x_i - y_i \rvert $$ | $$ score = {1 \over {1 + d} } $$ | +| `l2` | Approximate, exact | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2 $$ | $$ score = {1 \over 1 + d } $$ | +| `linf` | Approximate, exact | $$ d(\mathbf{x}, \mathbf{y}) = max(\lvert x_i - y_i \rvert) $$ | $$ score = {1 \over 1 + d } $$ | +| `cosinesimil` | Approximate, exact | $$ d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} \cdot \mathbf{y} \over \lVert \mathbf{x}\rVert \cdot \lVert \mathbf{y}\rVert}$$$$ = 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}}$$,
where $$\lVert \mathbf{x}\rVert$$ and $$\lVert \mathbf{y}\rVert$$ represent the norms of vectors $$\mathbf{x}$$ and $$\mathbf{y}$$, respectively. | $$ score = {2 - d \over 2} $$ | +| `innerproduct` (supported for Lucene in OpenSearch version 2.13 and later) | Approximate | **NMSLIB** and **Faiss**:
$$ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i $$<br><br>**Lucene**:<br>$$ d(\mathbf{x}, \mathbf{y}) = {\mathbf{x} \cdot \mathbf{y}} = \sum_{i=1}^n x_i y_i $$ | **NMSLIB** and **Faiss**:<br>$$ \text{If} d \ge 0, score = {1 \over 1 + d }$$<br>$$\text{If} d < 0, score = −d + 1$$<br><br>**Lucene:**<br>$$ \text{If} d > 0, score = d + 1 $$<br>$$\text{If} d \le 0, score = {1 \over 1 + (-1 \cdot d) }$$ | +| `innerproduct` (supported for Lucene in OpenSearch version 2.13 and later) | Exact | $$ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i $$ | $$ \text{If} d \ge 0, score = {1 \over 1 + d }$$
$$\text{If} d < 0, score = −d + 1$$ | +| `hamming` (supported for binary vectors in OpenSearch version 2.16 and later) | Approximate, exact | $$ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})$$ | $$ score = {1 \over 1 + d } $$ | +| `hammingbit` (supported for binary and long vectors) | Exact | $$ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})$$ | $$ score = {1 \over 1 + d } $$ | + +The cosine similarity formula does not include the `1 -` prefix. However, because similarity search libraries equate lower scores with closer results, they return `1 - cosineSimilarity` for the cosine similarity space---this is why `1 -` is included in the distance function. +{: .note } + +With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown. +{: .note } + +The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized#binary-vectors). +{: .note} + +## Specifying the space type + +The space type is specified when creating an index. + +You can specify the space type at the top level of the field mapping: + +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 3, + "space_type": "l2" + } + } + } +} +``` +{% include copy-curl.html %} + +Alternatively, you can specify the space type within the `method` object if defining a method: + +```json +PUT test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 1024, + "method": { + "name": "hnsw", + "space_type": "l2", + "engine": "nmslib", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } + } + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/supported-field-types/knn-vector.md b/_field-types/supported-field-types/knn-vector.md index 7ec53582bc..6e10cbb601 100644 --- a/_field-types/supported-field-types/knn-vector.md +++ b/_field-types/supported-field-types/knn-vector.md @@ -1,62 +1,24 @@ --- layout: default title: k-NN vector -nav_order: 58 -has_children: false +nav_order: 20 +has_children: true parent: Supported field types has_math: true --- -# k-NN vector field type +# k-NN vector **Introduced 1.0** {: .label .label-purple } -The [k-NN plugin]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/) introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or specifying a model id. +The `knn_vector` data type allows you to ingest vectors into an OpenSearch index and perform different kinds of vector search. The `knn_vector` field is highly configurable and can serve many different vector workloads. 
In general, a `knn_vector` field can be built either by [providing a method definition](#method-definitions) or [specifying a model ID](#model-ids). ## Example -For example, to map `my_vector` as a `knn_vector`, use the following request: +To map `my_vector` as a `knn_vector`, use the following request: ```json -PUT test-index -{ - "settings": { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector": { - "type": "knn_vector", - "dimension": 3, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "faiss" - } - } - } - } -} -``` -{% include copy-curl.html %} - -## Vector workload modes - -Vector search involves trade-offs between low-latency and low-cost search. Specify the `mode` mapping parameter of the `knn_vector` type to indicate which search mode you want to prioritize. The `mode` dictates the default values for k-NN parameters. You can further fine-tune your index by overriding the default parameter values in the k-NN field mapping. - -The following modes are currently supported. - -| Mode | Default engine | Description | -|:---|:---|:---| -| `in_memory` (Default) | `faiss` | Prioritizes low-latency search. This mode uses the `faiss` engine without any quantization applied. It is configured with the default parameter values for vector search in OpenSearch. | -| `on_disk` | `faiss` | Prioritizes low-cost vector search while maintaining strong recall. By default, the `on_disk` mode uses quantization and rescoring to execute a two-pass approach to retrieve the top neighbors. The `on_disk` mode supports only `float` vector types. | - -To create a k-NN index that uses the `on_disk` mode for low-cost search, send the following request: - -```json -PUT test-index +PUT /test-index { "settings": { "index": { @@ -68,8 +30,7 @@ PUT test-index "my_vector": { "type": "knn_vector", "dimension": 3, - "space_type": "l2", - "mode": "on_disk" + "space_type": "l2" } } } @@ -77,33 +38,10 @@ PUT test-index ``` {% include copy-curl.html %} -## Compression levels - -The `compression_level` mapping parameter selects a quantization encoder that reduces vector memory consumption by the given factor. The following table lists the available `compression_level` values. +## Optimizing vector storage -| Compression level | Supported engines | -|:------------------|:---------------------------------------------| -| `1x` | `faiss`, `lucene`, and `nmslib` (deprecated) | -| `2x` | `faiss` | -| `4x` | `lucene` | -| `8x` | `faiss` | -| `16x` | `faiss` | -| `32x` | `faiss` | +To optimize vector storage, you can specify a [vector workload mode]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#vector-workload-modes) as `in_memory` (which optimizes for lowest latency) or `on_disk` (which optimizes for lowest cost). The `on_disk` mode reduces memory usage. Optionally, you can specify a [`compression_level`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#compression-levels) to fine-tune the vector memory consumption: -For example, if a `compression_level` of `32x` is passed for a `float32` index of 768-dimensional vectors, the per-vector memory is reduced from `4 * 768 = 3072` bytes to `3072 / 32 = 846` bytes. Internally, binary quantization (which maps a `float` to a `bit`) may be used to achieve this compression. - -If you set the `compression_level` parameter, then you cannot specify an `encoder` in the `method` mapping. Compression levels greater than `1x` are only supported for `float` vector types. 
-{: .note} - -The following table lists the default `compression_level` values for the available workload modes. - -| Mode | Default compression level | -|:------------------|:-------------------------------| -| `in_memory` | `1x` | -| `on_disk` | `32x` | - - -To create a vector field with a `compression_level` of `16x`, specify the `compression_level` parameter in the mappings. This parameter overrides the default compression level for the `on_disk` mode from `32x` to `16x`, producing higher recall and accuracy at the expense of a larger memory footprint: ```json PUT test-index @@ -128,68 +66,10 @@ PUT test-index ``` {% include copy-curl.html %} -## Method definitions - -[Method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions) are used when the underlying [approximate k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files. -```json -"my_vector": { - "type": "knn_vector", - "dimension": 4, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "nmslib", - "parameters": { - "ef_construction": 100, - "m": 16 - } - } -} -``` - -## Model IDs - -Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the model must be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model). The -model contains the information needed to initialize the native library segment files. - -```json -"my_vector": { - "type": "knn_vector", - "model_id": "my-model" -} -``` - -However, if you intend to use Painless scripting or a k-NN score script, you only need to pass the dimension. - ```json -"my_vector": { - "type": "knn_vector", - "dimension": 128 - } - ``` - -## Byte vectors - -By default, k-NN vectors are `float` vectors, in which each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `faiss` or `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range. - -Byte vectors are supported only for the `lucene` and `faiss` engines. They are not supported for the `nmslib` engine. -{: .note} - -In [k-NN benchmarking tests](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch), the use of `byte` rather than `float` vectors resulted in a significant reduction in storage and memory usage as well as improved indexing throughput and reduced query latency. Additionally, precision on recall was not greatly affected (note that recall can depend on various factors, such as the [quantization technique](#quantization-techniques) and data distribution). - -When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall. -{: .important} - -When using `byte` vectors with the `faiss` engine, we recommend using [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), which helps to significantly reduce search latencies and improve indexing throughput. 
-{: .important} - -Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`. - -To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index: - -### Example: HNSW +## Method definitions -The following example creates a byte vector index with the `lucene` engine and `hnsw` algorithm: +[Method definitions]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/) are used when the underlying [approximate k-NN (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) algorithm does not require training. For example, the following `knn_vector` field specifies that a Faiss implementation of HNSW should be used for ANN search. During indexing, Faiss builds the corresponding HNSW segment files: ```json PUT test-index @@ -202,14 +82,13 @@ PUT test-index }, "mappings": { "properties": { - "my_vector": { + "my_vector1": { "type": "knn_vector", - "dimension": 3, - "data_type": "byte", - "space_type": "l2", + "dimension": 1024, "method": { "name": "hnsw", - "engine": "lucene", + "space_type": "l2", + "engine": "faiss", "parameters": { "ef_construction": 100, "m": 16 @@ -222,687 +101,79 @@ PUT test-index ``` {% include copy-curl.html %} -After creating the index, ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range: +You can also specify the `space_type` at the top level: ```json -PUT test-index/_doc/1 -{ - "my_vector": [-126, 28, 127] -} -``` -{% include copy-curl.html %} - -```json -PUT test-index/_doc/2 -{ - "my_vector": [100, -128, 0] -} -``` -{% include copy-curl.html %} - -When querying, be sure to use a `byte` vector: - -```json -GET test-index/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector": { - "vector": [26, -120, 99], - "k": 2 - } - } - } -} -``` -{% include copy-curl.html %} - -### Example: IVF - -The `ivf` method requires a training step that creates and trains the model used to initialize the native library index during segment creation. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). - -First, create an index that will contain byte vector training data. Specify the `faiss` engine and `ivf` algorithm and make sure that the `dimension` matches the dimension of the model you want to create: - -```json -PUT train-index -{ - "mappings": { - "properties": { - "train-field": { - "type": "knn_vector", - "dimension": 4, - "data_type": "byte" - } - } - } -} -``` -{% include copy-curl.html %} - -First, ingest training data containing byte vectors into the training index: - -```json -PUT _bulk -{ "index": { "_index": "train-index", "_id": "1" } } -{ "train-field": [127, 100, 0, -120] } -{ "index": { "_index": "train-index", "_id": "2" } } -{ "train-field": [2, -128, -10, 50] } -{ "index": { "_index": "train-index", "_id": "3" } } -{ "train-field": [13, -100, 5, 126] } -{ "index": { "_index": "train-index", "_id": "4" } } -{ "train-field": [5, 100, -6, -125] } -``` -{% include copy-curl.html %} - -Then, create and train the model named `byte-vector-model`. The model will be trained using the training data from the `train-field` in the `train-index`. 
Specify the `byte` data type: - -```json -POST _plugins/_knn/models/byte-vector-model/_train -{ - "training_index": "train-index", - "training_field": "train-field", - "dimension": 4, - "description": "model with byte data", - "data_type": "byte", - "method": { - "name": "ivf", - "engine": "faiss", - "space_type": "l2", - "parameters": { - "nlist": 1, - "nprobes": 1 - } - } -} -``` -{% include copy-curl.html %} - -To check the model training status, call the Get Model API: - -```json -GET _plugins/_knn/models/byte-vector-model?filter_path=state -``` -{% include copy-curl.html %} - -Once the training is complete, the `state` changes to `created`. - -Next, create an index that will initialize its native library indexes using the trained model: - -```json -PUT test-byte-ivf -{ - "settings": { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector": { - "type": "knn_vector", - "model_id": "byte-vector-model" - } - } - } -} -``` -{% include copy-curl.html %} - -Ingest the data containing the byte vectors that you want to search into the created index: - -```json -PUT _bulk?refresh=true -{"index": {"_index": "test-byte-ivf", "_id": "1"}} -{"my_vector": [7, 10, 15, -120]} -{"index": {"_index": "test-byte-ivf", "_id": "2"}} -{"my_vector": [10, -100, 120, -108]} -{"index": {"_index": "test-byte-ivf", "_id": "3"}} -{"my_vector": [1, -2, 5, -50]} -{"index": {"_index": "test-byte-ivf", "_id": "4"}} -{"my_vector": [9, -7, 45, -78]} -{"index": {"_index": "test-byte-ivf", "_id": "5"}} -{"my_vector": [80, -70, 127, -128]} -``` -{% include copy-curl.html %} - -Finally, search the data. Be sure to provide a byte vector in the k-NN vector field: - -```json -GET test-byte-ivf/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector": { - "vector": [100, -120, 50, -45], - "k": 2 - } - } - } -} -``` -{% include copy-curl.html %} - -### Memory estimation - -In the best-case scenario, byte vectors require 25% of the memory required by 32-bit vectors. - -#### HNSW memory estimation - -The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph. - -As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16. The memory requirement can be estimated as follows: - -```r -1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.39 GB -``` - -#### IVF memory estimation - -The memory required for IVF is estimated to be `1.1 * ((dimension * num_vectors) + (4 * nlist * dimension))` bytes/vector, where `nlist` is the number of buckets to partition vectors into. - -As an example, assume that you have 1 million vectors with a dimension of 256 and an `nlist` of 128. The memory requirement can be estimated as follows: - -```r -1.1 * ((256 * 1,000,000) + (4 * 128 * 256)) ~= 0.27 GB -``` - - -### Quantization techniques - -If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. 
The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only. - -#### Scalar quantization for the L2 space type - -The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$). - -```python -# Random dataset (Example to create a random dataset) -dataset = np.random.uniform(-300, 300, (100, 10)) -# Random query set (Example to create a random queryset) -queryset = np.random.uniform(-350, 350, (100, 10)) -# Number of values -B = 256 - -# INDEXING: -# Get min and max -dataset_min = np.min(dataset) -dataset_max = np.max(dataset) -# Shift coordinates to be non-negative -dataset -= dataset_min -# Normalize into [0, 1] -dataset *= 1. / (dataset_max - dataset_min) -# Bucket into 256 values -dataset = np.floor(dataset * (B - 1)) - int(B / 2) - -# QUERYING: -# Clip (if queryset range is out of datset range) -queryset = queryset.clip(dataset_min, dataset_max) -# Shift coordinates to be non-negative -queryset -= dataset_min -# Normalize -queryset *= 1. / (dataset_max - dataset_min) -# Bucket into 256 values -queryset = np.floor(queryset * (B - 1)) - int(B / 2) -``` -{% include copy.html %} - -#### Scalar quantization for the cosine similarity space type - -The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$). - -The following pseudocode is for positive numbers: - -```python -# For Positive Numbers - -# INDEXING and QUERYING: - -# Get Max of train dataset -max = np.max(dataset) -min = 0 -B = 127 - -# Normalize into [0,1] -val = (val - min) / (max - min) -val = (val * B) - -# Get int and fraction values -int_part = floor(val) -frac_part = val - int_part - -if 0.5 < frac_part: - bval = int_part + 1 -else: - bval = int_part - -return Byte(bval) -``` -{% include copy.html %} - -The following pseudocode is for negative numbers: - -```python -# For Negative Numbers - -# INDEXING and QUERYING: - -# Get Min of train dataset -min = 0 -max = -np.min(dataset) -B = 128 - -# Normalize into [0,1] -val = (val - min) / (max - min) -val = (val * B) - -# Get int and fraction values -int_part = floor(var) -frac_part = val - int_part - -if 0.5 < frac_part: - bval = int_part + 1 -else: - bval = int_part - -return Byte(bval) -``` -{% include copy.html %} - -## Binary vectors - -You can reduce memory costs by a factor of 32 by switching from float to binary vectors. -Using binary vector indexes can lower operational costs while maintaining high recall performance, making large-scale deployment more economical and efficient. - -Binary format is available for the following k-NN search types: - -- [Approximate k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/): Supports binary vectors only for the Faiss engine with the HNSW and IVF algorithms. 
-- [Script score k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/): Enables the use of binary vectors in script scoring. -- [Painless extensions]({{site.url}}{{site.baseurl}}/search-plugins/knn/painless-functions/): Allows the use of binary vectors with Painless scripting extensions. - -### Requirements - -There are several requirements for using binary vectors in the OpenSearch k-NN plugin: - -- The `data_type` of the binary vector index must be `binary`. -- The `space_type` of the binary vector index must be `hamming`. -- The `dimension` of the binary vector index must be a multiple of 8. -- You must convert your binary data into 8-bit signed integers (`int8`) in the [-128, 127] range. For example, the binary sequence of 8 bits `0, 1, 1, 0, 0, 0, 1, 1` must be converted into its equivalent byte value of `99` to be used as a binary vector input. - -### Example: HNSW - -To create a binary vector index with the Faiss engine and HNSW algorithm, send the following request: - -```json -PUT /test-binary-hnsw +PUT test-index { "settings": { "index": { - "knn": true + "knn": true, + "knn.algo_param.ef_search": 100 } }, "mappings": { "properties": { - "my_vector": { + "my_vector1": { "type": "knn_vector", - "dimension": 8, - "data_type": "binary", - "space_type": "hamming", + "dimension": 1024, + "space_type": "l2", "method": { "name": "hnsw", - "engine": "faiss" - } - } - } - } -} -``` -{% include copy-curl.html %} - -Then ingest some documents containing binary vectors: - -```json -PUT _bulk -{"index": {"_index": "test-binary-hnsw", "_id": "1"}} -{"my_vector": [7], "price": 4.4} -{"index": {"_index": "test-binary-hnsw", "_id": "2"}} -{"my_vector": [10], "price": 14.2} -{"index": {"_index": "test-binary-hnsw", "_id": "3"}} -{"my_vector": [15], "price": 19.1} -{"index": {"_index": "test-binary-hnsw", "_id": "4"}} -{"my_vector": [99], "price": 1.2} -{"index": {"_index": "test-binary-hnsw", "_id": "5"}} -{"my_vector": [80], "price": 16.5} -``` -{% include copy-curl.html %} - -When querying, be sure to use a binary vector: - -```json -GET /test-binary-hnsw/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector": { - "vector": [9], - "k": 2 - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains the two vectors closest to the query vector: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 8, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "test-binary-hnsw", - "_id": "2", - "_score": 0.5, - "_source": { - "my_vector": [ - 10 - ], - "price": 14.2 - } - }, - { - "_index": "test-binary-hnsw", - "_id": "5", - "_score": 0.25, - "_source": { - "my_vector": [ - 80 - ], - "price": 16.5 + "engine": "faiss", + "parameters": { + "ef_construction": 100, + "m": 16 + } } } - ] - } -} -``` -
- -### Example: IVF - -The IVF method requires a training step that creates and trains the model used to initialize the native library index during segment creation. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). - -First, create an index that will contain binary vector training data. Specify the Faiss engine and IVF algorithm and make sure that the `dimension` matches the dimension of the model you want to create: - -```json -PUT train-index -{ - "mappings": { - "properties": { - "train-field": { - "type": "knn_vector", - "dimension": 8, - "data_type": "binary" - } } } } ``` {% include copy-curl.html %} -Ingest training data containing binary vectors into the training index: - -
- - Bulk ingest request - - {: .text-delta} - -```json -PUT _bulk -{ "index": { "_index": "train-index", "_id": "1" } } -{ "train-field": [1] } -{ "index": { "_index": "train-index", "_id": "2" } } -{ "train-field": [2] } -{ "index": { "_index": "train-index", "_id": "3" } } -{ "train-field": [3] } -{ "index": { "_index": "train-index", "_id": "4" } } -{ "train-field": [4] } -{ "index": { "_index": "train-index", "_id": "5" } } -{ "train-field": [5] } -{ "index": { "_index": "train-index", "_id": "6" } } -{ "train-field": [6] } -{ "index": { "_index": "train-index", "_id": "7" } } -{ "train-field": [7] } -{ "index": { "_index": "train-index", "_id": "8" } } -{ "train-field": [8] } -{ "index": { "_index": "train-index", "_id": "9" } } -{ "train-field": [9] } -{ "index": { "_index": "train-index", "_id": "10" } } -{ "train-field": [10] } -{ "index": { "_index": "train-index", "_id": "11" } } -{ "train-field": [11] } -{ "index": { "_index": "train-index", "_id": "12" } } -{ "train-field": [12] } -{ "index": { "_index": "train-index", "_id": "13" } } -{ "train-field": [13] } -{ "index": { "_index": "train-index", "_id": "14" } } -{ "train-field": [14] } -{ "index": { "_index": "train-index", "_id": "15" } } -{ "train-field": [15] } -{ "index": { "_index": "train-index", "_id": "16" } } -{ "train-field": [16] } -{ "index": { "_index": "train-index", "_id": "17" } } -{ "train-field": [17] } -{ "index": { "_index": "train-index", "_id": "18" } } -{ "train-field": [18] } -{ "index": { "_index": "train-index", "_id": "19" } } -{ "train-field": [19] } -{ "index": { "_index": "train-index", "_id": "20" } } -{ "train-field": [20] } -{ "index": { "_index": "train-index", "_id": "21" } } -{ "train-field": [21] } -{ "index": { "_index": "train-index", "_id": "22" } } -{ "train-field": [22] } -{ "index": { "_index": "train-index", "_id": "23" } } -{ "train-field": [23] } -{ "index": { "_index": "train-index", "_id": "24" } } -{ "train-field": [24] } -{ "index": { "_index": "train-index", "_id": "25" } } -{ "train-field": [25] } -{ "index": { "_index": "train-index", "_id": "26" } } -{ "train-field": [26] } -{ "index": { "_index": "train-index", "_id": "27" } } -{ "train-field": [27] } -{ "index": { "_index": "train-index", "_id": "28" } } -{ "train-field": [28] } -{ "index": { "_index": "train-index", "_id": "29" } } -{ "train-field": [29] } -{ "index": { "_index": "train-index", "_id": "30" } } -{ "train-field": [30] } -{ "index": { "_index": "train-index", "_id": "31" } } -{ "train-field": [31] } -{ "index": { "_index": "train-index", "_id": "32" } } -{ "train-field": [32] } -{ "index": { "_index": "train-index", "_id": "33" } } -{ "train-field": [33] } -{ "index": { "_index": "train-index", "_id": "34" } } -{ "train-field": [34] } -{ "index": { "_index": "train-index", "_id": "35" } } -{ "train-field": [35] } -{ "index": { "_index": "train-index", "_id": "36" } } -{ "train-field": [36] } -{ "index": { "_index": "train-index", "_id": "37" } } -{ "train-field": [37] } -{ "index": { "_index": "train-index", "_id": "38" } } -{ "train-field": [38] } -{ "index": { "_index": "train-index", "_id": "39" } } -{ "train-field": [39] } -{ "index": { "_index": "train-index", "_id": "40" } } -{ "train-field": [40] } -``` -{% include copy-curl.html %} -
+## Model IDs -Then, create and train the model named `test-binary-model`. The model will be trained using the training data from the `train_field` in the `train-index`. Specify the `binary` data type and `hamming` space type: +Model IDs are used when the underlying ANN algorithm requires a training step. As a prerequisite, the model must be created using the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model). The model contains the information needed to initialize the native library segment files. To configure a model for a vector field, specify the `model_id`: ```json -POST _plugins/_knn/models/test-binary-model/_train -{ - "training_index": "train-index", - "training_field": "train-field", - "dimension": 8, - "description": "model with binary data", - "data_type": "binary", - "space_type": "hamming", - "method": { - "name": "ivf", - "engine": "faiss", - "parameters": { - "nlist": 16, - "nprobes": 1 - } - } +"my_vector": { + "type": "knn_vector", + "model_id": "my-model" } ``` -{% include copy-curl.html %} - -To check the model training status, call the Get Model API: - -```json -GET _plugins/_knn/models/test-binary-model?filter_path=state -``` -{% include copy-curl.html %} - -Once the training is complete, the `state` changes to `created`. -Next, create an index that will initialize its native library indexes using the trained model: +However, if you intend to use Painless scripting or a k-NN score script, you only need to pass the `dimension`: ```json -PUT test-binary-ivf -{ - "settings": { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector": { - "type": "knn_vector", - "model_id": "test-binary-model" - } - } - } -} +"my_vector": { + "type": "knn_vector", + "dimension": 128 + } ``` -{% include copy-curl.html %} -Ingest the data containing the binary vectors that you want to search into the created index: +For more information, see [Building a vector index from a model]({{site.url}}{{site.baseurl}}/vector-search/vector-search-techniques/approximate-knn/#building-a-vector-index-from-a-model). -```json -PUT _bulk?refresh=true -{"index": {"_index": "test-binary-ivf", "_id": "1"}} -{"my_vector": [7], "price": 4.4} -{"index": {"_index": "test-binary-ivf", "_id": "2"}} -{"my_vector": [10], "price": 14.2} -{"index": {"_index": "test-binary-ivf", "_id": "3"}} -{"my_vector": [15], "price": 19.1} -{"index": {"_index": "test-binary-ivf", "_id": "4"}} -{"my_vector": [99], "price": 1.2} -{"index": {"_index": "test-binary-ivf", "_id": "5"}} -{"my_vector": [80], "price": 16.5} -``` -{% include copy-curl.html %} - -Finally, search the data. Be sure to provide a binary vector in the k-NN vector field: +### Parameters -```json -GET test-binary-ivf/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector": { - "vector": [8], - "k": 2 - } - } - } -} -``` -{% include copy-curl.html %} +The following table lists the parameters accepted by k-NN vector field types. -The response contains the two vectors closest to the query vector: +Parameter | Data type | Description +:--- | :--- +`type` | String | The vector field type. Must be `knn_vector`. Required. +`dimension` | Integer | The size of the vectors used. Valid values are in the [1, 16,000] range. Required. +`data_type` | String | The data type of the vector elements. Valid values are `binary`, `byte`, and `float`. Optional. Default is `float`. +`space_type` | String | The vector space used to calculate the distance between vectors. 
Valid values are `l1`, `l2`, `linf`, `cosinesimil`, `innerproduct`, `hamming`, and `hammingbit`. Not every method/engine combination supports each of the spaces. For a list of supported spaces, see the section for a specific engine. Note: This value can also be specified within the `method`. Optional. For more information, see [Spaces]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-spaces/). +`mode` | String | Sets appropriate default values for k-NN parameters based on your priority: either low latency or low cost. Valid values are `in_memory` and `on_disk`. Optional. Default is `in_memory`. For more information, see [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/). +`compression_level` | String | Selects a quantization encoder that reduces vector memory consumption by the given factor. Valid values are `1x`, `2x`, `4x`, `8x`, `16x`, and `32x`. Optional. For more information, see [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/). +`method` | Object | The algorithm used for organizing vector data at indexing time and searching it at search time. Used when the ANN algorithm does not require training. Optional. For more information, see [Methods and engines]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/). +`model_id` | String | The model ID of a trained model. Used when the ANN algorithm requires training. See [Model IDs](#model-ids). Optional. -
- - Response - - {: .text-delta} +## Next steps -```json -GET /_plugins/_knn/models/my-model?filter_path=state -{ - "took": 7, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "test-binary-ivf", - "_id": "2", - "_score": 0.5, - "_source": { - "my_vector": [ - 10 - ], - "price": 14.2 - } - }, - { - "_index": "test-binary-ivf", - "_id": "3", - "_score": 0.25, - "_source": { - "my_vector": [ - 15 - ], - "price": 19.1 - } - } - ] - } -} -``` -
+- [Spaces]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-spaces/) +- [Methods and engines]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/) +- [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/) +- [Vector search]({{site.url}}{{site.baseurl}}/vector-search/) +- [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) \ No newline at end of file diff --git a/_includes/cards.html b/_includes/cards.html index 2e47360704..61a0a6c2a0 100644 --- a/_includes/cards.html +++ b/_includes/cards.html @@ -1,43 +1,24 @@
-

Explore OpenSearch documentation

-
-
- -

OpenSearch and OpenSearch Dashboards

-

Build your OpenSearch solution using core tooling and visualizations

- -
- - -
- -

OpenSearch Data Prepper

-

Filter, mutate, and sample your data for ingestion into OpenSearch

- -
- -
- -

Clients

-

Interact with OpenSearch from your application using language APIs

- -
- - -
- -

OpenSearch Benchmark

-

Measure performance metrics for your OpenSearch cluster

- -
- -
- -

Migration Assistant

-

Migrate to OpenSearch

- -
+
+ {% for card in include.cards %} +
+ +

{{ card.heading }}

+ {% if card.description %} +

{{ card.description }}

+ {% endif %} + {% if card.list %} +
    + {% for item in card.list %} +
  • {{ item }}
  • + {% endfor %} +
+ {% endif %} + {% if include.documentation_link %} + + {% endif %} +
+ {% endfor %} +
- -
- + \ No newline at end of file diff --git a/_includes/home_cards.html b/_includes/home_cards.html new file mode 100644 index 0000000000..fb3639005a --- /dev/null +++ b/_includes/home_cards.html @@ -0,0 +1,72 @@ +
+

OpenSearch and OpenSearch Dashboards

+
+
+ +

All documentation

+

Build your OpenSearch solution using core tooling and visualizations.

+ +
+ + +
+ +

Vector search

+

Use vector database capabilities for more relevant search results.

+ +
+ +
+ +

Machine learning

+

Power your applications with machine learning model integration.

+ +
+ + +
+ +

OpenSearch Dashboards

+

Explore and visualize your data using interactive dashboards.

+ +
+
+ +
+ +
+

Supporting tools

+
+ +
+ +

Data Prepper

+

Filter, mutate, and sample your data for ingestion into OpenSearch.

+ +
+ +
+ +

Clients

+

Interact with OpenSearch from your application using language APIs.

+ +
+ + +
+ +

OpenSearch Benchmark

+

Measure OpenSearch cluster performance metrics.

+ +
+ +
+ +

Migration Assistant

+

Migrate to OpenSearch.

+ +
+
+ +
+ diff --git a/_includes/list.html b/_includes/list.html new file mode 100644 index 0000000000..c32fcdd0c5 --- /dev/null +++ b/_includes/list.html @@ -0,0 +1,22 @@ +
+ {% if include.list_title %} +
{{ include.list_title }}
+ {% endif %} + {% assign counter = 0 %} + {% for item in include.list_items %} + {% assign counter = counter | plus: 1 %} +
+
{{ counter }}
+
+
+ {% if item.link %} + {{ item.heading }} + {% else %} + {{ item.heading }} + {% endif %} +
+

{{ item.description | markdownify }}

+
+
+ {% endfor %} +
diff --git a/_ml-commons-plugin/custom-local-models.md b/_ml-commons-plugin/custom-local-models.md index 09c3105f8d..229c23ad1c 100644 --- a/_ml-commons-plugin/custom-local-models.md +++ b/_ml-commons-plugin/custom-local-models.md @@ -320,7 +320,7 @@ The response contains the tokens and weights: ## Step 5: Use the model for search -To learn how to use the model for vector search, see [Using an ML model for neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/#using-an-ml-model-for-neural-search). +To learn how to use the model for vector search, see [AI search methods]({{site.url}}{{site.baseurl}}/vector-search/ai-search/#ai-search-methods). ## Question answering models diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index 50d637379e..f18813383d 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -34,7 +34,7 @@ ML Commons provides its own set of REST APIs. For more information, see [ML Comm ## ML-powered search -For information about available ML-powered search types, see [ML-powered search]({{site.url}}{{site.baseurl}}/search-plugins/index/#ml-powered-search). +For information about available ML-powered search types, see [Vector search]({{site.url}}{{site.baseurl}}/vector-search/). ## Tutorials diff --git a/_ml-commons-plugin/pretrained-models.md b/_ml-commons-plugin/pretrained-models.md index 552e3e607e..6bfc9c803d 100644 --- a/_ml-commons-plugin/pretrained-models.md +++ b/_ml-commons-plugin/pretrained-models.md @@ -52,7 +52,7 @@ We recommend the following combinations for optimal performance: - Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` model during ingestion and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search. -For more information about the preceding options for running neural sparse search, see [Generating sparse vector embeddings within OpenSearch]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/). +For more information about the preceding options for running neural sparse search, see [Generating sparse vector embeddings automatically]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/). The following table provides a list of sparse encoding models and artifact links you can use to download them. diff --git a/_ml-commons-plugin/remote-models/guardrails.md b/_ml-commons-plugin/remote-models/guardrails.md index 6af45d7342..2a907fede7 100644 --- a/_ml-commons-plugin/remote-models/guardrails.md +++ b/_ml-commons-plugin/remote-models/guardrails.md @@ -637,4 +637,4 @@ OpenSearch responds with an error. ## Next steps - For more information about configuring guardrails, see [The `guardrails` parameter]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/#the-guardrails-parameter). -- For a tutorial demonstrating how to use Amazon Bedrock guardrails, see [Using Amazon Bedrock guardrails]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/model-controls/bedrock-guardrails/) \ No newline at end of file +- For a tutorial demonstrating how to use Amazon Bedrock guardrails, see [Using Amazon Bedrock guardrails]({{site.url}}{{site.baseurl}}/vector-search/tutorials/model-controls/bedrock-guardrails/). 
\ No newline at end of file diff --git a/_ml-commons-plugin/remote-models/index.md b/_ml-commons-plugin/remote-models/index.md index ddde42ecec..1eef82f958 100644 --- a/_ml-commons-plugin/remote-models/index.md +++ b/_ml-commons-plugin/remote-models/index.md @@ -323,7 +323,7 @@ To learn how to use the model for batch ingestion in order to improve ingestion ## Step 7: Use the model for search -To learn how to use the model for vector search, see [Using an ML model for neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/#using-an-ml-model-for-neural-search). +To learn how to use the model for vector search, see [AI search methods]({{site.url}}{{site.baseurl}}/vector-search/ai-search/#ai-search-methods). ## Step 8 (Optional): Undeploy the model diff --git a/_ml-commons-plugin/tutorials/chatbots/index.md b/_ml-commons-plugin/tutorials/chatbots/index.md deleted file mode 100644 index 3920ab4576..0000000000 --- a/_ml-commons-plugin/tutorials/chatbots/index.md +++ /dev/null @@ -1,28 +0,0 @@ ---- -layout: default -title: Chatbots and agents -parent: Tutorials -has_children: true -has_toc: false -nav_order: 140 ---- - -# Chatbots and agents tutorials - -The following machine learning (ML) tutorials show you how to implement chatbots and agents: - -- [**RAG chatbot**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/chatbots/rag-chatbot/) - - Platform: OpenSearch - - Model: Anthropic Claude - - Deployment: Amazon Bedrock - -- [**RAG with a conversational flow agent**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/chatbots/rag-conversational-agent/) - - Platform: OpenSearch - - Model: Anthropic Claude - - Deployment: Amazon Bedrock - -- [**Build your own chatbot**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/chatbots/build-chatbot/) - - Platform: OpenSearch - - Model: Anthropic Claude - - Deployment: Amazon Bedrock - \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/conversational-search/index.md b/_ml-commons-plugin/tutorials/conversational-search/index.md deleted file mode 100644 index 6ad35e0868..0000000000 --- a/_ml-commons-plugin/tutorials/conversational-search/index.md +++ /dev/null @@ -1,17 +0,0 @@ ---- -layout: default -title: Conversational search -parent: Tutorials -has_children: true -has_toc: false -nav_order: 80 ---- - -# Conversational search tutorials - -The following tutorials show you how to implement conversational search: - -- [**Conversational search using Cohere Command**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/conversational-search/conversational-search-cohere/) - - Platform: OpenSearch - - Model: Cohere - - Deployment: Provider API \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/index.md b/_ml-commons-plugin/tutorials/index.md deleted file mode 100644 index 8f7c879a3f..0000000000 --- a/_ml-commons-plugin/tutorials/index.md +++ /dev/null @@ -1,132 +0,0 @@ ---- -layout: default -title: Tutorials -has_children: true -has_toc: false -nav_order: 140 ---- - -# Tutorials - -Using the OpenSearch machine learning (ML) framework, you can build various applications, from implementing conversational search to building your own chatbot. To learn more, explore the following ML tutorials. 
- ---- - -## Vector operations -- [**Generating embeddings from arrays of objects**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/vector-operations/generate-embeddings/) - - Platform: OpenSearch - - Model: Amazon Titan - - Deployment: Amazon Bedrock - -- [**Semantic search using byte-quantized vectors**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/vector-operations/semantic-search-byte-vectors/) - - Platform: OpenSearch - - Model: Cohere - - Deployment: Provider API - ---- - -## Semantic search -- [**Semantic search using the OpenAI embedding model**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-openai/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: OpenAI - - Deployment: Provider API - -- [**Semantic search using Cohere Embed**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-cohere/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Cohere - - Deployment: Provider API - -- [**Semantic search using Cohere Embed on Amazon Bedrock**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-cohere/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Cohere - - Deployment: Amazon Bedrock - -- [**Semantic search using Amazon Bedrock Titan**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-titan/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Amazon Titan - - Deployment: Amazon Bedrock - -- [**Semantic search using Amazon Bedrock Titan in another account**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-titan-other/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Amazon Titan - - Deployment: Amazon Bedrock (in a different account than your Amazon OpenSearch Service account) - -- [**Semantic search using a model on Amazon SageMaker**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-sagemaker/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Custom - - Deployment: Amazon SageMaker - -- [**Semantic search using AWS CloudFormation and Amazon SageMaker**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-cfn-sagemaker/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Custom - - Deployment: Amazon SageMaker + CloudFormation - ---- - -## Conversational search -- [**Conversational search using Cohere Command**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/conversational-search/conversational-search-cohere/) - - Platform: OpenSearch - - Model: Cohere - - Deployment: Provider API - ---- - -## Search result reranking -- [**Reranking search results using Cohere Rerank**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/reranking/reranking-cohere/) - - Platform: OpenSearch - - Model: Cohere - - Deployment: Provider API - -- [**Reranking search results using Amazon Bedrock models**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/reranking/reranking-bedrock/) - - Platform: OpenSearch - - Model: Amazon Bedrock reranker models - - Deployment: Amazon Bedrock - -- [**Reranking search results using a cross-encoder in Amazon SageMaker**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/reranking/reranking-cross-encoder/) - - Platform: OpenSearch - - Model: MS MARCO - - Deployment: Amazon SageMaker - ---- - -## RAG -- [**Retrieval-augmented generation 
(RAG) using the DeepSeek Chat API**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag/rag-deepseek-chat/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: DeepSeek - - Deployment: Provider API - -- [**RAG using DeepSeek-R1 on Amazon Bedrock**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag/rag-deepseek-r1-bedrock/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: DeepSeek - - Deployment: Amazon Bedrock - -- [**RAG using DeepSeek-R1 in Amazon SageMaker**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag/rag-deepseek-r1-sagemaker/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: DeepSeek - - Deployment: Amazon SageMaker - ---- - -## Chatbots and agents -- [**RAG chatbot**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/chatbots/rag-chatbot/) - - Platform: OpenSearch - - Model: Anthropic Claude - - Deployment: Amazon Bedrock - -- [**RAG with a conversational flow agent**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/chatbots/rag-conversational-agent/) - - Platform: OpenSearch - - Model: Anthropic Claude - - Deployment: Amazon Bedrock - -- [**Build your own chatbot**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/chatbots/build-chatbot/) - - Platform: OpenSearch - - Model: Anthropic Claude - - Deployment: Amazon Bedrock - ---- - -## Model controls -- [**Amazon Bedrock guardrails**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/model-controls/bedrock-guardrails/) - - Platform: OpenSearch - - Model: Anthropic Claude - - Deployment: Amazon Bedrock diff --git a/_ml-commons-plugin/tutorials/model-controls/index.md b/_ml-commons-plugin/tutorials/model-controls/index.md deleted file mode 100644 index 10493928bd..0000000000 --- a/_ml-commons-plugin/tutorials/model-controls/index.md +++ /dev/null @@ -1,17 +0,0 @@ ---- -layout: default -title: Model controls -parent: Tutorials -has_children: true -has_toc: false -nav_order: 80 ---- - -# Model controls tutorials - -The following tutorials show you how to implement model controls: - -- [**Amazon Bedrock guardrails**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/model-controls/bedrock-guardrails/) - - Platform: OpenSearch - - Model: Anthropic Claude - - Deployment: Amazon Bedrock \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/rag/index.md b/_ml-commons-plugin/tutorials/rag/index.md deleted file mode 100644 index 21ad495709..0000000000 --- a/_ml-commons-plugin/tutorials/rag/index.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -layout: default -title: RAG -parent: Tutorials -has_children: true -has_toc: false -nav_order: 120 ---- - -# RAG tutorials - -The following machine learning (ML) tutorials show you how to implement retrieval-augmeted generation (RAG): - -- [**RAG using the DeepSeek Chat API**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag/rag-deepseek-chat/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: DeepSeek - - Deployment: Provider API - -- [**RAG using DeepSeek-R1 on Amazon Bedrock**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag/rag-deepseek-r1-bedrock/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: DeepSeek - - Deployment: Amazon Bedrock - -- [**RAG using DeepSeek-R1 in Amazon SageMaker**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag/rag-deepseek-r1-sagemaker/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: DeepSeek - - Deployment: Amazon SageMaker \ No newline at end of file diff --git 
a/_ml-commons-plugin/tutorials/reranking/index.md b/_ml-commons-plugin/tutorials/reranking/index.md deleted file mode 100644 index 63c3254fa4..0000000000 --- a/_ml-commons-plugin/tutorials/reranking/index.md +++ /dev/null @@ -1,27 +0,0 @@ ---- -layout: default -title: Reranking search results -parent: Tutorials -has_children: true -has_toc: false -nav_order: 100 ---- - -# Reranking search results tutorials - -The following machine learning (ML) tutorials show you how to implement search result reranking: - -- [**Reranking search results using Cohere Rerank**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/reranking/reranking-cohere/) - - Platform: OpenSearch - - Model: Cohere - - Deployment: Provider API - -- [**Reranking search results using Amazon Bedrock models**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/reranking/reranking-bedrock/) - - Platform: OpenSearch - - Model: Amazon Bedrock reranker models - - Deployment: Amazon Bedrock - -- [**Reranking search results using a cross-encoder in Amazon SageMaker**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/reranking/reranking-cross-encoder/) - - Platform: OpenSearch - - Model: MS MARCO - - Deployment: Amazon SageMaker \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/semantic-search/index.md b/_ml-commons-plugin/tutorials/semantic-search/index.md deleted file mode 100644 index 5918da841b..0000000000 --- a/_ml-commons-plugin/tutorials/semantic-search/index.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -layout: default -title: Semantic search -parent: Tutorials -has_children: true -has_toc: false -nav_order: 50 ---- - -# Semantic search tutorials - -The following tutorials show you how to implement semantic search: - -- [**Semantic search using the OpenAI embedding model**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-openai/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: OpenAI - - Deployment: Provider API - -- [**Semantic search using Cohere Embed**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-cohere/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Cohere - - Deployment: Provider API - -- [**Semantic search using Cohere Embed on Amazon Bedrock**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-cohere/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Cohere - - Deployment: Amazon Bedrock - -- [**Semantic search using Amazon Bedrock Titan**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-titan/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Amazon Titan - - Deployment: Amazon Bedrock - -- [**Semantic search using Amazon Bedrock Titan in another account**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-titan-other/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Amazon Titan - - Deployment: Amazon Bedrock (in a different account than your Amazon OpenSearch Service account) - -- [**Semantic search using a model in Amazon SageMaker**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-sagemaker/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Custom - - Deployment: Amazon SageMaker - -- [**Semantic search using AWS CloudFormation and Amazon 
SageMaker**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-cfn-sagemaker/) - - Platform: OpenSearch, Amazon OpenSearch Service - - Model: Custom - - Deployment: Amazon SageMaker + CloudFormation - -- [**Semantic search using byte-quantized vectors**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/vector-operations/semantic-search-byte-vectors/) - - Platform: OpenSearch - - Model: Cohere - - Deployment: Provider API \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/vector-operations/index.md b/_ml-commons-plugin/tutorials/vector-operations/index.md deleted file mode 100644 index fd893a36ed..0000000000 --- a/_ml-commons-plugin/tutorials/vector-operations/index.md +++ /dev/null @@ -1,22 +0,0 @@ ---- -layout: default -title: Vector operations -parent: Tutorials -has_children: true -has_toc: false -nav_order: 10 ---- - -# Vector operation tutorials - -The following tutorials show you how to implement vector operations: - -- [**Generating embeddings from arrays of objects**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/vector-operations/generate-embeddings/) - - Platform: OpenSearch - - Model: Amazon Titan - - Deployment: Amazon Bedrock - -- [**Semantic search using byte-quantized vectors**]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/vector-operations/semantic-search-byte-vectors/) - - Platform: OpenSearch - - Model: Cohere - - Deployment: Provider API \ No newline at end of file diff --git a/_query-dsl/specialized/index.md b/_query-dsl/specialized/index.md index 8a4cd81af6..d28451cfa8 100644 --- a/_query-dsl/specialized/index.md +++ b/_query-dsl/specialized/index.md @@ -14,7 +14,9 @@ OpenSearch supports the following specialized queries: - `more_like_this`: Finds documents similar to the provided text, document, or collection of documents. -- [`neural`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/): Used for vector field search in [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/). +- [`knn`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/): Used for searching raw vectors during [vector search]({{site.url}}{{site.baseurl}}/vector-search/). + +- [`neural`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/): Used for searching by text or image in [vector search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/). - [`neural_sparse`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural-sparse/): Used for vector field search in [sparse neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). diff --git a/_query-dsl/specialized/k-nn.md b/_query-dsl/specialized/k-nn.md new file mode 100644 index 0000000000..4d30ad235b --- /dev/null +++ b/_query-dsl/specialized/k-nn.md @@ -0,0 +1,209 @@ +--- +layout: default +title: k-NN +parent: Specialized queries +nav_order: 10 +--- + +# k-NN query + +Use the `knn` query for running nearest neighbor searches on vector fields. + +## Request body fields + +Provide a vector field in the `knn` query and specify additional request fields in the vector field object: + +```json +"knn": { + "<vector_field>": { + "vector": [<vector_values>], + "k": <k_value>, + ... + } +} +``` + +The top-level `vector_field` specifies the vector field against which to run a search query. The following table lists all supported request fields. + +Field | Data type | Required/Optional | Description +:--- | :--- | :--- | :--- +`vector` | Array of floats or bytes | Required | The query vector to use for vector search. 
The data type of the vector elements must match the data type of the vectors indexed in the [`knn_vector` field]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) being searched. +`k` | Integer | Optional | The number of nearest neighbors to return. Valid values are in the [1, 10,000] range. Required if neither `max_distance` nor `min_score` is specified. +`max_distance` | Float | Optional | The maximum distance threshold for search results. Only one of `k`, `max_distance`, or `min_score` can be specified. For more information, see [Radial search]({{site.url}}{{site.baseurl}}/vector-search/specialized-operations/radial-search-knn/). +`min_score` | Float | Optional | The minimum score threshold for search results. Only one of `k`, `max_distance`, or `min_score` can be specified. For more information, see [Radial search]({{site.url}}{{site.baseurl}}/vector-search/specialized-operations/radial-search-knn/). +`filter` | Object | Optional | A filter to apply to the k-NN search. For more information, see [Vector search with filters]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/). **Important**: A filter can only be used with the `faiss` or `lucene` engines. +`method_parameters` | Object | Optional | Additional parameters for fine-tuning the search:
- `ef_search` (Integer): The number of vectors to examine (for the `hnsw` method).
- `nprobes` (Integer): The number of buckets to examine (for the `ivf` method). For more information, see [Specifying method parameters in the query](#specifying-method-parameters-in-the-query). +`rescore` | Object or Boolean | Optional | Parameters for configuring rescoring functionality:
- `oversample_factor` (Float): Controls the oversampling of candidate vectors before ranking. Valid values are in the `[1.0, 100.0]` range. Default is `1.0` (no rescoring). To use the default `oversample_factor` of `1.0`, set `rescore` to `true`. For more information, see [Rescoring results](#rescoring-results). +`expand_nested_docs` | Boolean | Optional | When `true`, retrieves scores for all nested field documents within each parent document. Used with nested queries. For more information, see [Vector search with nested fields]({{site.url}}{{site.baseurl}}/vector-search/specialized-operations/nested-search-knn/). + +## Example request + +```json +GET /my-vector-index/_search +{ + "query": { + "knn": { + "my_vector": { + "vector": [1.5, 2.5], + "k": 3 + } + } + } +} +``` +{% include copy-curl.html %} + +## Example request: Nested fields + +```json +GET /my-vector-index/_search +{ + "_source": false, + "query": { + "nested": { + "path": "nested_field", + "query": { + "knn": { + "nested_field.my_vector": { + "vector": [1,1,1], + "k": 2, + "expand_nested_docs": true + } + } + }, + "inner_hits": { + "_source": false, + "fields":["nested_field.color"] + }, + "score_mode": "max" + } + } +} +``` +{% include copy-curl.html %} + +## Example request: Radial search with max_distance + +The following example shows a radial search performed with `max_distance`: + +```json +GET /my-vector-index/_search +{ + "query": { + "knn": { + "my_vector": { + "vector": [ + 7.1, + 8.3 + ], + "max_distance": 2 + } + } + } +} +``` +{% include copy-curl.html %} + + +## Example request: Radial search with min_score + +The following example shows a radial search performed with `min_score`: + +```json +GET /my-vector-index/_search +{ + "query": { + "knn": { + "my_vector": { + "vector": [7.1, 8.3], + "min_score": 0.95 + } + } + } +} +``` +{% include copy-curl.html %} + +## Specifying method parameters in the query + +Starting with version 2.16, you can provide `method_parameters` in a search request: + +```json +GET /my-vector-index/_search +{ + "size": 2, + "query": { + "knn": { + "target-field": { + "vector": [2, 3, 5, 6], + "k": 2, + "method_parameters" : { + "ef_search": 100 + } + } + } + } +} +``` +{% include copy-curl.html %} + +These parameters are dependent on the combination of engine and method used to create the index. The following sections provide information about the supported `method_parameters`. + +### ef_search + +You can provide the `ef_search` parameter when searching an index created using the `hnsw` method. The `ef_search` parameter specifies the number of vectors to examine in order to find the top k nearest neighbors. Higher `ef_search` values improve recall at the cost of increased search latency. The value must be positive. + +The following table provides information about the `ef_search` parameter for the supported engines. + +Engine | Radial query support | Notes +:--- | :--- | :--- +`nmslib` (Deprecated) | No | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting. +`faiss` | Yes | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting. +`lucene` | No | When creating a search query, you must specify `k`. If you provide both `k` and `ef_search`, then the larger value is passed to the engine. If `ef_search` is larger than `k`, you can provide the `size` parameter to limit the final number of results to `k`. 
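For example, the following request is a minimal sketch of the Lucene behavior described in the preceding table: it provides both `k` and a larger `ef_search` value and relies on `size` to cap the number of returned results. The index and field names (`my-lucene-vector-index`, `my_vector`) are illustrative and assume a `knn_vector` field created with the `lucene` engine:

```json
GET /my-lucene-vector-index/_search
{
  "size": 5,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [1.5, 2.5],
        "k": 5,
        "method_parameters": {
          "ef_search": 100
        }
      }
    }
  }
}
```
{% include copy-curl.html %}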
+ + +### nprobes + + +You can provide the `nprobes` parameter when searching an index created using the `ivf` method. The `nprobes` parameter specifies the number of buckets to examine in order to find the top k nearest neighbors. Higher `nprobes` values improve recall at the cost of increased search latency. The value must be positive. + +The following table provides information about the `nprobes` parameter for the supported engines. + +Engine | Notes +:--- | :--- +`faiss` | If `nprobes` is present in a query, it overrides the value provided when creating the index. + +## Rescoring results + +You can fine-tune search by providing the `ef_search` and `oversample_factor` parameters. + +The `oversample_factor` parameter controls the factor by which the search oversamples the candidate vectors before ranking them. Using a higher oversample factor means that more candidates will be considered before ranking, improving accuracy but also increasing search time. When selecting the `oversample_factor` value, consider the trade-off between accuracy and efficiency. For example, setting the `oversample_factor` to `2.0` will double the number of candidates considered during the ranking phase, which may help achieve better results. + +The following request specifies the `ef_search` and `oversample_factor` parameters: + +```json +GET /my-vector-index/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector_field": { + "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5], + "k": 10, + "method_parameters": { + "ef_search": 10 + }, + "rescore": { + "oversample_factor": 10.0 + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Next steps + +- [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) +- [Rescoring quantized results to full precision]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#rescoring-quantized-results-to-full-precision) \ No newline at end of file diff --git a/_query-dsl/specialized/neural.md b/_query-dsl/specialized/neural.md index ae9e1f2ea4..bc3d832e4b 100644 --- a/_query-dsl/specialized/neural.md +++ b/_query-dsl/specialized/neural.md @@ -7,7 +7,7 @@ nav_order: 50 # Neural query -Use the `neural` query for vector field search in [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/). +Use the `neural` query for vector field search by text or image in [vector search]({{site.url}}{{site.baseurl}}/vector-search/). ## Request body fields @@ -32,11 +32,9 @@ Field | Data type | Required/Optional | Description `query_image` | String | Optional | A base-64 encoded string that corresponds to the query image from which to generate vector embeddings. You must specify at least one `query_text` or `query_image`. `model_id` | String | Required if the default model ID is not set. For more information, see [Setting a default model on an index or field]({{site.url}}{{site.baseurl}}/search-plugins/neural-text-search/#setting-a-default-model-on-an-index-or-field). | The ID of the model that will be used to generate vector embeddings from the query text. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/). `k` | Integer | Optional | The number of results returned by the k-NN search. Only one variable, either `k`, `min_score`, or `max_distance`, can be specified. 
If a variable is not specified, the default is `k` with a value of `10`. -`min_score` | Float | Optional | The minimum score threshold for the search results. Only one variable, either `k`, `min_score`, or `max_distance`, can be specified. For more information, see [k-NN radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). -`max_distance` | Float | Optional | The maximum distance threshold for the search results. Only one variable, either `k`, `min_score`, or `max_distance`, can be specified. For more information, see [k-NN radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). -`filter` | Object | Optional | A query that can be used to reduce the number of documents considered. For more information about filter usage, see [k-NN search with filters]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). **Important**: Filter can only be used with the `faiss` or `lucene` engines. -`method_parameters` | Object | Optional | Parameters passed to the k-NN index during search. See [Additional query parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#additional-query-parameters). -`rescore` | Object | Optional | Parameters for configuring rescoring functionality for k-NN indexes built using quantization. See [Rescoring]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#rescoring-quantized-results-using-full-precision). +`min_score` | Float | Optional | The minimum score threshold for the search results. Only one variable, either `k`, `min_score`, or `max_distance`, can be specified. For more information, see [Radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). +`max_distance` | Float | Optional | The maximum distance threshold for the search results. Only one variable, either `k`, `min_score`, or `max_distance`, can be specified. For more information, see [Radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). +`filter` | Object | Optional | A query that can be used to reduce the number of documents considered. For more information about filter usage, see [Vector search with filters]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). 
#### Example request diff --git a/_sass/_home.scss b/_sass/_home.scss index 9b5dd864a9..0a3d1f7dac 100644 --- a/_sass/_home.scss +++ b/_sass/_home.scss @@ -22,11 +22,16 @@ // Card style -.card-container-wrapper { +.home-card-container-wrapper { @include gradient-open-sky; + margin-bottom: 2rem; } -.card-container { +.card-container-wrapper { + margin-bottom: 0; +} + +.home-card-container { display: grid; grid-template-columns: 1fr; margin: 0 auto; @@ -42,11 +47,27 @@ } } -.card { +.card-container { + display: grid; + grid-template-columns: 1fr; + margin: 0 auto; + padding: 2rem 0; + grid-row-gap: 1rem; + grid-column-gap: 1rem; + grid-auto-rows: 1fr; + @include mq(md) { + grid-template-columns: repeat(1, 1fr); + } + @include mq(lg) { + grid-template-columns: repeat(2, 1fr); + } +} + +.home-card { @extend .panel; @include thick-edge-left; padding: 1rem; - margin-bottom: 4rem; + margin-bottom: 2rem; text-align: left; background-color: white; display: flex; @@ -67,9 +88,9 @@ } } -@mixin heading-font { +@mixin heading-font($size: 1.5rem) { @include heading-sans-serif; - font-size: 1.5rem; + font-size: $size; font-weight: 700; color: $blue-dk-300; } @@ -81,6 +102,14 @@ margin: 1rem 0 1.5rem 0; } +.card { + @extend .home-card; + margin-bottom: 0; + .heading { + @include heading-font(1.2rem); + } +} + .heading-main { @include heading-font; margin: 0; @@ -110,6 +139,53 @@ width: 100%; } +// List layout + +.numbered-list { + display: flex; + flex-direction: column; + gap: 2rem; + padding: 1rem; +} + +.list-item { + display: flex; + align-items: flex-start; + gap: 1rem; +} + +.number-circle { + width: 2.5rem; + height: 2.5rem; + border-radius: 50%; + background-color: $blue-lt-100; + color: $blue-dk-300; + display: flex; + align-items: center; + justify-content: center; + font-weight: bold; + font-size: 1.2rem; + flex-shrink: 0; +} + +.list-content { + max-width: 100%; +} + +.list-heading { + @include heading-font (1.2rem); + margin: 0 0 0.75rem 0; + font-size: 1.2rem; + color: $blue-dk-300; + font-weight: bold; +} + +.list-content p { + margin: 0.5rem 0; + font-size: 1rem; + line-height: 1.5; +} + // Banner style .os-banner { diff --git a/_sass/custom/custom.scss b/_sass/custom/custom.scss index b3ee3c3775..f70f8b6888 100755 --- a/_sass/custom/custom.scss +++ b/_sass/custom/custom.scss @@ -203,6 +203,13 @@ img { border-left: 5px solid $red-100; } +.info { + @extend %callout; + border-left: 5px solid $blue-300; + font-weight: 600; + background-color: $blue-lt-000; +} + @mixin version-warning ( $version: 'latest' ){ @extend %callout, .panel; font-weight: 600; @@ -307,6 +314,43 @@ img { } } +@mixin btn-dark-blue { + color: white; + background-color: $blue-300; + font-size: 1.13rem; + font-weight: 510; + border-width: 1px; + border-style: solid; + border-radius: 5px; + box-shadow: 1px 1px $grey-lt-300; + cursor: pointer; +} + +.btn-dark-blue { + @include btn-dark-blue; + border-color: $blue-dk-300; + padding: 0.5rem 1rem; + margin-left: 0.4rem; + margin-right: 0.4rem; + + &:hover:not([disabled]) { + background-color: $blue-vibrant-300; + box-shadow: 1px 2px 4px $grey-lt-300; + transform: translateY(-1px); + text-decoration: underline; + text-underline-offset: 2px; + } + + &:active { + transform: translateY(1px); + } +} + +.centering-container { + display: flex; + justify-content: center; +} + // Back to top button .top-link { display: block; diff --git a/_search-plugins/hybrid-search.md b/_search-plugins/hybrid-search.md deleted file mode 100644 index 355fbbd2c1..0000000000 --- 
a/_search-plugins/hybrid-search.md +++ /dev/null @@ -1,1416 +0,0 @@ ---- -layout: default -title: Hybrid search -has_children: false -nav_order: 60 ---- - -# Hybrid search -Introduced 2.11 -{: .label .label-purple } - -Hybrid search combines keyword and neural search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline intercepts search results at an intermediate stage and applies processing to normalize and combine document scores. - -There are two types of processors available for hybrid search: - -- [Normalization processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) (Introduced 2.10): A score-based processor that normalizes and combines document scores from multiple query clauses, rescoring the documents using the selected normalization and combination techniques. -- [Score ranker processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/score-ranker-processor/) (Introduced 2.19): A rank-based processor that uses rank fusion to combine and rerank documents from multiple query clauses. - -**PREREQUISITE**
-To follow this example, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). If you have already generated text embeddings, ingest the embeddings into an index and skip to [Step 4](#step-4-configure-a-search-pipeline). -{: .note} - -## Using hybrid search - -To use hybrid search, follow these steps: - -1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline). -1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). -1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). -1. [Configure a search pipeline](#step-4-configure-a-search-pipeline). -1. [Search the index using hybrid search](#step-5-search-the-index-using-hybrid-search). - -## Step 1: Create an ingest pipeline - -To generate vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`text_embedding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/text-embedding/), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings. - -The following example request creates an ingest pipeline that converts the text from `passage_text` to text embeddings and stores the embeddings in `passage_embedding`: - -```json -PUT /_ingest/pipeline/nlp-ingest-pipeline -{ - "description": "A text embedding pipeline", - "processors": [ - { - "text_embedding": { - "model_id": "bQ1J8ooBpBj3wT4HVUsb", - "field_map": { - "passage_text": "passage_embedding" - } - } - } - ] -} -``` -{% include copy-curl.html %} - -## Step 2: Create an index for ingestion - -In order to use the text embedding processor defined in your pipeline, create a k-NN index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `passage_text` field should be mapped as `text`. - -The following example request creates a k-NN index that is set up with a default ingest pipeline: - -```json -PUT /my-nlp-index -{ - "settings": { - "index.knn": true, - "default_pipeline": "nlp-ingest-pipeline" - }, - "mappings": { - "properties": { - "id": { - "type": "text" - }, - "passage_embedding": { - "type": "knn_vector", - "dimension": 768, - "method": { - "engine": "lucene", - "space_type": "l2", - "name": "hnsw", - "parameters": {} - } - }, - "passage_text": { - "type": "text" - } - } - } -} -``` -{% include copy-curl.html %} - -For more information about creating a k-NN index and using supported methods, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). 
- -## Step 3: Ingest documents into the index - -To ingest documents into the index created in the previous step, send the following requests: - -```json -PUT /my-nlp-index/_doc/1 -{ - "passage_text": "Hello world", - "id": "s1" -} -``` -{% include copy-curl.html %} - -```json -PUT /my-nlp-index/_doc/2 -{ - "passage_text": "Hi planet", - "id": "s2" -} -``` -{% include copy-curl.html %} - -Before the document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on the document, generating text embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings. - -## Step 4: Configure a search pipeline - -To configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/), use the following request. The normalization technique in the processor is set to `min_max`, and the combination technique is set to `arithmetic_mean`. The `weights` array specifies the weights assigned to each query clause as decimal percentages: - -```json -PUT /_search/pipeline/nlp-search-pipeline -{ - "description": "Post processor for hybrid search", - "phase_results_processors": [ - { - "normalization-processor": { - "normalization": { - "technique": "min_max" - }, - "combination": { - "technique": "arithmetic_mean", - "parameters": { - "weights": [ - 0.3, - 0.7 - ] - } - } - } - } - ] -} -``` -{% include copy-curl.html %} - -## Step 5: Search the index using hybrid search - -To perform hybrid search on your index, use the [`hybrid` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/), which combines the results of keyword and semantic search. - -#### Example: Combining a neural query and a match query - -The following example request combines two query clauses---a `neural` query and a `match` query. It specifies the search pipeline created in the previous step as a query parameter: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "_source": { - "exclude": [ - "passage_embedding" - ] - }, - "query": { - "hybrid": { - "queries": [ - { - "match": { - "passage_text": { - "query": "Hi world" - } - } - }, - { - "neural": { - "passage_embedding": { - "query_text": "Hi world", - "model_id": "aVeif4oB5Vm0Tdw8zYO2", - "k": 5 - } - } - } - ] - } - } -} -``` -{% include copy-curl.html %} - -Alternatively, you can set a default search pipeline for the `my-nlp-index` index. For more information, see [Default search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/#default-search-pipeline). - -The response contains the matching document: - -```json -{ - "took" : 36, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 1, - "relation" : "eq" - }, - "max_score" : 1.2251667, - "hits" : [ - { - "_index" : "my-nlp-index", - "_id" : "1", - "_score" : 1.2251667, - "_source" : { - "passage_text" : "Hello world", - "id" : "s1" - } - } - ] - } -} -``` -{% include copy-curl.html %} - -#### Example: Combining a match query and a term query - -The following example request combines two query clauses---a `match` query and a `term` query. 
It specifies the search pipeline created in the previous step as a query parameter: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "_source": { - "exclude": [ - "passage_embedding" - ] - }, - "query": { - "hybrid": { - "queries": [ - { - "match":{ - "passage_text": "hello" - } - }, - { - "term":{ - "passage_text":{ - "value":"planet" - } - } - } - ] - } - } -} -``` -{% include copy-curl.html %} - -The response contains the matching documents: - -```json -{ - "took": 11, - "timed_out": false, - "_shards": { - "total": 2, - "successful": 2, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": 0.7, - "hits": [ - { - "_index": "my-nlp-index", - "_id": "2", - "_score": 0.7, - "_source": { - "id": "s2", - "passage_text": "Hi planet" - } - }, - { - "_index": "my-nlp-index", - "_id": "1", - "_score": 0.3, - "_source": { - "id": "s1", - "passage_text": "Hello world" - } - } - ] - } -} -``` -{% include copy-curl.html %} - -## Hybrid search with post-filtering -**Introduced 2.13** -{: .label .label-purple } - -You can perform post-filtering on hybrid search results by providing the `post_filter` parameter in your query. - -The `post_filter` clause is applied after the search results have been retrieved. Post-filtering is useful for applying additional filters to the search results without impacting the scoring or the order of the results. - -Post-filtering does not impact document relevance scores or aggregation results. -{: .note} - -#### Example: Post-filtering - -The following example request combines two query clauses---a `term` query and a `match` query. This is the same query as in the [preceding example](#example-combining-a-match-query-and-a-term-query), but it contains a `post_filter`: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "query": { - "hybrid":{ - "queries":[ - { - "match":{ - "passage_text": "hello" - } - }, - { - "term":{ - "passage_text":{ - "value":"planet" - } - } - } - ] - } - - }, - "post_filter":{ - "match": { "passage_text": "world" } - } -} - -``` -{% include copy-curl.html %} - -Compare the results to the results without post-filtering in the [preceding example](#example-combining-a-match-query-and-a-term-query). Unlike the preceding example response, which contains two documents, the response in this example contains one document because the second document is filtered using post-filtering: - -```json -{ - "took": 18, - "timed_out": false, - "_shards": { - "total": 2, - "successful": 2, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 0.3, - "hits": [ - { - "_index": "my-nlp-index", - "_id": "1", - "_score": 0.3, - "_source": { - "id": "s1", - "passage_text": "Hello world" - } - } - ] - } -} -``` - - -## Combining hybrid search and aggregations -**Introduced 2.13** -{: .label .label-purple } - -You can enhance search results by combining a hybrid query clause with any aggregation that OpenSearch supports. Aggregations allow you to use OpenSearch as an analytics engine. For more information about aggregations, see [Aggregations]({{site.url}}{{site.baseurl}}/aggregations/). - -Most aggregations are performed on the subset of documents that is returned by a hybrid query. The only aggregation that operates on all documents is the [`global`]({{site.url}}{{site.baseurl}}/aggregations/bucket/global/) aggregation. - -To use aggregations with a hybrid query, first create an index. 
Aggregations are typically used on fields of special types, like `keyword` or `integer`. The following example creates an index with several such fields: - -```json -PUT /my-nlp-index -{ - "settings": { - "number_of_shards": 2 - }, - "mappings": { - "properties": { - "doc_index": { - "type": "integer" - }, - "doc_keyword": { - "type": "keyword" - }, - "category": { - "type": "keyword" - } - } - } -} -``` -{% include copy-curl.html %} - -The following request ingests six documents into your new index: - -```json -POST /_bulk -{ "index": { "_index": "my-nlp-index" } } -{ "category": "permission", "doc_keyword": "workable", "doc_index": 4976, "doc_price": 100} -{ "index": { "_index": "my-nlp-index" } } -{ "category": "sister", "doc_keyword": "angry", "doc_index": 2231, "doc_price": 200 } -{ "index": { "_index": "my-nlp-index" } } -{ "category": "hair", "doc_keyword": "likeable", "doc_price": 25 } -{ "index": { "_index": "my-nlp-index" } } -{ "category": "editor", "doc_index": 9871, "doc_price": 30 } -{ "index": { "_index": "my-nlp-index" } } -{ "category": "statement", "doc_keyword": "entire", "doc_index": 8242, "doc_price": 350 } -{ "index": { "_index": "my-nlp-index" } } -{ "category": "statement", "doc_keyword": "idea", "doc_index": 5212, "doc_price": 200 } -{ "index": { "_index": "index-test" } } -{ "category": "editor", "doc_keyword": "bubble", "doc_index": 1298, "doc_price": 130 } -{ "index": { "_index": "index-test" } } -{ "category": "editor", "doc_keyword": "bubble", "doc_index": 521, "doc_price": 75 } -``` -{% include copy-curl.html %} - -Now you can combine a hybrid query clause with a `min` aggregation: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "query": { - "hybrid": { - "queries": [ - { - "term": { - "category": "permission" - } - }, - { - "bool": { - "should": [ - { - "term": { - "category": "editor" - } - }, - { - "term": { - "category": "statement" - } - } - ] - } - } - ] - } - }, - "aggs": { - "total_price": { - "sum": { - "field": "doc_price" - } - }, - "keywords": { - "terms": { - "field": "doc_keyword", - "size": 10 - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains the matching documents and the aggregation results: - -```json -{ - "took": 9, - "timed_out": false, - "_shards": { - "total": 2, - "successful": 2, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "my-nlp-index", - "_id": "mHRPNY4BlN82W_Ar9UMY", - "_score": 0.5, - "_source": { - "doc_price": 100, - "doc_index": 4976, - "doc_keyword": "workable", - "category": "permission" - } - }, - { - "_index": "my-nlp-index", - "_id": "m3RPNY4BlN82W_Ar9UMY", - "_score": 0.5, - "_source": { - "doc_price": 30, - "doc_index": 9871, - "category": "editor" - } - }, - { - "_index": "my-nlp-index", - "_id": "nXRPNY4BlN82W_Ar9UMY", - "_score": 0.5, - "_source": { - "doc_price": 200, - "doc_index": 5212, - "doc_keyword": "idea", - "category": "statement" - } - }, - { - "_index": "my-nlp-index", - "_id": "nHRPNY4BlN82W_Ar9UMY", - "_score": 0.5, - "_source": { - "doc_price": 350, - "doc_index": 8242, - "doc_keyword": "entire", - "category": "statement" - } - } - ] - }, - "aggregations": { - "total_price": { - "value": 680 - }, - "doc_keywords": { - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0, - "buckets": [ - { - "key": "entire", - "doc_count": 1 - }, - { - "key": "idea", - "doc_count": 1 - }, - { - "key": "workable", - "doc_count": 1 - } - ] - } - } -} -``` - 
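As noted previously, the [`global`]({{site.url}}{{site.baseurl}}/aggregations/bucket/global/) aggregation is the only aggregation that operates on all documents rather than only on the documents returned by the hybrid query. The following request is a minimal sketch of combining a hybrid query with a `global` aggregation. It assumes the same index and search pipeline as the preceding example; the `all_docs` and `avg_price_all_docs` aggregation names are illustrative:

```json
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "term": {
            "category": "permission"
          }
        },
        {
          "term": {
            "category": "editor"
          }
        }
      ]
    }
  },
  "aggs": {
    "all_docs": {
      "global": {},
      "aggs": {
        "avg_price_all_docs": {
          "avg": {
            "field": "doc_price"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

In the response, `avg_price_all_docs` is calculated across every document in the index, while the hits are still produced by the hybrid query.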
-## Using sorting with a hybrid query -**Introduced 2.16** -{: .label .label-purple } - -By default, hybrid search returns results ordered by scores in descending order. You can apply sorting to hybrid query results by providing the `sort` criteria in the search request. For more information about sort criteria, see [Sort results]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/sort/). -When sorting is applied to a hybrid search, results are fetched from the shards based on the specified sort criteria. As a result, the search results are sorted accordingly, and the document scores are `null`. Scores are only present in the hybrid search sorting results if documents are sorted by `_score`. - -In the following example, sorting is applied by `doc_price` in the hybrid query search request: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "query": { - "hybrid": { - "queries": [ - { - "term": { - "category": "permission" - } - }, - { - "bool": { - "should": [ - { - "term": { - "category": "editor" - } - }, - { - "term": { - "category": "statement" - } - } - ] - } - } - ] - } - }, - "sort":[ - { - "doc_price": { - "order": "desc" - } - } - ] -} -``` -{% include copy-curl.html %} - -The response contains the matching documents sorted by `doc_price` in descending order: - -```json -{ - "took": 35, - "timed_out": false, - "_shards": { - "total": 3, - "successful": 3, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "my-nlp-index", - "_id": "7yaM4JABZkI1FQv8AwoN", - "_score": null, - "_source": { - "category": "statement", - "doc_keyword": "entire", - "doc_index": 8242, - "doc_price": 350 - }, - "sort": [ - 350 - ] - }, - { - "_index": "my-nlp-index", - "_id": "8CaM4JABZkI1FQv8AwoN", - "_score": null, - "_source": { - "category": "statement", - "doc_keyword": "idea", - "doc_index": 5212, - "doc_price": 200 - }, - "sort": [ - 200 - ] - }, - { - "_index": "my-nlp-index", - "_id": "6yaM4JABZkI1FQv8AwoM", - "_score": null, - "_source": { - "category": "permission", - "doc_keyword": "workable", - "doc_index": 4976, - "doc_price": 100 - }, - "sort": [ - 100 - ] - }, - { - "_index": "my-nlp-index", - "_id": "7iaM4JABZkI1FQv8AwoN", - "_score": null, - "_source": { - "category": "editor", - "doc_index": 9871, - "doc_price": 30 - }, - "sort": [ - 30 - ] - } - ] - } -} -``` - -In the following example, sorting is applied by `_id`: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "query": { - "hybrid": { - "queries": [ - { - "term": { - "category": "permission" - } - }, - { - "bool": { - "should": [ - { - "term": { - "category": "editor" - } - }, - { - "term": { - "category": "statement" - } - } - ] - } - } - ] - } - }, - "sort":[ - { - "_id": { - "order": "desc" - } - } - ] -} -``` -{% include copy-curl.html %} - -The response contains the matching documents sorted by `_id` in descending order: - -```json -{ - "took": 33, - "timed_out": false, - "_shards": { - "total": 3, - "successful": 3, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "my-nlp-index", - "_id": "8CaM4JABZkI1FQv8AwoN", - "_score": null, - "_source": { - "category": "statement", - "doc_keyword": "idea", - "doc_index": 5212, - "doc_price": 200 - }, - "sort": [ - "8CaM4JABZkI1FQv8AwoN" - ] - }, - { - "_index": "my-nlp-index", - "_id": "7yaM4JABZkI1FQv8AwoN", - "_score": null, - 
"_source": { - "category": "statement", - "doc_keyword": "entire", - "doc_index": 8242, - "doc_price": 350 - }, - "sort": [ - "7yaM4JABZkI1FQv8AwoN" - ] - }, - { - "_index": "my-nlp-index", - "_id": "7iaM4JABZkI1FQv8AwoN", - "_score": null, - "_source": { - "category": "editor", - "doc_index": 9871, - "doc_price": 30 - }, - "sort": [ - "7iaM4JABZkI1FQv8AwoN" - ] - }, - { - "_index": "my-nlp-index", - "_id": "6yaM4JABZkI1FQv8AwoM", - "_score": null, - "_source": { - "category": "permission", - "doc_keyword": "workable", - "doc_index": 4976, - "doc_price": 100 - }, - "sort": [ - "6yaM4JABZkI1FQv8AwoM" - ] - } - ] - } -} -``` - -## Hybrid search with search_after -**Introduced 2.16** -{: .label .label-purple } - -You can control sorting results by applying a `search_after` condition that provides a live cursor and uses the previous page's results to obtain the next page's results. For more information about `search_after`, see [The search_after parameter]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#the-search_after-parameter). - -You can paginate the sorted results by applying a `search_after` condition in the sort queries. - -In the following example, sorting is applied by `doc_price` with a `search_after` condition: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "query": { - "hybrid": { - "queries": [ - { - "term": { - "category": "permission" - } - }, - { - "bool": { - "should": [ - { - "term": { - "category": "editor" - } - }, - { - "term": { - "category": "statement" - } - } - ] - } - } - ] - } - }, - "sort":[ - { - "_id": { - "order": "desc" - } - } - ], - "search_after":[200] -} -``` -{% include copy-curl.html %} - -The response contains the matching documents that are listed after the `200` sort value, sorted by `doc_price` in descending order: - -```json -{ - "took": 8, - "timed_out": false, - "_shards": { - "total": 3, - "successful": 3, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "my-nlp-index", - "_id": "6yaM4JABZkI1FQv8AwoM", - "_score": null, - "_source": { - "category": "permission", - "doc_keyword": "workable", - "doc_index": 4976, - "doc_price": 100 - }, - "sort": [ - 100 - ] - }, - { - "_index": "my-nlp-index", - "_id": "7iaM4JABZkI1FQv8AwoN", - "_score": null, - "_source": { - "category": "editor", - "doc_index": 9871, - "doc_price": 30 - }, - "sort": [ - 30 - ] - } - ] - } -} -``` - -In the following example, sorting is applied by `id` with a `search_after` condition: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "query": { - "hybrid": { - "queries": [ - { - "term": { - "category": "permission" - } - }, - { - "bool": { - "should": [ - { - "term": { - "category": "editor" - } - }, - { - "term": { - "category": "statement" - } - } - ] - } - } - ] - } - }, - "sort":[ - { - "_id": { - "order": "desc" - } - } - ], - "search_after":["7yaM4JABZkI1FQv8AwoN"] -} -``` -{% include copy-curl.html %} - -The response contains the matching documents that are listed after the `7yaM4JABZkI1FQv8AwoN` sort value, sorted by `id` in descending order: - -```json -{ - "took": 17, - "timed_out": false, - "_shards": { - "total": 3, - "successful": 3, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "my-nlp-index", - "_id": "7iaM4JABZkI1FQv8AwoN", - "_score": null, - "_source": { - "category": "editor", - 
"doc_index": 9871, - "doc_price": 30 - }, - "sort": [ - "7iaM4JABZkI1FQv8AwoN" - ] - }, - { - "_index": "my-nlp-index", - "_id": "6yaM4JABZkI1FQv8AwoM", - "_score": null, - "_source": { - "category": "permission", - "doc_keyword": "workable", - "doc_index": 4976, - "doc_price": 100 - }, - "sort": [ - "6yaM4JABZkI1FQv8AwoM" - ] - } - ] - } -} -``` - -## Explain -**Introduced 2.19** -{: .label .label-purple } - -You can provide the `explain` parameter to understand how scores are calculated, normalized, and combined in hybrid queries. When enabled, it provides detailed information about the scoring process for each search result. This includes revealing the score normalization techniques used, how different scores were combined, and the calculations for individual subquery scores. This comprehensive insight makes it easier to understand and optimize your hybrid query results. For more information about `explain`, see [Explain API]({{site.url}}{{site.baseurl}}/api-reference/explain/). - -`explain` is an expensive operation in terms of both resources and time. For production clusters, we recommend using it sparingly for the purpose of troubleshooting. -{: .warning } - -You can provide the `explain` parameter in a URL when running a complete hybrid query using the following syntax: - -```json -GET /_search?search_pipeline=&explain=true -POST /_search?search_pipeline=&explain=true -``` - -To use the `explain` parameter, you must configure the `hybrid_score_explanation` response processor in your search pipeline. For more information, see [Hybrid score explanation processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/explanation-processor/). - -You can also use `explain` with the individual document ID: - -```json -GET /_explain/ -POST /_explain/ -``` - -In this case, the result will contain only low-level scoring information, for example, [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) scores for text-based queries such as `term` or `match`. For an example response, see [Explain API example response]({{site.url}}{{site.baseurl}}/api-reference/explain/#example-response). - -To see the `explain` output for all results, set the parameter to `true` either in the URL or in the request body: - -```json -POST my-nlp-index/_search?search_pipeline=my_pipeline&explain=true -{ - "_source": { - "exclude": [ - "passage_embedding" - ] - }, - "query": { - "hybrid": { - "queries": [ - { - "match": { - "text": { - "query": "horse" - } - } - }, - { - "neural": { - "passage_embedding": { - "query_text": "wild west", - "model_id": "aVeif4oB5Vm0Tdw8zYO2", - "k": 5 - } - } - } - ] - } - } -} -``` -{% include copy-curl.html %} - -The response contains scoring information: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 54, - "timed_out": false, - "_shards": { - "total": 2, - "successful": 2, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 5, - "relation": "eq" - }, - "max_score": 0.9251075, - "hits": [ - { - "_shard": "[my-nlp-index][0]", - "_node": "IsuzeVYdSqKUfy0qfqil2w", - "_index": "my-nlp-index", - "_id": "5", - "_score": 0.9251075, - "_source": { - "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse .", - "id": "2691147709.jpg" - }, - "_explanation": { - "value": 0.9251075, - "description": "arithmetic_mean combination of:", - "details": [ - { - "value": 1.0, - "description": "min_max normalization of:", - "details": [ - { - "value": 1.2336599, - "description": "weight(text:horse in 0) [PerFieldSimilarity], result of:", - "details": [ - { - "value": 1.2336599, - "description": "score(freq=1.0), computed as boost * idf * tf from:", - "details": [ - { - "value": 2.2, - "description": "boost", - "details": [] - }, - { - "value": 1.2039728, - "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", - "details": [ - { - "value": 1, - "description": "n, number of documents containing term", - "details": [] - }, - { - "value": 4, - "description": "N, total number of documents with field", - "details": [] - } - ] - }, - { - "value": 0.46575344, - "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:", - "details": [ - { - "value": 1.0, - "description": "freq, occurrences of term within document", - "details": [] - }, - { - "value": 1.2, - "description": "k1, term saturation parameter", - "details": [] - }, - { - "value": 0.75, - "description": "b, length normalization parameter", - "details": [] - }, - { - "value": 16.0, - "description": "dl, length of field", - "details": [] - }, - { - "value": 17.0, - "description": "avgdl, average length of field", - "details": [] - } - ] - } - ] - } - ] - } - ] - }, - { - "value": 0.8503647, - "description": "min_max normalization of:", - "details": [ - { - "value": 0.015177966, - "description": "within top 5", - "details": [] - } - ] - } - ] -... -``` -
- -### Response body fields - -Field | Description -:--- | :--- -`explanation` | The `explanation` object has three properties: `value`, `description`, and `details`. The `value` property shows the result of the calculation, `description` explains what type of calculation was performed, and `details` shows any subcalculations performed. For score normalization, the information in the `description` property includes the technique used for normalization or combination and the corresponding score. - -## Paginating hybrid query results -**Introduced 2.19** -{: .label .label-purple } - -You can apply pagination to hybrid query results by using the `pagination_depth` parameter in the hybrid query clause, along with the standard `from` and `size` parameters. The `pagination_depth` parameter defines the maximum number of search results that can be retrieved from each shard per subquery. For example, setting `pagination_depth` to `50` allows up to 50 results per subquery to be maintained in memory from each shard. - -To navigate through the results, use the `from` and `size` parameters: - -- `from`: Specifies the document number from which you want to start showing the results. Default is `0`. -- `size`: Specifies the number of results to return on each page. Default is `10`. - -For example, to show 10 documents starting from the 20th document, specify `from: 20` and `size: 10`. For more information about pagination, see [Paginate results]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#the-from-and-size-parameters). - -### The impact of pagination_depth on hybrid search results - -Changing `pagination_depth` affects the underlying set of search results retrieved before any ranking, filtering, or pagination adjustments are applied. This is because `pagination_depth` determines the number of results retrieved per subquery from each shard, which can ultimately change the result order after normalization. To ensure consistent pagination, keep the `pagination_depth` value the same while navigating between pages. - -By default, hybrid search without pagination retrieves results using the `from + size` formula, where `from` is always `0`. -{: .note} - -To enable deeper pagination, increase the `pagination_depth` value. You can then navigate through results using the `from` and `size` parameters. Note that deeper pagination can impact search performance because retrieving and processing more results requires additional computational resources. - -The following example shows a search request configured with `from: 0`, `size: 5`, and `pagination_depth: 10`. 
This means that up to 10 search results per shard will be retrieved for both the `bool` and `term` queries before pagination is applied: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "size": 5, - "query": { - "hybrid": { - "pagination_depth":10, - "queries": [ - { - "term": { - "category": "permission" - } - }, - { - "bool": { - "should": [ - { - "term": { - "category": "editor" - } - }, - { - "term": { - "category": "statement" - } - } - ] - } - } - ] - } - } -} -``` -{% include copy-curl.html %} - -The response contains the first five results: - -```json -{ - "hits": { - "total": { - "value": 6, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "my-nlp-index", - "_id": "d3eXlZQBJkWerFzHv4eV", - "_score": 0.5, - "_source": { - "category": "permission", - "doc_keyword": "workable", - "doc_index": 4976, - "doc_price": 100 - } - }, - { - "_index": "my-nlp-index", - "_id": "eneXlZQBJkWerFzHv4eW", - "_score": 0.5, - "_source": { - "category": "editor", - "doc_index": 9871, - "doc_price": 30 - } - }, - { - "_index": "my-nlp-index", - "_id": "e3eXlZQBJkWerFzHv4eW", - "_score": 0.5, - "_source": { - "category": "statement", - "doc_keyword": "entire", - "doc_index": 8242, - "doc_price": 350 - } - }, - { - "_index": "my-nlp-index", - "_id": "fHeXlZQBJkWerFzHv4eW", - "_score": 0.24999997, - "_source": { - "category": "statement", - "doc_keyword": "idea", - "doc_index": 5212, - "doc_price": 200 - } - }, - { - "_index": "index-test", - "_id": "fXeXlZQBJkWerFzHv4eW", - "_score": 5.0E-4, - "_source": { - "category": "editor", - "doc_keyword": "bubble", - "doc_index": 1298, - "doc_price": 130 - } - } - ] - } -} -``` - -The following search request is configured with `from: 6`, `size: 5`, and `pagination_depth: 10`. The `pagination_depth` remains unchanged to ensure that pagination is based on the same set of search results: - -```json -GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline -{ - "size":5, - "from":6, - "query": { - "hybrid": { - "pagination_depth":10, - "queries": [ - { - "term": { - "category": "permission" - } - }, - { - "bool": { - "should": [ - { - "term": { - "category": "editor" - } - }, - { - "term": { - "category": "statement" - } - } - ] - } - } - ] - } - } -} -``` -{% include copy-curl.html %} - -The response excludes the first five entries and displays the remaining results: - -```json -{ - "hits": { - "total": { - "value": 6, - "relation": "eq" - }, - "max_score": 0.5, - "hits": [ - { - "_index": "index-test", - "_id": "fneXlZQBJkWerFzHv4eW", - "_score": 5.0E-4, - "_source": { - "category": "editor", - "doc_keyword": "bubble", - "doc_index": 521, - "doc_price": 75 - } - } - ] - } -} -``` diff --git a/_search-plugins/index.md b/_search-plugins/index.md index cca2493b8a..f9aff32389 100644 --- a/_search-plugins/index.md +++ b/_search-plugins/index.md @@ -22,25 +22,9 @@ OpenSearch supports the following search methods. OpenSearch supports [keyword (BM25) search]({{site.url}}{{site.baseurl}}/search-plugins/keyword-search/), which searches the document corpus for words that appear in the query. -### ML-powered search +### Vector search -OpenSearch supports the following machine learning (ML)-powered search methods: - -- **Vector search** - - - [k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/): Searches for the k-nearest neighbors to a search term across an index of vectors. 
- -- **Neural search**: [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) facilitates generating vector embeddings at ingestion time and searching them at search time. Neural search lets you integrate ML models into your search and serves as a framework for implementing other search methods. The following search methods are built on top of neural search: - - - [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/): Considers the meaning of the words in the search context. Uses dense retrieval based on text embedding models to search text data. - - - [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/): Uses multimodal embedding models to search text and image data. - - - [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/): Uses sparse retrieval based on sparse embedding models to search text data. - - - [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): Combines traditional search and vector search to improve search relevance. - - - [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/): Implements a retrieval-augmented generative search. +OpenSearch supports various machine learning (ML)-powered search methods using [vector search]({{site.url}}{{site.baseurl}}/vector-search/). ## Query languages diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md deleted file mode 100644 index 8c10da9793..0000000000 --- a/_search-plugins/knn/approximate-knn.md +++ /dev/null @@ -1,416 +0,0 @@ ---- -layout: default -title: Approximate k-NN search -nav_order: 15 -parent: k-NN search -has_children: false -has_math: true ---- - -# Approximate k-NN search - -Standard k-NN search methods compute similarity using a brute-force approach that measures the nearest distance between a query and a number of points, which produces exact results. This works well in many applications. However, in the case of extremely large datasets with high dimensionality, this creates a scaling problem that reduces the efficiency of the search. Approximate k-NN search methods can overcome this by employing tools that restructure indexes more efficiently and reduce the dimensionality of searchable vectors. Using this approach requires a sacrifice in accuracy but increases search processing speeds appreciably. - -The Approximate k-NN search methods leveraged by OpenSearch use approximate nearest neighbor (ANN) algorithms from the [nmslib](https://github.com/nmslib/nmslib), [faiss](https://github.com/facebookresearch/faiss), and [Lucene](https://lucene.apache.org/) libraries to power k-NN search. These search methods employ ANN to improve search latency for large datasets. Of the three search methods the k-NN plugin provides, this method offers the best search scalability for large datasets. This approach is the preferred method when a dataset reaches hundreds of thousands of vectors. - -For details on the algorithms the plugin currently supports, see [k-NN Index documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions). -{: .note} - -The k-NN plugin builds a native library index of the vectors for each knn-vector field/Lucene segment pair during indexing, which can be used to efficiently find the k-nearest neighbors to a query vector during search. 
To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These native library indexes are loaded into native memory during search and managed by a cache. To learn more about preloading native library indexes into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see which native library indexes are already loaded in memory. To learn more about this, see the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats). - -Because the native library indexes are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search. - -## Recommendations for engines and cluster node sizing - -Each of the three engines used for approximate k-NN search has its own attributes that make one more sensible to use than the others in a given situation. You can follow the general information below to help determine which engine will best meet your requirements. - -In general, NMSLIB (deprecated) outperforms both Faiss and Lucene when used for search operations. However, to optimize for indexing throughput, Faiss is a good option. For relatively smaller datasets (up to a few million vectors), the Lucene engine demonstrates better latencies and recall. At the same time, the size of the index is smallest compared to the other engines, which allows it to use smaller AWS instances for data nodes. - -When considering cluster node sizing, a general approach is to first establish an even distribution of the index across the cluster. However, there are other considerations. To help make these choices, you can refer to the OpenSearch managed service guidance in the section [Sizing domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html). - -## Get started with approximate k-NN - -To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with `index.knn` set to `true`. This setting tells the plugin to create native library indexes for the index. - -Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two -`knn_vector` fields, one using `faiss` and the other using `nmslib` (deprecated) fields: - -```json -PUT my-knn-index-1 -{ - "settings": { - "index": { - "knn": true, - "knn.algo_param.ef_search": 100 - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 2, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "faiss", - "parameters": { - "ef_construction": 128, - "m": 24 - } - } - }, - "my_vector2": { - "type": "knn_vector", - "dimension": 4, - "space_type": "innerproduct", - "method": { - "name": "hnsw", - "engine": "faiss", - "parameters": { - "ef_construction": 256, - "m": 48 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -In the preceding example, both `knn_vector` fields are configured using method definitions. Additionally, `knn_vector` fields can be configured using models. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/). 
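For reference, the following request is a minimal sketch of a model-based mapping. It assumes that a model with the hypothetical ID `my-model` has already been trained and is ready to use, as described later in [Building a k-NN index from a model](#building-a-k-nn-index-from-a-model); the index and field names are illustrative:

```json
PUT /my-model-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "model_id": "my-model"
      }
    }
  }
}
```
{% include copy-curl.html %}

Because the model already stores the dimension and method configuration from training, they are not repeated in the mapping.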
- -The `knn_vector` data type supports a vector of floats that can have a dimension count of up to 16,000 for the NMSLIB, Faiss, and Lucene engines, as set by the dimension mapping parameter. - -In OpenSearch, codecs handle the storage and retrieval of indexes. The k-NN plugin uses a custom codec to write vector data to native library indexes so that the underlying k-NN search library can read it. -{: .tip } - -After you create the index, you can add some data to it: - -```json -POST _bulk -{ "index": { "_index": "my-knn-index-1", "_id": "1" } } -{ "my_vector1": [1.5, 2.5], "price": 12.2 } -{ "index": { "_index": "my-knn-index-1", "_id": "2" } } -{ "my_vector1": [2.5, 3.5], "price": 7.1 } -{ "index": { "_index": "my-knn-index-1", "_id": "3" } } -{ "my_vector1": [3.5, 4.5], "price": 12.9 } -{ "index": { "_index": "my-knn-index-1", "_id": "4" } } -{ "my_vector1": [5.5, 6.5], "price": 1.2 } -{ "index": { "_index": "my-knn-index-1", "_id": "5" } } -{ "my_vector1": [4.5, 5.5], "price": 3.7 } -{ "index": { "_index": "my-knn-index-1", "_id": "6" } } -{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } -{ "index": { "_index": "my-knn-index-1", "_id": "7" } } -{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } -{ "index": { "_index": "my-knn-index-1", "_id": "8" } } -{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } -{ "index": { "_index": "my-knn-index-1", "_id": "9" } } -{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } -``` -{% include copy-curl.html %} - -Then you can execute an approximate nearest neighbor search on the data using the `knn` query type: - -```json -GET my-knn-index-1/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector2": { - "vector": [2, 3, 5, 6], - "k": 2 - } - } - } -} -``` -{% include copy-curl.html %} - -### The number of returned results - -In the preceding query, `k` represents the number of neighbors returned by the search of each graph. You must also include the `size` option, indicating the final number of results that you want the query to return. - -For the NMSLIB and Faiss engines, `k` represents the maximum number of documents returned for all segments of a shard. For the Lucene engine, `k` represents the number of documents returned for a shard. The maximum value of `k` is 10,000. - -For any engine, each shard returns `size` results to the coordinator node. Thus, the total number of results that the coordinator node receives is `size * number of shards`. After the coordinator node consolidates the results received from all nodes, the query returns the top `size` results. - -The following table provides examples of the number of results returned by various engines in several scenarios. For these examples, assume that the number of documents contained in the segments and shards is sufficient to return the number of results specified in the table. - -`size` | `k` | Number of primary shards | Number of segments per shard | Number of returned results, Faiss/NMSLIB | Number of returned results, Lucene -:--- | :--- | :--- | :--- | :--- | :--- -10 | 1 | 1 | 4 | 4 | 1 -10 | 10 | 1 | 4 | 10 | 10 -10 | 1 | 2 | 4 | 8 | 2 - -The number of results returned by Faiss/NMSLIB differs from the number of results returned by Lucene only when `k` is smaller than `size`. If `k` and `size` are equal, all engines return the same number of results. - -Starting in OpenSearch 2.14, you can use `k`, `min_score`, or `max_distance` for [radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). 
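For illustration, a radial search sketch against the previously created index might replace `k` with `max_distance` (the threshold value `2` here is arbitrary; see the radial search documentation for the supported parameters and engines):

```json
GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [2, 3],
        "max_distance": 2
      }
    }
  }
}
```

Because a distance threshold bounds results by proximity rather than by count, the number of hits returned by such a query can vary with the data.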
- -### Building a k-NN index from a model - -For some of the algorithms that the k-NN plugin supports, the native library index needs to be trained before it can be used. It would be expensive to train every newly created segment, so, instead, the plugin features the concept of a *model* that initializes the native library index during segment creation. You can create a model by calling the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model) and passing in the source of the training data and the method definition of the model. Once training is complete, the model is serialized to a k-NN model system index. Then, during indexing, the model is pulled from this index to initialize the segments. - -To train a model, you first need an OpenSearch index containing training data. Training data can come from any `knn_vector` field that has a dimension matching the dimension of the model you want to create. Training data can be the same data that you are going to index or data in a separate set. To create a training index, send the following request: - -```json -PUT /train-index -{ - "settings": { - "number_of_shards": 3, - "number_of_replicas": 0 - }, - "mappings": { - "properties": { - "train-field": { - "type": "knn_vector", - "dimension": 4 - } - } - } -} -``` -{% include copy-curl.html %} - -Notice that `index.knn` is not set in the index settings. This ensures that you do not create native library indexes for this index. - -You can now add some data to the index: - -```json -POST _bulk -{ "index": { "_index": "train-index", "_id": "1" } } -{ "train-field": [1.5, 5.5, 4.5, 6.4]} -{ "index": { "_index": "train-index", "_id": "2" } } -{ "train-field": [2.5, 3.5, 5.6, 6.7]} -{ "index": { "_index": "train-index", "_id": "3" } } -{ "train-field": [4.5, 5.5, 6.7, 3.7]} -{ "index": { "_index": "train-index", "_id": "4" } } -{ "train-field": [1.5, 5.5, 4.5, 6.4]} -``` -{% include copy-curl.html %} - -After indexing into the training index completes, you can call the Train API: - -```json -POST /_plugins/_knn/models/my-model/_train -{ - "training_index": "train-index", - "training_field": "train-field", - "dimension": 4, - "description": "My model description", - "space_type": "l2", - "method": { - "name": "ivf", - "engine": "faiss", - "parameters": { - "nlist": 4, - "nprobes": 2 - } - } -} -``` -{% include copy-curl.html %} - -The Train API returns as soon as the training job is started. 
To check the job status, use the Get Model API: - -```json -GET /_plugins/_knn/models/my-model?filter_path=state&pretty -{ - "state": "training" -} -``` -{% include copy-curl.html %} - -Once the model enters the `created` state, you can create an index that will use this model to initialize its native library indexes: - -```json -PUT /target-index -{ - "settings": { - "number_of_shards": 3, - "number_of_replicas": 1, - "index.knn": true - }, - "mappings": { - "properties": { - "target-field": { - "type": "knn_vector", - "model_id": "my-model" - } - } - } -} -``` -{% include copy-curl.html %} - -Lastly, you can add the documents you want to be searched to the index: - -```json -POST _bulk -{ "index": { "_index": "target-index", "_id": "1" } } -{ "target-field": [1.5, 5.5, 4.5, 6.4]} -{ "index": { "_index": "target-index", "_id": "2" } } -{ "target-field": [2.5, 3.5, 5.6, 6.7]} -{ "index": { "_index": "target-index", "_id": "3" } } -{ "target-field": [4.5, 5.5, 6.7, 3.7]} -{ "index": { "_index": "target-index", "_id": "4" } } -{ "target-field": [1.5, 5.5, 4.5, 6.4]} -``` -{% include copy-curl.html %} - -After data is ingested, it can be searched in the same way as any other `knn_vector` field. - -### Additional query parameters - -Starting with version 2.16, you can provide `method_parameters` in a search request: - -```json -GET my-knn-index-1/_search -{ - "size": 2, - "query": { - "knn": { - "target-field": { - "vector": [2, 3, 5, 6], - "k": 2, - "method_parameters" : { - "ef_search": 100 - } - } - } - } -} -``` -{% include copy-curl.html %} - -These parameters are dependent on the combination of engine and method used to create the index. The following sections provide information about the supported `method_parameters`. - -#### `ef_search` - -You can provide the `ef_search` parameter when searching an index created using the `hnsw` method. The `ef_search` parameter specifies the number of vectors to examine in order to find the top k nearest neighbors. Higher `ef_search` values improve recall at the cost of increased search latency. The value must be positive. - -The following table provides information about the `ef_search` parameter for the supported engines. - -Engine | Radial query support | Notes -:--- | :--- | :--- -`nmslib` (Deprecated) | No | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting. -`faiss` | Yes | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting. -`lucene` | No | When creating a search query, you must specify `k`. If you provide both `k` and `ef_search`, then the larger value is passed to the engine. If `ef_search` is larger than `k`, you can provide the `size` parameter to limit the final number of results to `k`. - -#### `nprobes` - -You can provide the `nprobes` parameter when searching an index created using the `ivf` method. The `nprobes` parameter specifies the number of buckets to examine in order to find the top k nearest neighbors. Higher `nprobes` values improve recall at the cost of increased search latency. The value must be positive. - -The following table provides information about the `nprobes` parameter for the supported engines. - -Engine | Notes -:--- | :--- -`faiss` | If `nprobes` is present in a query, it overrides the value provided when creating the index. - -### Rescoring quantized results using full precision - -Quantization can be used to significantly reduce the memory footprint of a k-NN index. 
For more information about quantization, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization). Because some vector representation is lost during quantization, the computed distances will be approximate. This causes the overall recall of the search to decrease. - -To improve recall while maintaining the memory savings of quantization, you can use a two-phase search approach. In the first phase, `oversample_factor * k` results are retrieved from an index using quantized vectors and the scores are approximated. In the second phase, the full-precision vectors of those `oversample_factor * k` results are loaded into memory from disk, and scores are recomputed against the full-precision query vector. The results are then reduced to the top k. - -The default rescoring behavior is determined by the `mode` and `compression_level` of the backing k-NN vector field: - -- For `in_memory` mode, no rescoring is applied by default. -- For `on_disk` mode, default rescoring is based on the configured `compression_level`. Each `compression_level` provides a default `oversample_factor`, specified in the following table. - -| Compression level | Default rescore `oversample_factor` | -|:------------------|:----------------------------------| -| `32x` (default) | 3.0 | -| `16x` | 2.0 | -| `8x` | 2.0 | -| `4x` | No default rescoring | -| `2x` | No default rescoring | - -To explicitly apply rescoring, provide the `rescore` parameter in a query on a quantized index and specify the `oversample_factor`: - -```json -GET my-knn-index-1/_search -{ - "size": 2, - "query": { - "knn": { - "target-field": { - "vector": [2, 3, 5, 6], - "k": 2, - "rescore" : { - "oversample_factor": 1.2 - } - } - } - } -} -``` -{% include copy-curl.html %} - -Alternatively, set the `rescore` parameter to `true` to use a default `oversample_factor` of `1.0`: - -```json -GET my-knn-index-1/_search -{ - "size": 2, - "query": { - "knn": { - "target-field": { - "vector": [2, 3, 5, 6], - "k": 2, - "rescore" : true - } - } - } -} -``` -{% include copy-curl.html %} - -The `oversample_factor` is a floating-point number between 1.0 and 100.0, inclusive. The number of results in the first pass is calculated as `oversample_factor * k` and is guaranteed to be between 100 and 10,000, inclusive. If the calculated number of results is smaller than 100, then the number of results is set to 100. If the calculated number of results is greater than 10,000, then the number of results is set to 10,000. - -Rescoring is only supported for the `faiss` engine. - -Rescoring is not needed if quantization is not used because the scores returned are already fully precise. -{: .note} - -### Using approximate k-NN with filters - -To learn about using filters with k-NN search, see [k-NN search with filters]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). - -### Using approximate k-NN with nested fields - -To learn about using k-NN search with nested fields, see [k-NN search with nested fields]({{site.url}}{{site.baseurl}}/search-plugins/knn/nested-search-knn/). - -### Using approximate radial search - -To learn more about the radial search feature, see [k-NN radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). - -### Using approximate k-NN with binary vectors - -To learn more about using binary vectors with k-NN search, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors). 
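As a brief illustration of the filtering support referenced above, the following sketch applies a `range` filter to the `price` field inside a `knn` query against the index created earlier (filter behavior and efficiency depend on the engine and OpenSearch version, as described in the filtering documentation):

```json
GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [2, 3],
        "k": 2,
        "filter": {
          "range": {
            "price": {
              "lte": 10
            }
          }
        }
      }
    }
  }
}
```

Because the filter is declared as part of the `knn` clause, it restricts which vectors are considered rather than post-filtering the final hits, subject to the engine's filtering strategy.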
-
-## Spaces
-
-A _space_ corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a higher score equates to a better result. The k-NN plugin supports the following spaces.
-
-Not every method supports each of these spaces. Be sure to check out [the method documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions) to make sure the space you are interested in is supported.
-{: .note}
-
-| Space type | Distance function ($$d$$) | OpenSearch score |
-| :--- | :--- | :--- |
-| `l1` | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n \lvert x_i - y_i \rvert $$ | $$ score = {1 \over {1 + d} } $$ |
-| `l2` | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2 $$ | $$ score = {1 \over 1 + d } $$ |
-| `linf` | $$ d(\mathbf{x}, \mathbf{y}) = max(\lvert x_i - y_i \rvert) $$ | $$ score = {1 \over 1 + d } $$ |
-| `cosinesimil` | $$ d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} \cdot \mathbf{y} \over \lVert \mathbf{x}\rVert \cdot \lVert \mathbf{y}\rVert}$$$$ = 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}}$$, where $$\lVert \mathbf{x}\rVert$$ and $$\lVert \mathbf{y}\rVert$$ represent the norms of vectors $$\mathbf{x}$$ and $$\mathbf{y}$$, respectively. | $$ score = {2 - d \over 2} $$ |
-| `innerproduct` (supported for Lucene in OpenSearch version 2.13 and later) | **NMSLIB** and **Faiss**:<br>$$ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i $$<br><br>**Lucene**:<br>$$ d(\mathbf{x}, \mathbf{y}) = {\mathbf{x} \cdot \mathbf{y}} = \sum_{i=1}^n x_i y_i $$ | **NMSLIB** and **Faiss**:<br>$$ \text{If } d \ge 0, score = {1 \over 1 + d }$$<br>$$\text{If } d < 0, score = -d + 1$$<br><br>**Lucene:**<br>$$ \text{If } d > 0, score = d + 1 $$<br>$$\text{If } d \le 0, score = {1 \over 1 + (-1 \cdot d) }$$ |
-| `hamming` (supported for binary vectors in OpenSearch version 2.16 and later) | $$ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})$$ | $$ score = {1 \over 1 + d } $$ |
-
-The cosine similarity formula does not include the `1 -` prefix. However, because similarity search libraries equate lower scores with closer results, they return `1 - cosineSimilarity` for the cosine similarity space---this is why `1 -` is included in the distance function.
-{: .note }
-
-With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown.
-{: .note }
-
-The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors).
-{: .note}
diff --git a/_search-plugins/knn/index.md b/_search-plugins/knn/index.md
deleted file mode 100644
index f8c28bcc4e..0000000000
--- a/_search-plugins/knn/index.md
+++ /dev/null
@@ -1,44 +0,0 @@
----
-layout: default
-title: k-NN search
-nav_order: 20
-has_children: true
-has_toc: false
-redirect_from:
-  - /search-plugins/knn/
----
-
-# k-NN search
-
-Short for *k-nearest neighbors*, the k-NN plugin enables users to search for the k-nearest neighbors to a query point across an index of vectors. To determine the neighbors, you can specify the space (the distance function) you want to use to measure the distance between points.
-
-Use cases include recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection. For more background information about k-NN search, see [Wikipedia](https://en.wikipedia.org/wiki/Nearest_neighbor_search).
-
-This plugin supports three different methods for obtaining the k-nearest neighbors from an index of vectors:
-
-1. **Approximate k-NN**
-
-   The first method takes an approximate nearest neighbor approach---it uses one of several algorithms to return the approximate k-nearest neighbors to a query vector. Usually, these algorithms sacrifice indexing speed and search accuracy in return for performance benefits such as lower latency, smaller memory footprints and more scalable search. To learn more about the algorithms, refer to [*nmslib*](https://github.com/nmslib/nmslib/blob/master/manual/README.md)'s and [*faiss*](https://github.com/facebookresearch/faiss/wiki)'s documentation.
-
-   Approximate k-NN is the best choice for searches over large indexes (that is, hundreds of thousands of vectors or more) that require low latency. You should not use approximate k-NN if you want to apply a filter on the index before the k-NN search, which greatly reduces the number of vectors to be searched. In this case, you should use either the script scoring method or Painless extensions.
-
-   For more details about this method, including recommendations for which engine to use, see [Approximate k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/).
-
-2. **Script Score k-NN**
-
-   The second method extends OpenSearch's script scoring functionality to execute a brute force, exact k-NN search over "knn_vector" fields or fields that can represent binary objects.
With this approach, you can run k-NN search on a subset of vectors in your index (sometimes referred to as a pre-filter search). - - Use this approach for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indexes may lead to high latencies. - - For more details about this method, see [Exact k-NN with scoring script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/). - -3. **Painless extensions** - - The third method adds the distance functions as painless extensions that you can use in more complex combinations. Similar to the k-NN Script Score, you can use this method to perform a brute force, exact k-NN search across an index, which also supports pre-filtering. - - This approach has slightly slower query performance compared to the k-NN Script Score. If your use case requires more customization over the final score, you should use this approach over Script Score k-NN. - - For more details about this method, see [Painless scripting functions]({{site.url}}{{site.baseurl}}/search-plugins/knn/painless-functions/). - - -Overall, for larger data sets, you should generally choose the approximate nearest neighbor method because it scales significantly better. For smaller data sets, where you may want to apply a filter, you should choose the custom scoring approach. If you have a more complex use case where you need to use a distance function as part of their scoring method, you should use the painless scripting approach. diff --git a/_search-plugins/knn/jni-libraries.md b/_search-plugins/knn/jni-libraries.md deleted file mode 100644 index 4dbdb2da56..0000000000 --- a/_search-plugins/knn/jni-libraries.md +++ /dev/null @@ -1,22 +0,0 @@ ---- -layout: default -title: JNI libraries -nav_order: 35 -parent: k-NN search -has_children: false -redirect_from: - - /search-plugins/knn/jni-library/ ---- - -# JNI libraries - -To integrate [nmslib](https://github.com/nmslib/nmslib/) and [faiss](https://github.com/facebookresearch/faiss/) approximate k-NN functionality (implemented in C++) into the k-NN plugin (implemented in Java), we created a Java Native Interface, which lets the k-NN plugin make calls to the native libraries. The interface includes three libraries: `libopensearchknn_nmslib`, the JNI library that interfaces with nmslib, `libopensearchknn_faiss`, the JNI library that interfaces with faiss, and `libopensearchknn_common`, a library containing common shared functionality between native libraries. - -The Lucene library is not implemented using a native library. -{: .note} - -The libraries `libopensearchknn_faiss` and `libopensearchknn_nmslib` are lazily loaded when they are first called in the plugin. This means that if you are only planning on using one of the libraries, the plugin never loads the other library. - -To build the libraries from source, refer to the [DEVELOPER_GUIDE](https://github.com/opensearch-project/k-NN/blob/main/DEVELOPER_GUIDE.md). - -For more information about JNI, see [Java Native Interface](https://en.wikipedia.org/wiki/Java_Native_Interface) on Wikipedia. 
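As a concrete counterpart to the Script Score method described in the overview above, an exact k-NN query might look like the following sketch (the index and field names are assumptions carried over from the earlier examples; see the scoring script documentation for the full parameter reference):

```json
GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "my_vector1",
          "query_value": [2.0, 3.0],
          "space_type": "l2"
        }
      }
    }
  }
}
```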
diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md deleted file mode 100644 index b5cfa03470..0000000000 --- a/_search-plugins/knn/knn-index.md +++ /dev/null @@ -1,385 +0,0 @@ ---- -layout: default -title: k-NN index -nav_order: 5 -parent: k-NN search -has_children: false ---- - -# k-NN index - -The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/). - -To create a k-NN index, set the `settings.index.knn` parameter to `true`: - -```json -PUT /test-index -{ - "settings": { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 3, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "lucene", - "parameters": { - "ef_construction": 128, - "m": 24 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -## Byte vectors - -Starting with k-NN plugin version 2.17, you can use `byte` vectors with the `faiss` and `lucene` engines to reduce the amount of required memory and storage space. For more information, see [Byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vectors). - -## Binary vectors - -Starting with k-NN plugin version 2.16, you can use `binary` vectors with the `faiss` engine to reduce the amount of required storage space. For more information, see [Binary vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors). - -Starting with k-NN plugin version 2.19, you can use `binary` vectors with the `lucene` engine. - -## SIMD optimization for the Faiss engine - -Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD is supported by default on Linux machines only for the Faiss engine. SIMD architecture helps boost overall performance by improving indexing throughput and reducing search latency. Starting with version 2.18, the k-NN plugin supports AVX-512 SIMD instructions on x64 architecture. Starting with version 2.19, the k-NN plugin supports advanced AVX-512 SIMD instructions on x64 architecture for Intel Sapphire Rapids or a newer-generation processor, improving the performance of Hamming distance computation. - -SIMD optimization is applicable only if the vector dimension is a multiple of 8. -{: .note} - - -### x64 architecture - - -For x64 architecture, the following versions of the Faiss library are built and shipped with the artifact: - -- `libopensearchknn_faiss_avx512_spr.so`: The Faiss library containing advanced AVX-512 SIMD instructions for newer-generation processors, available on public clouds such as AWS for c/m/r 7i or newer instances. -- `libopensearchknn_faiss_avx512.so`: The Faiss library containing AVX-512 SIMD instructions. -- `libopensearchknn_faiss_avx2.so`: The Faiss library containing AVX2 SIMD instructions. -- `libopensearchknn_faiss.so`: The non-optimized Faiss library without SIMD instructions. 
- -When using the Faiss library, the performance ranking is as follows: advanced AVX-512 > AVX-512 > AVX2 > no optimization. -{: .note } - -If your hardware supports advanced AVX512(spr), the k-NN plugin loads the `libopensearchknn_faiss_avx512_spr.so` library at runtime. - -If your hardware supports AVX-512, the k-NN plugin loads the `libopensearchknn_faiss_avx512.so` library at runtime. - -If your hardware supports AVX2 but doesn't support AVX-512, the k-NN plugin loads the `libopensearchknn_faiss_avx2.so` library at runtime. - -To disable the advanced AVX-512 (for Sapphire Rapids or newer-generation processors), AVX512, and AVX2 SIMD instructions and load the non-optimized Faiss library (`libopensearchknn_faiss.so`), specify the `knn.faiss.avx512_spr.disabled`, `knn.faiss.avx512.disabled`, and `knn.faiss.avx2.disabled` static settings as `true` in `opensearch.yml` (by default, all of these are `false`). - -Note that to update a static setting, you must stop the cluster, change the setting, and restart the cluster. For more information, see [Static settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). - -### ARM64 architecture - -For the ARM64 architecture, only one performance-boosting Faiss library (`libopensearchknn_faiss.so`) is built and shipped. The library contains Neon SIMD instructions and cannot be disabled. - -## Method definitions - -A method definition refers to the underlying configuration of the approximate k-NN algorithm you want to use. Method definitions are used to either create a `knn_vector` field (when the method does not require training) or [create a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model) that can then be used to [create a `knn_vector` field]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). - -A method definition will always contain the name of the method, the space_type the method is built for, the engine -(the library) to use, and a map of parameters. - -Mapping parameter | Required | Default | Updatable | Description -:--- | :--- | :--- | :--- | :--- -`name` | true | n/a | false | The identifier for the nearest neighbor method. -`space_type` | false | l2 | false | The vector space used to calculate the distance between vectors. Note: This value can also be specified at the top level of the mapping. -`engine` | false | faiss | false | The approximate k-NN library to use for indexing and search. The available libraries are `faiss`, `lucene`, and `nmslib` (deprecated). -`parameters` | false | null | false | The parameters used for the nearest neighbor method. - -### Supported NMSLIB methods - -Method name | Requires training | Supported spaces | Description -:--- | :--- | :--- | :--- -`hnsw` | false | l2, innerproduct, cosinesimil, l1, linf | Hierarchical proximity graph approach to approximate k-NN search. For more details on the algorithm, see this [abstract](https://arxiv.org/abs/1603.09320). - -#### HNSW parameters - -Parameter name | Required | Default | Updatable | Description -:--- | :--- | :--- | :--- | :--- -`ef_construction` | false | 100 | false | The size of the dynamic list used during k-NN graph creation. Higher values result in a more accurate graph but slower indexing speed. -`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. 
Keep this value between 2 and 100. - -For nmslib (deprecated), *ef_search* is set in the [index settings](#index-settings). -{: .note} - -An index created in OpenSearch version 2.11 or earlier will still use the old `ef_construction` value (`512`). -{: .note} - -### Supported Faiss methods - -Method name | Requires training | Supported spaces | Description -:--- | :--- |:---| :--- -`hnsw` | false | l2, innerproduct, hamming | Hierarchical proximity graph approach to approximate k-NN search. -`ivf` | true | l2, innerproduct, hamming | Stands for _inverted file index_. Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets is searched. - -For hnsw, "innerproduct" is not available when PQ is used. -{: .note} - -The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors). -{: .note} - -#### HNSW parameters - -Parameter name | Required | Default | Updatable | Description -:--- | :--- | :--- | :--- | :--- -`ef_search` | false | 100 | false | The size of the dynamic list used during k-NN searches. Higher values result in more accurate but slower searches. -`ef_construction` | false | 100 | false | The size of the dynamic list used during k-NN graph creation. Higher values result in a more accurate graph but slower indexing speed. -`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100. -`encoder` | false | flat | false | Encoder definition for encoding vectors. Encoders can reduce the memory footprint of your index, at the expense of search accuracy. - -An index created in OpenSearch version 2.11 or earlier will still use the old `ef_construction` and `ef_search` values (`512`). -{: .note} - -#### IVF parameters - -Parameter name | Required | Default | Updatable | Description -:--- | :--- | :--- | :--- | :--- -`nlist` | false | 4 | false | Number of buckets to partition vectors into. Higher values may lead to more accurate searches at the expense of memory and training latency. For more information about choosing the right value, refer to [Guidelines to choose an index](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index). -`nprobes` | false | 1 | false | Number of buckets to search during query. Higher values lead to more accurate but slower searches. -`encoder` | false | flat | false | Encoder definition for encoding vectors. Encoders can reduce the memory footprint of your index, at the expense of search accuracy. - -For more information about setting these parameters, refer to the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes). - -#### IVF training requirements - -The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model), passing the IVF method definition. IVF requires that, at a minimum, there are `nlist` training data points, but it is [recommended that you use more than this](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset). Training data can be composed of either the same data that is going to be ingested or a separate dataset. 
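To make the training requirement concrete, a training call for an IVF method might look like the following sketch (it assumes a `train-index` index containing a 4-dimensional `train-field`, as in the Train API examples; the model ID `my-ivf-model` and the small `nlist` value are illustrative only):

```json
POST /_plugins/_knn/models/my-ivf-model/_train
{
  "training_index": "train-index",
  "training_field": "train-field",
  "dimension": 4,
  "description": "Example IVF model",
  "space_type": "l2",
  "method": {
    "name": "ivf",
    "engine": "faiss",
    "parameters": {
      "nlist": 4,
      "nprobes": 2
    }
  }
}
```

Once the resulting model reaches the `created` state, it can be referenced by `model_id` in a `knn_vector` field mapping.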
- -### Supported Lucene methods - -Method name | Requires training | Supported spaces | Description -:--- | :--- |:--------------------------------------------------------------------------------| :--- -`hnsw` | false | l2, cosinesimil, innerproduct (supported in OpenSearch 2.13 and later), hamming | Hierarchical proximity graph approach to approximate k-NN search. - -The `hamming` space type is supported for binary vectors in OpenSearch version 2.19 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors). -{: .note} - -#### HNSW parameters - -Parameter name | Required | Default | Updatable | Description -:--- | :--- | :--- | :--- | :--- -`ef_construction` | false | 100 | false | The size of the dynamic list used during k-NN graph creation. Higher values result in a more accurate graph but slower indexing speed.
The Lucene engine uses the proprietary term "beam_width" to describe this function, which corresponds directly to "ef_construction". To be consistent throughout the OpenSearch documentation, we retain the term "ef_construction" for this parameter. -`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
The Lucene engine uses the proprietary term "max_connections" to describe this function, which corresponds directly to "m". To be consistent throughout OpenSearch documentation, we retain the term "m" to label this parameter. - -Lucene HNSW implementation ignores `ef_search` and dynamically sets it to the value of "k" in the search request. Therefore, there is no need to make settings for `ef_search` when using the Lucene engine. -{: .note} - -An index created in OpenSearch version 2.11 or earlier will still use the old `ef_construction` value (`512`). -{: .note} - -```json -"method": { - "name":"hnsw", - "engine":"lucene", - "parameters":{ - "m":2048, - "ef_construction": 245 - } -} -``` - -### Supported Faiss encoders - -You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. The k-NN plugin currently supports the `flat`, `pq`, and `sq` encoders in the Faiss library. - -The following example method definition specifies the `hnsw` method and a `pq` encoder: - -```json -"method": { - "name":"hnsw", - "engine":"faiss", - "parameters":{ - "encoder":{ - "name":"pq", - "parameters":{ - "code_size": 8, - "m": 8 - } - } - } -} -``` - -The `hnsw` method supports the `pq` encoder for OpenSearch versions 2.10 and later. The `code_size` parameter of a `pq` encoder with the `hnsw` method must be **8**. -{: .important} - -Encoder name | Requires training | Description -:--- | :--- | :--- -`flat` (Default) | false | Encode vectors as floating-point arrays. This encoding does not reduce memory footprint. -`pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). -`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-16-bit-scalar-quantization). - -#### PQ parameters - -Parameter name | Required | Default | Updatable | Description -:--- | :--- | :--- | :--- | :--- -`m` | false | 1 | false | Determines the number of subvectors into which to break the vector. Subvectors are encoded independently of each other. This vector dimension must be divisible by `m`. Maximum value is 1,024. -`code_size` | false | 8 | false | Determines the number of bits into which to encode a subvector. Maximum value is 8. For IVF, this value must be less than or equal to 8. For HNSW, this value can only be 8. - -#### SQ parameters - -Parameter name | Required | Default | Updatable | Description -:--- | :--- | :-- | :--- | :--- -`type` | false | `fp16` | false | The type of scalar quantization to be used to encode 32-bit float vectors into the corresponding type. 
As of OpenSearch 2.13, only the `fp16` encoder type is supported. For the `fp16` encoder, vector values must be in the [-65504.0, 65504.0] range. -`clip` | false | `false` | false | If `true`, then any vector values outside of the supported range for the specified vector type are rounded so that they are in the range. If `false`, then the request is rejected if any vector values are outside of the supported range. Setting `clip` to `true` may decrease recall. - -For more information and examples, see [Using Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#using-faiss-scalar-quantization). - -#### Examples - -The following example uses the `ivf` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder): - -```json -"method": { - "name":"ivf", - "engine":"faiss", - "parameters":{ - "nlist": 4, - "nprobes": 2 - } -} -``` - -The following example uses the `ivf` method with a `pq` encoder: - -```json -"method": { - "name":"ivf", - "engine":"faiss", - "parameters":{ - "encoder":{ - "name":"pq", - "parameters":{ - "code_size": 8, - "m": 8 - } - } - } -} -``` - -The following example uses the `hnsw` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder): - -```json -"method": { - "name":"hnsw", - "engine":"faiss", - "parameters":{ - "ef_construction": 256, - "m": 8 - } -} -``` - -The following example uses the `hnsw` method with an `sq` encoder of type `fp16` with `clip` enabled: - -```json -"method": { - "name":"hnsw", - "engine":"faiss", - "parameters":{ - "encoder": { - "name": "sq", - "parameters": { - "type": "fp16", - "clip": true - } - }, - "ef_construction": 256, - "m": 8 - } -} -``` - -The following example uses the `ivf` method with an `sq` encoder of type `fp16`: - -```json -"method": { - "name":"ivf", - "engine":"faiss", - "parameters":{ - "encoder": { - "name": "sq", - "parameters": { - "type": "fp16", - "clip": false - } - }, - "nprobes": 2 - } -} -``` - -### Choosing the right method - -There are several options to choose from when building your `knn_vector` field. To determine the correct methods and parameters, you should first understand the requirements of your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, and (4) indexing latency. - -If memory is not a concern, HNSW offers a strong query latency/query quality trade-off. - -If you want to use less memory and increase indexing speed as compared to HNSW while maintaining similar query quality, you should evaluate IVF. - -If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop. - -You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vectors) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). - -### Memory estimation - -In a typical OpenSearch cluster, a certain portion of RAM is reserved for the JVM heap. 
The k-NN plugin allocates native library indexes to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set to 50%. - -Having a replica doubles the total number of vectors. -{: .note } - -For information about using memory estimation with vector quantization, see the [vector quantization documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#memory-estimation). -{: .note } - -#### HNSW memory estimation - -The memory required for HNSW is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector. - -As an example, assume you have a million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: - -``` -1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.267 GB -``` - -#### IVF memory estimation - -The memory required for IVF is estimated to be `1.1 * (((4 * dimension) * num_vectors) + (4 * nlist * d))` bytes. - -As an example, assume you have a million vectors with a dimension of 256 and nlist of 128. The memory requirement can be estimated as follows: - -``` -1.1 * (((4 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 1.126 GB - -``` - -## Index settings - -Additionally, the k-NN plugin introduces several index settings that can be used to configure the k-NN structure as well. - -At the moment, several parameters defined in the settings are in the deprecation process. Those parameters should be set in the mapping instead of the index settings. Parameters set in the mapping will override the parameters set in the index settings. Setting the parameters in the mapping allows an index to have multiple `knn_vector` fields with different parameters. - -Setting | Default | Updatable | Description -:--- |:--------| :--- | :--- -`index.knn` | false | false | Whether the index should build native library indexes for the `knn_vector` fields. If set to false, the `knn_vector` fields will be stored in doc values, but approximate k-NN search functionality will be disabled. -`index.knn.algo_param.ef_search` (Deprecated) | 100 | true | The size of the dynamic list used during k-NN searches. Higher values result in more accurate but slower searches. Only available for NMSLIB. -`index.knn.advanced.approximate_threshold` | 0 | true | The number of vectors a segment must have before creating specialized data structures for approximate search. Set to `-1` to disable building vector data structures and `0` to always build them. -`index.knn.algo_param.ef_construction` | 100 | false | Deprecated in 1.0.0. Instead, use the [mapping parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions) to set this value. -`index.knn.algo_param.m` | 16 | false | Deprecated in 1.0.0. Use the [mapping parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions) to set this value instead. -`index.knn.space_type` | l2 | false | Deprecated in 1.0.0. Use the [mapping parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions) to set this value instead. - -An index created in OpenSearch version 2.11 or earlier will still use the old `ef_construction` and `ef_search` values (`512`). 
-{: .note} diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md deleted file mode 100644 index b820eea3d0..0000000000 --- a/_search-plugins/knn/knn-vector-quantization.md +++ /dev/null @@ -1,501 +0,0 @@ ---- -layout: default -title: k-NN vector quantization -nav_order: 27 -parent: k-NN search -has_children: false -has_math: true ---- - -# k-NN vector quantization - -By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for the native `faiss` and `nmslib` [deprecated] engines). To reduce the memory footprint, you can use vector quantization. - -OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, product quantization (PQ), and binary quantization(BQ). - -## Byte vectors - -Starting with version 2.17, the k-NN plugin supports `byte` vectors with the `faiss` and `lucene` engines in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vectors). - -## Lucene scalar quantization - -Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike [byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vectors), which require you to quantize vectors before ingesting documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors. - -Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. - -### Using Lucene scalar quantization - -To use the Lucene scalar quantizer, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index: - -```json -PUT /test-index -{ - "settings": { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 2, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "lucene", - "parameters": { - "encoder": { - "name": "sq" - }, - "ef_construction": 256, - "m": 8 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -### Confidence interval - -Optionally, you can specify the `confidence_interval` parameter in the `method.parameters.encoder` object. 
-The `confidence_interval` is used to compute the minimum and maximum quantiles in order to quantize the vectors: -- If you set the `confidence_interval` to a value in the `0.9` to `1.0` range, inclusive, then the quantiles are calculated statically. For example, setting the `confidence_interval` to `0.9` specifies to compute the minimum and maximum quantiles based on the middle 90% of the vector values, excluding the minimum 5% and maximum 5% of the values. -- Setting `confidence_interval` to `0` specifies to compute the quantiles dynamically, which involves oversampling and additional computations performed on the input data. -- When `confidence_interval` is not set, it is computed based on the vector dimension $$d$$ using the formula $$max(0.9, 1 - \frac{1}{1 + d})$$. - -Lucene scalar quantization is applied only to `float` vectors. If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), then the request is rejected. -{: .warning} - -The following example method definition specifies the Lucene `sq` encoder with the `confidence_interval` set to `1.0`. This `confidence_interval` specifies to consider all the input vectors when computing the minimum and maximum quantiles. Vectors are quantized to 7 bits by default: - -```json -PUT /test-index -{ - "settings": { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 2, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "lucene", - "parameters": { - "encoder": { - "name": "sq", - "parameters": { - "confidence_interval": 1.0 - } - }, - "ef_construction": 256, - "m": 8 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -There are no changes to ingestion or query mapping and no range limitations for the input vectors. - -### Memory estimation - -In the ideal scenario, 7-bit vectors created by the Lucene scalar quantizer use only 25% of the memory required by 32-bit vectors. - -#### HNSW memory estimation - -The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph. - -As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: - -```r -1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.4 GB -``` - -## Faiss 16-bit scalar quantization - -Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. - -At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. 
When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. - -SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies. -{: .warning} - -### Using Faiss scalar quantization - -To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index: - -```json -PUT /test-index -{ - "settings": { - "index": { - "knn": true, - "knn.algo_param.ef_search": 100 - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 3, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "faiss", - "parameters": { - "encoder": { - "name": "sq" - }, - "ef_construction": 256, - "m": 8 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters). - -The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. - -When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as a 16-bit vector `[65504.0, -65504.0]`. - -We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall. -{: .note} - -The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default): - -```json -PUT /test-index -{ - "settings": { - "index": { - "knn": true, - "knn.algo_param.ef_search": 100 - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 3, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "faiss", - "parameters": { - "encoder": { - "name": "sq", - "parameters": { - "type": "fp16" - } - }, - "ef_construction": 256, - "m": 8 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -During ingestion, make sure each vector dimension is in the supported range ([-65504.0, 65504.0]). - -```json -PUT test-index/_doc/1 -{ - "my_vector1": [-65504.0, 65503.845, 55.82] -} -``` -{% include copy-curl.html %} - -During querying, the query vector has no range limitation: - -```json -GET test-index/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector1": { - "vector": [265436.876, -120906.256, 99.84], - "k": 2 - } - } - } -} -``` -{% include copy-curl.html %} - -### Memory estimation - -In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require. 
- -#### HNSW memory estimation - -The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (2 * dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph. - -As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16. The memory requirement can be estimated as follows: - -```r -1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB -``` - -#### IVF memory estimation - -The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * dimension))` bytes/vector, where `nlist` is the number of buckets to partition vectors into. - -As an example, assume that you have 1 million vectors with a dimension of 256 and an `nlist` of 128. The memory requirement can be estimated as follows: - -```r -1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB -``` - -## Faiss product quantization - -PQ is a technique used to represent a vector in a configurable amount of bits. In general, it can be used to achieve a higher level of compression as compared to byte or scalar quantization. PQ works by separating vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector is `m*code_size` bits, plus overhead. For details about the parameters, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). PQ is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or _IVF_ approximate nearest neighbor (ANN) algorithms. - -### Using Faiss product quantization - -To minimize loss in accuracy, PQ requires a _training_ step that builds a model based on the distribution of the data that will be searched. - -The product quantizer is trained by running k-means clustering on a set of training vectors for each subvector space and extracts the centroids to be used for encoding. The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. - -In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm is used and how much data will be stored in the index. For IVF-based indexes, a recommended number of training vectors is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a recommended number is `2^code_size*1000`. See the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more information about the methodology used to calculate these figures. - -For PQ, both _m_ and _code_size_ need to be selected. _m_ determines the number of subvectors into which vectors should be split for separate encoding. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines the number of bits used to encode each subvector. In general, we recommend a setting of `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall. - -For an example of setting up an index with PQ, see the [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model) tutorial. - -### Memory estimation - -While PQ is meant to represent individual vectors with `m*code_size` bits, in reality, the indexes consume more space. 
This is mainly due to the overhead of storing certain code tables and auxiliary data structures. - -Some of the memory formulas depend on the number of segments present. This is not typically known beforehand, but a recommended default value is 300. -{: .note} - -#### HNSW memory estimation - -The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m) * num_vectors + num_segments * (2^pq_code_size * 4 * d))` bytes. - -As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: - -```r -1.1 * ((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB -``` - -#### IVF memory estimation - -The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors + num_segments * (2^pq_code_size * 4 * d + 4 * ivf_nlist * d))` bytes. - -For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: - -```r -1.1 * ((8 / 8 * 32 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.138 GB -``` - -## Binary quantization - -Starting with version 2.17, OpenSearch supports BQ for the Faiss engine. BQ compresses vectors into a binary format (0s and 1s), making it highly efficient in terms of memory usage. You can choose to represent each vector dimension using 1, 2, or 4 bits, depending on the desired precision. One of the advantages of using BQ is that the training process is handled automatically during indexing. This means that no separate training step is required, unlike other quantization techniques such as PQ. - -### Using BQ -To configure BQ for the Faiss engine, define a `knn_vector` field and specify the `mode` as `on_disk`. This configuration defaults to 1-bit BQ, with both `ef_search` and `ef_construction` set to `100`: - -```json -PUT my-vector-index -{ - "settings" : { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector_field": { - "type": "knn_vector", - "dimension": 8, - "space_type": "l2", - "data_type": "float", - "mode": "on_disk" - } - } - } -} -``` -{% include copy-curl.html %} - -To further optimize the configuration, you can specify additional parameters, such as the compression level, and fine-tune the search parameters. For example, you can override the `ef_construction` value or define the compression level, which corresponds to the number of bits used for quantization: - -- **32x compression** for 1-bit quantization -- **16x compression** for 2-bit quantization -- **8x compression** for 4-bit quantization - -This allows for greater control over memory usage and recall performance, providing flexibility to balance between precision and storage efficiency. 
- -To specify the compression level, set the `compression_level` parameter: - -```json -PUT my-vector-index -{ - "settings" : { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector_field": { - "type": "knn_vector", - "dimension": 8, - "space_type": "l2", - "data_type": "float", - "mode": "on_disk", - "compression_level": "16x", - "method": { - "name": "hnsw", - "engine": "faiss", - "parameters": { - "ef_construction": 16 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -The following example further fine-tunes the configuration by defining `ef_construction`, `encoder`, and the number of `bits` (which can be `1`, `2`, or `4`): - -```json -PUT my-vector-index -{ - "settings" : { - "index": { - "knn": true - } - }, - "mappings": { - "properties": { - "my_vector_field": { - "type": "knn_vector", - "dimension": 8, - "method": { - "name": "hnsw", - "engine": "faiss", - "space_type": "l2", - "parameters": { - "m": 16, - "ef_construction": 512, - "encoder": { - "name": "binary", - "parameters": { - "bits": 1 - } - } - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -### Search using binary quantized vectors - -You can perform a k-NN search on your index by providing a vector and specifying the number of nearest neighbors (k) to return: - -```json -GET my-vector-index/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector_field": { - "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5], - "k": 10 - } - } - } -} -``` -{% include copy-curl.html %} - -You can also fine-tune search by providing the `ef_search` and `oversample_factor` parameters. -The `oversample_factor` parameter controls the factor by which the search oversamples the candidate vectors before ranking them. Using a higher oversample factor means that more candidates will be considered before ranking, improving accuracy but also increasing search time. When selecting the `oversample_factor` value, consider the trade-off between accuracy and efficiency. For example, setting the `oversample_factor` to `2.0` will double the number of candidates considered during the ranking phase, which may help achieve better results. - -The following request specifies the `ef_search` and `oversample_factor` parameters: - -```json -GET my-vector-index/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector_field": { - "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5], - "k": 10, - "method_parameters": { - "ef_search": 10 - }, - "rescore": { - "oversample_factor": 10.0 - } - } - } - } -} -``` -{% include copy-curl.html %} - - -#### HNSW memory estimation - -The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph. - -As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16. The following sections provide memory requirement estimations for various compression values. - -##### 1-bit quantization (32x compression) - -In 1-bit quantization, each dimension is represented using 1 bit, equivalent to a 32x compression factor. The memory requirement can be estimated as follows: - -```r -Memory = 1.1 * ((256 * 1 / 8) + 8 * 16) * 1,000,000 - ~= 0.176 GB -``` - -##### 2-bit quantization (16x compression) - -In 2-bit quantization, each dimension is represented using 2 bits, equivalent to a 16x compression factor. 
The memory requirement can be estimated as follows: - -```r -Memory = 1.1 * ((256 * 2 / 8) + 8 * 16) * 1,000,000 - ~= 0.211 GB -``` - -##### 4-bit quantization (8x compression) - -In 4-bit quantization, each dimension is represented using 4 bits, equivalent to an 8x compression factor. The memory requirement can be estimated as follows: - -```r -Memory = 1.1 * ((256 * 4 / 8) + 8 * 16) * 1,000,000 - ~= 0.282 GB -``` diff --git a/_search-plugins/knn/painless-functions.md b/_search-plugins/knn/painless-functions.md deleted file mode 100644 index 4b2311ad65..0000000000 --- a/_search-plugins/knn/painless-functions.md +++ /dev/null @@ -1,76 +0,0 @@ ---- -layout: default -title: k-NN Painless extensions -nav_order: 25 -parent: k-NN search -has_children: false -has_math: true ---- - -# k-NN Painless Scripting extensions - -With the k-NN plugin's Painless Scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless has a strict list of allowed functions and classes per context to ensure its scripts are secure. The k-NN plugin adds Painless Scripting extensions to a few of the distance functions used in [k-NN score script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script), so you can use them to customize your k-NN workload. - -## Get started with k-NN's Painless Scripting functions - -To use k-NN's Painless Scripting functions, first create an index with `knn_vector` fields like in [k-NN score script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script#getting-started-with-the-score-script-for-vectors). Once the index is created and you ingest some data, you can use the painless extensions: - -```json -GET my-knn-index-2/_search -{ - "size": 2, - "query": { - "script_score": { - "query": { - "bool": { - "filter": { - "term": { - "color": "BLUE" - } - } - } - }, - "script": { - "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])", - "params": { - "field": "my_vector", - "query_value": [9.9, 9.9] - } - } - } - } -} -``` -{% include copy-curl.html %} - -`field` needs to map to a `knn_vector` field, and `query_value` needs to be a floating point array with the same dimension as `field`. - -## Function types -The following table describes the available painless functions the k-NN plugin provides: - -Function name | Function signature | Description -:--- | :--- -l2Squared | `float l2Squared (float[] queryVector, doc['vector field'])` | This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors. -l1Norm | `float l1Norm (float[] queryVector, doc['vector field'])` | This function calculates the L1 Norm distance (Manhattan distance) between a given query vector and document vectors. -cosineSimilarity | `float cosineSimilarity (float[] queryVector, doc['vector field'])` | Cosine similarity is an inner product of the query vector and document vector normalized to both have a length of 1. If the magnitude of the query vector doesn't change throughout the query, you can pass the magnitude of the query vector to improve performance, instead of calculating the magnitude every time for every filtered document:
`float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)`
In general, the range of cosine similarity is [-1, 1]. However, in the case of information retrieval, the cosine similarity of two documents ranges from 0 to 1 because the tf-idf statistic can't be negative. Therefore, the k-NN plugin adds 1.0 in order to always yield a positive cosine similarity score. -hamming | `float hamming (float[] queryVector, doc['vector field'])` | This function calculates the Hamming distance between a given query vector and document vectors. The Hamming distance is the number of positions at which the corresponding elements are different. The shorter the distance, the more relevant the document is, so this example inverts the return value of the Hamming distance. - -The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors). -{: .note} - -## Constraints - -1. If a document’s `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`. - -2. If a vector field doesn't have a value, the function throws an `IllegalStateException`. - - You can avoid this situation by first checking if a document has a value in its field: - - ``` - "source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))", - ``` - - Because scores can only be positive, this script ranks documents with vector fields higher than those without. - -With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown. -{: .note } diff --git a/_search-plugins/knn/performance-tuning.md b/_search-plugins/knn/performance-tuning.md deleted file mode 100644 index ae2368b597..0000000000 --- a/_search-plugins/knn/performance-tuning.md +++ /dev/null @@ -1,223 +0,0 @@ ---- -layout: default -title: Performance tuning -parent: k-NN search -nav_order: 45 ---- - -# Performance tuning - -This topic provides performance tuning recommendations to improve indexing and search performance for approximate k-NN (ANN). From a high level, k-NN works according to these principles: -* Native library indexes are created per knn_vector field / (Lucene) segment pair. -* Queries execute on segments sequentially inside the shard (same as any other OpenSearch query). -* Each native library index in the segment returns <=k neighbors. -* The coordinator node picks up the final `size` number of neighbors from the neighbors returned by each shard. - -This topic also provides recommendations for comparing approximate k-NN to exact k-NN with score script. - -## Indexing performance tuning - -Take any of the following steps to improve indexing performance, especially when you plan to index a large number of vectors at once. - -### Disable the refresh interval - -Either disable the refresh interval (default = 1 sec) or set a long duration for the refresh interval to avoid creating multiple small segments: - - ```json - PUT /<index_name>/_settings - { - "index" : { - "refresh_interval" : "-1" - } - } - ``` - -Make sure to reenable `refresh_interval` after indexing is complete. - -### Disable replicas (no OpenSearch replica shard) - - Set replicas to `0` to prevent duplicate construction of native library indexes in both primary and replica shards. 
When you enable replicas after indexing completes, the serialized native library indexes are copied directly. If you have no replicas, losing nodes might cause data loss, so it's important that the data be stored elsewhere so that this initial load can be retried in the event of an issue. - -### Increase the number of indexing threads - -If your hardware has multiple cores, you can speed up the indexing process by allowing multiple threads for native library index construction. Determine the number of threads to allot with the [knn.algo_param.index_thread_qty]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings) setting. - -Monitor CPU utilization and choose the correct number of threads. Because native library index construction is costly, choosing more threads than you need can cause additional CPU load. - - -### (Expert level) Disable vector field storage in the source field - -The `_source` field contains the original JSON document body that was passed at index time. This field is not indexed and is not searchable but is stored so that it can be returned when executing fetch requests such as `get` and `search`. When using vector fields within the source, you can remove the vector field to save disk space, as shown in the following example where the `location` vector is excluded: - - ```json - PUT /<index_name>/_mappings - { - "_source": { - "excludes": ["location"] - }, - "properties": { - "location": { - "type": "knn_vector", - "dimension": 2, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "faiss" - } - } - } - } - ``` - - -Disabling the `_source` field can cause certain features to become unavailable, such as the `update`, `update_by_query`, and `reindex` APIs and the ability to debug queries or aggregations by using the original document at index time. - -In OpenSearch 2.15 or later, you can further improve indexing speed and reduce disk space by removing the vector field from the `_recovery_source`, as shown in the following example: - - ```json - PUT /<index_name>/_mappings - { - "_source": { - "excludes": ["location"], - "recovery_source_excludes": ["location"] - }, - "properties": { - "location": { - "type": "knn_vector", - "dimension": 2, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "faiss" - } - } - } - } - ``` - -This is an expert-level setting. Disabling the `_recovery_source` may lead to failures during peer-to-peer recovery. Before disabling the `_recovery_source`, check with your OpenSearch cluster admin to confirm that your cluster performs regular flushes before starting the peer-to-peer recovery of shards. -{: .warning} - -### (Expert level) Build vector data structures on demand - -This approach is recommended only for workloads that involve a single initial bulk upload and will be used exclusively for search after force merging to a single segment. - -During indexing, vector search builds a specialized data structure for a `knn_vector` field to enable efficient approximate k-NN search. However, these structures are rebuilt during [force merge]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) on k-NN indexes. To optimize indexing speed, follow these steps: - -1. **Disable vector data structure creation**: Disable vector data structure creation for new segments by setting [`index.knn.advanced.approximate_threshold`]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#index-settings) to `-1`. 
- - To specify the setting at index creation, send the following request: - - ```json - PUT /test-index/ - { - "settings": { - "index.knn.advanced.approximate_threshold": "-1" - } - } - ``` - {% include copy-curl.html %} - - To specify the setting after index creation, send the following request: - - ```json - PUT /test-index/_settings - { - "index.knn.advanced.approximate_threshold": "-1" - } - ``` - {% include copy-curl.html %} - -1. **Perform bulk indexing**: Index data in [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) without performing any searches during ingestion: - - ```json - POST _bulk - { "index": { "_index": "test-index", "_id": "1" } } - { "my_vector1": [1.5, 2.5], "price": 12.2 } - { "index": { "_index": "test-index", "_id": "2" } } - { "my_vector1": [2.5, 3.5], "price": 7.1 } - ``` - {% include copy-curl.html %} - - If searches are performed while vector data structures are disabled, they will run using exact k-NN search. - -1. **Reenable vector data structure creation**: Once indexing is complete, enable vector data structure creation by setting `index.knn.advanced.approximate_threshold` to `0`: - - ```json - PUT /test-index/_settings - { - "index.knn.advanced.approximate_threshold": "0" - } - ``` - {% include copy-curl.html %} - - If you do not reset the setting to `0` before the force merge, you will need to reindex your data. - {: .note} - -1. **Force merge segments into one segment**: Perform a force merge and specify `max_num_segments=1` to create the vector data structures only once: - - ```json - POST test-index/_forcemerge?max_num_segments=1 - ``` - {% include copy-curl.html %} - - After the force merge, new search requests will execute approximate k-NN search using the newly created data structures. - -## Search performance tuning - -Take the following steps to improve search performance: - -### Reduce segment count - - To improve search performance, you must keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results. - - Ideally, having one segment per shard provides the optimal performance with respect to search latency. You can configure an index to have multiple shards to avoid giant shards and achieve more parallelism. - - You can control the number of segments by choosing a larger refresh interval, or during indexing by asking OpenSearch to slow down segment creation by disabling the refresh interval. - -### Warm up the index - - Native library indexes are constructed during indexing, but they're loaded into memory during the first search. In Lucene, each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top 'size' number of results based on the score are returned from all the results returned by segments at a shard level (higher score = better result). - - Once a native library index is loaded (native library indexes are loaded outside OpenSearch JVM), OpenSearch caches them in memory. Initial queries are expensive and take a few seconds, while subsequent queries are faster and take milliseconds (assuming the k-NN circuit breaker isn't hit). 
- - To avoid this latency penalty during your first queries, you can use the warmup API operation on the indexes you want to search: - - ```json - GET /_plugins/_knn/warmup/index1,index2,index3?pretty - { - "_shards" : { - "total" : 6, - "successful" : 6, - "failed" : 0 - } - } - ``` - - The warmup API operation loads all native library indexes for all shards (primary and replica) for the specified indexes into the cache, so there's no penalty to load native library indexes during initial searches. - -This API operation only loads the segments of active indexes into the cache. If a merge or refresh operation finishes after the API runs, or if you add new documents, you need to rerun the API to load those native library indexes into memory. -{: .warning} - - -### Avoid reading stored fields - - If your use case is simply to read the IDs and scores of the nearest neighbors, you can disable reading stored fields, which saves time retrieving the vectors from stored fields. - -### Use `mmap` file I/O - - For the Lucene-based approximate k-NN search, there is no dedicated cache layer that speeds up read/write operations. Instead, the plugin relies on the existing caching mechanism in OpenSearch core. In versions 2.4 and earlier of the Lucene-based approximate k-NN search, read/write operations were based on Java NIO by default, which can be slow, depending on the Lucene version and number of segments per shard. Starting with version 2.5, k-NN enables [`mmap`](https://en.wikipedia.org/wiki/Mmap) file I/O by default when the store type is `hybridfs` (the default store type in OpenSearch). This leads to fast file I/O operations and improves the overall performance of both data ingestion and search. The two file extensions specific to vector values that use `mmap` are `.vec` and `.vem`. For more information about these file extensions, see [the Lucene documentation](https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsFormat.html). - - The `mmap` file I/O uses the system file cache rather than memory allocated for the Java heap, so no additional allocation is required. To change the default list of extensions set by the plugin, update the `index.store.hybrid.mmap.extensions` setting at the cluster level using the [Cluster Settings API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings). **Note**: This is an expert-level setting that requires closing the index before updating the setting and reopening it after the update. - -## Improving recall - -Recall depends on multiple factors like number of vectors, number of dimensions, segments, and so on. Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating results. The larger the native library index, the more chances of losing recall if you're using smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. That being said, it's important to understand your system's requirements for latency and accuracy, and then choose the number of segments you want your index to have based on experimentation. - -The default parameters work on a broader set of use cases, but make sure to run your own experiments on your data sets and choose the appropriate values. For index-level settings, see [Index settings]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#index-settings). 
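For example, the following request sketches an index that uses larger-than-default `ef_construction` and `m` values (the index and field names are placeholders; tune the values against your own latency and memory requirements):

```json
PUT /my-knn-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 200
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 256,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 512,
            "m": 32
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}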
- -## Approximate nearest neighbor versus score script - -The standard k-NN query and custom scoring option perform differently. Test with a representative set of documents to see if the search results and latencies match your expectations. - -Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latency, but be sure to keep shard size within the [recommended guidelines]({{site.url}}{{site.baseurl}}/intro/#primary-and-replica-shards). diff --git a/_search-plugins/knn/settings.md b/_search-plugins/knn/settings.md deleted file mode 100644 index efc87f4c6e..0000000000 --- a/_search-plugins/knn/settings.md +++ /dev/null @@ -1,40 +0,0 @@ ---- -layout: default -title: Settings -parent: k-NN search -nav_order: 40 ---- - -# k-NN settings - -The k-NN plugin adds several new cluster settings. To learn more about static and dynamic settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/). - -## Cluster settings - -The following table lists all available cluster-level k-NN settings. For more information about cluster settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#updating-cluster-settings-using-the-api) and [Updating cluster settings using the API]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#updating-cluster-settings-using-the-api). - -Setting | Static/Dynamic | Default | Description -:--- | :--- | :--- | :--- -`knn.plugin.enabled`| Dynamic | `true` | Enables or disables the k-NN plugin. -`knn.algo_param.index_thread_qty` | Dynamic | `1` | The number of threads used for native library and Lucene library (for OpenSearch version 2.19 and later) index creation. Keeping this value low reduces the CPU impact of the k-NN plugin but also reduces indexing performance. -`knn.cache.item.expiry.enabled` | Dynamic | `false` | Whether to remove native library indexes that have not been accessed for a certain duration from memory. -`knn.cache.item.expiry.minutes` | Dynamic | `3h` | If enabled, the amount of idle time before a native library index is removed from memory. -`knn.circuit_breaker.unset.percentage` | Dynamic | `75` | The native memory usage threshold for the circuit breaker. Memory usage must be lower than this percentage of `knn.memory.circuit_breaker.limit` in order for `knn.circuit_breaker.triggered` to remain `false`. -`knn.circuit_breaker.triggered` | Dynamic | `false` | True when memory usage exceeds the `knn.circuit_breaker.unset.percentage` value. -`knn.memory.circuit_breaker.limit` | Dynamic | `50%` | The native memory limit for native library indexes. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, then the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, then the plugin removes the native library indexes used least recently. -`knn.memory.circuit_breaker.enabled` | Dynamic | `true` | Whether to enable the k-NN memory circuit breaker. -`knn.model.index.number_of_shards`| Dynamic | `1` | The number of shards to use for the model system index, which is the OpenSearch index that stores the models used for approximate nearest neighbor (ANN) search. -`knn.model.index.number_of_replicas`| Dynamic | `1` | The number of replica shards to use for the model system index. Generally, in a multi-node cluster, this value should be at least 1 in order to increase stability. 
-`knn.model.cache.size.limit` | Dynamic | `10%` | The model cache limit cannot exceed 25% of the JVM heap. -`knn.faiss.avx2.disabled` | Static | `false` | A static setting that specifies whether to disable the SIMD-based `libopensearchknn_faiss_avx2.so` library and load the non-optimized `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). -`knn.faiss.avx512.disabled` | Static | `false` | A static setting that specifies whether to disable the SIMD-based `libopensearchknn_faiss_avx512.so` library and load the `libopensearchknn_faiss_avx2.so` library or the non-optimized `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). -`knn.faiss.avx512_spr.disabled` | Static | `false` | A static setting that specifies whether to disable the SIMD-based `libopensearchknn_faiss_avx512_spr.so` library and load either the `libopensearchknn_faiss_avx512.so` , `libopensearchknn_faiss_avx2.so`, or the non-optimized `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). - -## Index settings - -The following table lists all available index-level k-NN settings. All settings are static. For information about updating static index-level settings, see [Updating a static index setting]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/#updating-a-static-index-setting). - -Setting | Default | Description -:--- | :--- | :--- -`index.knn.advanced.filtered_exact_search_threshold`| `null` | The filtered ID threshold value used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is lower than this setting's value, then exact search will be performed on the filtered IDs. -`index.knn.algo_param.ef_search` | `100` | `ef` (or `efSearch`) represents the size of the dynamic list for the nearest neighbors used during a search. Higher `ef` values lead to a more accurate but slower search. `ef` cannot be set to a value lower than the number of queried nearest neighbors, `k`. `ef` can take any value between `k` and the size of the dataset. diff --git a/_search-plugins/neural-search.md b/_search-plugins/neural-search.md deleted file mode 100644 index 931c9ce593..0000000000 --- a/_search-plugins/neural-search.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -layout: default -title: Neural search -nav_order: 25 -has_children: false -has_toc: false -redirect_from: - - /neural-search-plugin/index/ ---- - -# Neural search - -Neural search transforms text into vectors and facilitates vector search both at ingestion time and at search time. During ingestion, neural search transforms document text into vector embeddings and indexes both the text and its vector embeddings in a vector index. When you use a neural query during search, neural search converts the query text into vector embeddings, uses vector search to compare the query and document embeddings, and returns the closest results. 
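For illustration, a neural query passes the raw query text and a model ID, and OpenSearch generates the query embedding at search time. In the following sketch, the index name, vector field name, and model ID are placeholders for values you create during setup:

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "wild west",
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}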
- -Before you ingest documents into an index, documents are passed through a machine learning (ML) model, which generates vector embeddings for the document fields. When you send a search request, the query text or image is also passed through the ML model, which generates the corresponding vector embeddings. Then neural search performs a vector search on the embeddings and returns matching documents. - -## Prerequisite - -Before using neural search, you must set up an ML model. When selecting a model, you have the following options: - -- Use a pretrained model provided by OpenSearch. For more information, see [OpenSearch-provided pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/). - -- Upload your own model to OpenSearch. For more information, see [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/). - -- Connect to a foundation model hosted on an external platform. For more information, see [Connecting to remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/). - - -## Tutorial - -For a step-by-step tutorial, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/). - -## Using an ML model for neural search - -Once you set up an ML model, choose one of the following search methods to use your model for neural search. - -### Semantic search - -Semantic search uses dense retrieval based on text embedding models to search text data. For detailed setup instructions, see [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/). - -### Hybrid search - -Hybrid search combines keyword and neural search to improve search relevance. For detailed setup instructions, see [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/). - -### Multimodal search - -Multimodal search uses neural search with multimodal embedding models to search text and image data. For detailed setup instructions, see [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/). - -### Sparse search - -Sparse search uses neural search with sparse retrieval based on sparse embedding models to search text data. For detailed setup instructions, see [Sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). - -### Conversational search - -With conversational search, you can ask questions in natural language, receive a text response, and ask additional clarifying questions. For detailed setup instructions, see [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). diff --git a/_search-plugins/vector-search.md b/_search-plugins/vector-search.md deleted file mode 100644 index 5b6fc7f371..0000000000 --- a/_search-plugins/vector-search.md +++ /dev/null @@ -1,283 +0,0 @@ ---- -layout: default -title: Vector search -nav_order: 22 -has_children: false -has_toc: false ---- - -# Vector search - -OpenSearch is a comprehensive search platform that supports a variety of data types, including vectors. OpenSearch vector database functionality is seamlessly integrated with its generic database function. - -In OpenSearch, you can generate vector embeddings, store those embeddings in an index, and use them for vector search. Choose one of the following options: - -- Generate embeddings using a library of your choice before ingesting them into OpenSearch. Once you ingest vectors into an index, you can perform a vector similarity search on the vector space. 
For more information, see [Working with embeddings generated outside of OpenSearch](#working-with-embeddings-generated-outside-of-opensearch). -- Automatically generate embeddings within OpenSearch. To use embeddings for semantic search, the ingested text (the corpus) and the query need to be embedded using the same model. [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) packages this functionality, eliminating the need to manage the internal details. For more information, see [Generating vector embeddings within OpenSearch](#generating-vector-embeddings-in-opensearch). - -## Working with embeddings generated outside of OpenSearch - -After you generate vector embeddings, upload them to an OpenSearch index and search the index using vector search. For a complete example, see [Example](#example). - -### k-NN index - -To build a vector database and use vector search, you must specify your index as a [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/) when creating it by setting `index.knn` to `true`: - -```json -PUT test-index -{ - "settings": { - "index": { - "knn": true, - "knn.algo_param.ef_search": 100 - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 1024, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "faiss", - "parameters": { - "ef_construction": 128, - "m": 24 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -### k-NN vector - -You must designate the field that will store vectors as a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field type. OpenSearch supports vectors of up to 16,000 dimensions, each of which is represented as a 32-bit or 16-bit float. - -To save storage space, you can use `byte` or `binary` vectors. For more information, see [Byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#byte-vectors) and [Binary vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors). - -### k-NN vector search - -Vector search finds the vectors in your database that are most similar to the query vector. OpenSearch supports the following search methods: - -- [Approximate search](#approximate-search) (approximate k-NN, or ANN): Returns approximate nearest neighbors to the query vector. Usually, approximate search algorithms sacrifice indexing speed and search accuracy in exchange for performance benefits such as lower latency, smaller memory footprints, and more scalable search. For most use cases, approximate search is the best option. - -- Exact search (exact k-NN): A brute-force, exact k-NN search of vector fields. OpenSearch supports the following types of exact search: - - [Exact k-NN with scoring script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/): Using the k-NN scoring script, you can apply a filter to an index before executing the nearest neighbor search. - - [Painless extensions]({{site.url}}{{site.baseurl}}/search-plugins/knn/painless-functions/): Adds the distance functions as Painless extensions that you can use in more complex combinations. You can use this method to perform a brute-force, exact k-NN search of an index, which also supports pre-filtering. - -### Approximate search - -OpenSearch supports several algorithms for approximate vector search, each with its own advantages. For complete documentation, see [Approximate search]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/). 
For more information about the search methods and engines, see [Method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions). For method recommendations, see [Choosing the right method]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#choosing-the-right-method). - -To use approximate vector search, specify one of the following search methods (algorithms) in the `method` parameter: - -- Hierarchical Navigable Small World (HNSW) -- Inverted File System (IVF) - -Additionally, specify the engine (library) that implements this method in the `engine` parameter: - -- [Non-Metric Space Library (NMSLIB)](https://github.com/nmslib/nmslib) -- [Facebook AI Similarity Search (Faiss)](https://github.com/facebookresearch/faiss) -- Lucene - -The following table lists the combinations of search methods and libraries supported by the k-NN engine for approximate vector search. - -Method | Engine -:--- | :--- -HNSW | Faiss, Lucene, NMSLIB (deprecated) -IVF | Faiss - -### Engine recommendations - -In general, select Faiss for large-scale use cases. Lucene is a good option for smaller deployments and offers benefits like smart filtering, where the optimal filtering strategy—pre-filtering, post-filtering, or exact k-NN—is automatically applied depending on the situation. The following table summarizes the differences between each option. - -| | NMSLIB/HNSW | Faiss/HNSW | Faiss/IVF | Lucene/HNSW | -|:---|:---|:---|:---|:---| -| Max dimensions | 16,000 | 16,000 | 16,000 | 16,000 | -| Filter | Post-filter | Post-filter | Post-filter | Filter during search | -| Training required | No | No | Yes | No | -| Similarity metrics | `l2`, `innerproduct`, `cosinesimil`, `l1`, `linf` | `l2`, `innerproduct` | `l2`, `innerproduct` | `l2`, `cosinesimil` | -| Number of vectors | Tens of billions | Tens of billions | Tens of billions | Less than 10 million | -| Indexing latency | Low | Low | Lowest | Low | -| Query latency and quality | Low latency and high quality | Low latency and high quality | Low latency and low quality | High latency and high quality | -| Vector compression | Flat | Flat
Product quantization | Flat
Product quantization | Flat | -| Memory consumption | High | High
Low with PQ | Medium
Low with PQ | High | - -### Example - -In this example, you'll create a k-NN index, add data to the index, and search the data. - -#### Step 1: Create a k-NN index - -First, create an index that will store sample hotel data. Set `index.knn` to `true` and specify the `location` field as a `knn_vector`: - -```json -PUT /hotels-index -{ - "settings": { - "index": { - "knn": true, - "knn.algo_param.ef_search": 100, - "number_of_shards": 1, - "number_of_replicas": 0 - } - }, - "mappings": { - "properties": { - "location": { - "type": "knn_vector", - "dimension": 2, - "space_type": "l2", - "method": { - "name": "hnsw", - "engine": "lucene", - "parameters": { - "ef_construction": 100, - "m": 16 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Step 2: Add data to your index - -Next, add data to your index. Each document represents a hotel. The `location` field in each document contains a vector specifying the hotel's location: - -```json -POST /_bulk -{ "index": { "_index": "hotels-index", "_id": "1" } } -{ "location": [5.2, 4.4] } -{ "index": { "_index": "hotels-index", "_id": "2" } } -{ "location": [5.2, 3.9] } -{ "index": { "_index": "hotels-index", "_id": "3" } } -{ "location": [4.9, 3.4] } -{ "index": { "_index": "hotels-index", "_id": "4" } } -{ "location": [4.2, 4.6] } -{ "index": { "_index": "hotels-index", "_id": "5" } } -{ "location": [3.3, 4.5] } -``` -{% include copy-curl.html %} - -#### Step 3: Search your data - -Now search for hotels closest to the pin location `[5, 4]`. This location is labeled `Pin` in the following image. Each hotel is labeled with its document number. - -![Hotels on a coordinate plane]({{site.url}}{{site.baseurl}}/images/k-nn-search-hotels.png/) - -To search for the top three closest hotels, set `k` to `3`: - -```json -POST /hotels-index/_search -{ - "size": 3, - "query": { - "knn": { - "location": { - "vector": [ - 5, - 4 - ], - "k": 3 - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains the hotels closest to the specified pin location: - -```json -{ - "took": 1093, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 3, - "relation": "eq" - }, - "max_score": 0.952381, - "hits": [ - { - "_index": "hotels-index", - "_id": "2", - "_score": 0.952381, - "_source": { - "location": [ - 5.2, - 3.9 - ] - } - }, - { - "_index": "hotels-index", - "_id": "1", - "_score": 0.8333333, - "_source": { - "location": [ - 5.2, - 4.4 - ] - } - }, - { - "_index": "hotels-index", - "_id": "3", - "_score": 0.72992706, - "_source": { - "location": [ - 4.9, - 3.4 - ] - } - } - ] - } -} -``` - -### Vector search with filtering - -For information about vector search with filtering, see [k-NN search with filters]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). - -## Generating vector embeddings in OpenSearch - -[Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) encapsulates the infrastructure needed to perform semantic vector searches. After you integrate an inference (embedding) service, neural search functions like lexical search, accepting a textual query and returning relevant documents. - -When you index your data, neural search transforms text into vector embeddings and indexes both the text and its vector embeddings in a vector index. When you use a neural query during search, neural search converts the query text into vector embeddings and uses vector search to return the results. 
- -### Choosing a model - -The first step in setting up neural search is choosing a model. You can upload a model to your OpenSearch cluster, use one of the pretrained models provided by OpenSearch, or connect to an externally hosted model. For more information, see [Integrating ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/). - -### Neural search tutorial - -For a step-by-step tutorial, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/). - -### Search methods - -Choose one of the following search methods to use your model for neural search: - -- [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/): Uses dense retrieval based on text embedding models to search text data. - -- [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): Combines lexical and neural search to improve search relevance. - -- [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/): Uses neural search with multimodal embedding models to search text and image data. - -- [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/): Uses neural search with sparse retrieval based on sparse embedding models to search text data. - -- [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/): With conversational search, you can ask questions in natural language, receive a text response, and ask additional clarifying questions. diff --git a/_search-plugins/conversational-search.md b/_vector-search/ai-search/conversational-search.md similarity index 97% rename from _search-plugins/conversational-search.md rename to _vector-search/ai-search/conversational-search.md index be4c97b425..e2e337c40e 100644 --- a/_search-plugins/conversational-search.md +++ b/_vector-search/ai-search/conversational-search.md @@ -1,20 +1,22 @@ --- layout: default -title: Conversational search +title: Conversational search with RAG +parent: AI search has_children: false nav_order: 70 redirect_from: - /ml-commons-plugin/conversational-search/ + - /search-plugins/conversational-search/ --- -# Conversational search +# Conversational search with RAG Conversational search allows you to ask questions in natural language and refine the answers by asking follow-up questions. Thus, the conversation becomes a dialog between you and a large language model (LLM). For this to happen, instead of answering each question individually, the model needs to remember the context of the entire conversation. Conversational search is implemented with the following components: - [Conversation history](#conversation-history): Allows an LLM to remember the context of the current conversation and understand follow-up questions. -- [Retrieval-Augmented Generation (RAG)](#rag): Allows an LLM to supplement its static knowledge base with proprietary or current information. +- [Retrieval-augmented generation (RAG)](#rag): Allows an LLM to supplement its static knowledge base with proprietary or current information. ## Conversation history @@ -441,7 +443,4 @@ The response contains both messages: ## Next steps -- To learn more about connecting to models on external platforms, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). -- For supported APIs, see [Memory APIs]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/memory-apis/index/). 
-- To learn more about search pipelines and processors, see [Search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/). -- For available OpenSearch queries, see [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/). \ No newline at end of file +- Explore our [tutorials]({{site.url}}{{site.baseurl}}/vector-search/tutorials/) to learn how to build AI search applications. \ No newline at end of file diff --git a/_vector-search/ai-search/hybrid-search/aggregations.md b/_vector-search/ai-search/hybrid-search/aggregations.md new file mode 100644 index 0000000000..6ac12e4a03 --- /dev/null +++ b/_vector-search/ai-search/hybrid-search/aggregations.md @@ -0,0 +1,203 @@ +--- +layout: default +title: Combining hybrid search and aggregations +parent: Hybrid search +grand_parent: AI search +has_children: true +nav_order: 20 +--- + +# Combining hybrid search and aggregations +**Introduced 2.13** +{: .label .label-purple } + +You can enhance search results by combining a hybrid query clause with any aggregation that OpenSearch supports. Aggregations allow you to use OpenSearch as an analytics engine. For more information about aggregations, see [Aggregations]({{site.url}}{{site.baseurl}}/aggregations/). + +Most aggregations are performed on the subset of documents that is returned by a hybrid query. The only aggregation that operates on all documents is the [`global`]({{site.url}}{{site.baseurl}}/aggregations/bucket/global/) aggregation. + +To use aggregations with a hybrid query, first create an index. Aggregations are typically used on fields of special types, like `keyword` or `integer`. The following example creates an index with several such fields: + +```json +PUT /my-nlp-index +{ + "settings": { + "number_of_shards": 2 + }, + "mappings": { + "properties": { + "doc_index": { + "type": "integer" + }, + "doc_keyword": { + "type": "keyword" + }, + "category": { + "type": "keyword" + } + } + } +} +``` +{% include copy-curl.html %} + +The following request ingests six documents into your new index: + +```json +POST /_bulk +{ "index": { "_index": "my-nlp-index" } } +{ "category": "permission", "doc_keyword": "workable", "doc_index": 4976, "doc_price": 100} +{ "index": { "_index": "my-nlp-index" } } +{ "category": "sister", "doc_keyword": "angry", "doc_index": 2231, "doc_price": 200 } +{ "index": { "_index": "my-nlp-index" } } +{ "category": "hair", "doc_keyword": "likeable", "doc_price": 25 } +{ "index": { "_index": "my-nlp-index" } } +{ "category": "editor", "doc_index": 9871, "doc_price": 30 } +{ "index": { "_index": "my-nlp-index" } } +{ "category": "statement", "doc_keyword": "entire", "doc_index": 8242, "doc_price": 350 } +{ "index": { "_index": "my-nlp-index" } } +{ "category": "statement", "doc_keyword": "idea", "doc_index": 5212, "doc_price": 200 } +{ "index": { "_index": "index-test" } } +{ "category": "editor", "doc_keyword": "bubble", "doc_index": 1298, "doc_price": 130 } +{ "index": { "_index": "index-test" } } +{ "category": "editor", "doc_keyword": "bubble", "doc_index": 521, "doc_price": 75 } +``` +{% include copy-curl.html %} + +Now you can combine a hybrid query clause with a `min` aggregation: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "query": { + "hybrid": { + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + }, + "aggs": { + "total_price": { + "sum": { + 
"field": "doc_price" + } + }, + "keywords": { + "terms": { + "field": "doc_keyword", + "size": 10 + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the matching documents and the aggregation results: + +```json +{ + "took": 9, + "timed_out": false, + "_shards": { + "total": 2, + "successful": 2, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "mHRPNY4BlN82W_Ar9UMY", + "_score": 0.5, + "_source": { + "doc_price": 100, + "doc_index": 4976, + "doc_keyword": "workable", + "category": "permission" + } + }, + { + "_index": "my-nlp-index", + "_id": "m3RPNY4BlN82W_Ar9UMY", + "_score": 0.5, + "_source": { + "doc_price": 30, + "doc_index": 9871, + "category": "editor" + } + }, + { + "_index": "my-nlp-index", + "_id": "nXRPNY4BlN82W_Ar9UMY", + "_score": 0.5, + "_source": { + "doc_price": 200, + "doc_index": 5212, + "doc_keyword": "idea", + "category": "statement" + } + }, + { + "_index": "my-nlp-index", + "_id": "nHRPNY4BlN82W_Ar9UMY", + "_score": 0.5, + "_source": { + "doc_price": 350, + "doc_index": 8242, + "doc_keyword": "entire", + "category": "statement" + } + } + ] + }, + "aggregations": { + "total_price": { + "value": 680 + }, + "doc_keywords": { + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0, + "buckets": [ + { + "key": "entire", + "doc_count": 1 + }, + { + "key": "idea", + "doc_count": 1 + }, + { + "key": "workable", + "doc_count": 1 + } + ] + } + } +} +``` \ No newline at end of file diff --git a/_vector-search/ai-search/hybrid-search/explain.md b/_vector-search/ai-search/hybrid-search/explain.md new file mode 100644 index 0000000000..a9ca1064a5 --- /dev/null +++ b/_vector-search/ai-search/hybrid-search/explain.md @@ -0,0 +1,202 @@ +--- +layout: default +title: Hybrid search explain +parent: Hybrid search +grand_parent: AI search +has_children: true +nav_order: 50 +--- + +# Hybrid search explain +**Introduced 2.19** +{: .label .label-purple } + +You can provide the `explain` parameter to understand how scores are calculated, normalized, and combined in hybrid queries. When enabled, it provides detailed information about the scoring process for each search result. This includes revealing the score normalization techniques used, how different scores were combined, and the calculations for individual subquery scores. This comprehensive insight makes it easier to understand and optimize your hybrid query results. For more information about `explain`, see [Explain API]({{site.url}}{{site.baseurl}}/api-reference/explain/). + +`explain` is an expensive operation in terms of both resources and time. For production clusters, we recommend using it sparingly for the purpose of troubleshooting. +{: .warning } + +You can provide the `explain` parameter in a URL when running a complete hybrid query using the following syntax: + +```json +GET /_search?search_pipeline=&explain=true +POST /_search?search_pipeline=&explain=true +``` + +To use the `explain` parameter, you must configure the `hybrid_score_explanation` response processor in your search pipeline. For more information, see [Hybrid score explanation processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/explanation-processor/). 
- -You can also use `explain` with the individual document ID: + +```json +GET <index>/_explain/<document_id> +POST <index>/_explain/<document_id> +``` + +In this case, the result will contain only low-level scoring information, for example, [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) scores for text-based queries such as `term` or `match`. For an example response, see [Explain API example response]({{site.url}}{{site.baseurl}}/api-reference/explain/#example-response). + +To see the `explain` output for all results, set the parameter to `true` either in the URL or in the request body: + +```json +POST my-nlp-index/_search?search_pipeline=my_pipeline&explain=true +{ + "_source": { + "exclude": [ + "passage_embedding" + ] + }, + "query": { + "hybrid": { + "queries": [ + { + "match": { + "text": { + "query": "horse" + } + } + }, + { + "neural": { + "passage_embedding": { + "query_text": "wild west", + "model_id": "aVeif4oB5Vm0Tdw8zYO2", + "k": 5 + } + } + } + ] + } + } +} +``` +{% include copy-curl.html %} + +The response contains scoring information: 
+ + Response + + {: .text-delta} + +```json +{ + "took": 54, + "timed_out": false, + "_shards": { + "total": 2, + "successful": 2, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 5, + "relation": "eq" + }, + "max_score": 0.9251075, + "hits": [ + { + "_shard": "[my-nlp-index][0]", + "_node": "IsuzeVYdSqKUfy0qfqil2w", + "_index": "my-nlp-index", + "_id": "5", + "_score": 0.9251075, + "_source": { + "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse .", + "id": "2691147709.jpg" + }, + "_explanation": { + "value": 0.9251075, + "description": "arithmetic_mean combination of:", + "details": [ + { + "value": 1.0, + "description": "min_max normalization of:", + "details": [ + { + "value": 1.2336599, + "description": "weight(text:horse in 0) [PerFieldSimilarity], result of:", + "details": [ + { + "value": 1.2336599, + "description": "score(freq=1.0), computed as boost * idf * tf from:", + "details": [ + { + "value": 2.2, + "description": "boost", + "details": [] + }, + { + "value": 1.2039728, + "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", + "details": [ + { + "value": 1, + "description": "n, number of documents containing term", + "details": [] + }, + { + "value": 4, + "description": "N, total number of documents with field", + "details": [] + } + ] + }, + { + "value": 0.46575344, + "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:", + "details": [ + { + "value": 1.0, + "description": "freq, occurrences of term within document", + "details": [] + }, + { + "value": 1.2, + "description": "k1, term saturation parameter", + "details": [] + }, + { + "value": 0.75, + "description": "b, length normalization parameter", + "details": [] + }, + { + "value": 16.0, + "description": "dl, length of field", + "details": [] + }, + { + "value": 17.0, + "description": "avgdl, average length of field", + "details": [] + } + ] + } + ] + } + ] + } + ] + }, + { + "value": 0.8503647, + "description": "min_max normalization of:", + "details": [ + { + "value": 0.015177966, + "description": "within top 5", + "details": [] + } + ] + } + ] +... +``` +
+ +### Response body fields + +Field | Description +:--- | :--- +`explanation` | The `explanation` object has three properties: `value`, `description`, and `details`. The `value` property shows the result of the calculation, `description` explains what type of calculation was performed, and `details` shows any subcalculations performed. For score normalization, the information in the `description` property includes the technique used for normalization or combination and the corresponding score. diff --git a/_vector-search/ai-search/hybrid-search/index.md b/_vector-search/ai-search/hybrid-search/index.md new file mode 100644 index 0000000000..20a01f66c2 --- /dev/null +++ b/_vector-search/ai-search/hybrid-search/index.md @@ -0,0 +1,311 @@ +--- +layout: default +title: Hybrid search +parent: AI search +has_children: true +nav_order: 40 +redirect_from: + - /search-plugins/hybrid-search/ + - /vector-search/ai-search/hybrid-search/ +--- + +# Hybrid search +Introduced 2.11 +{: .label .label-purple } + +Hybrid search combines keyword and semantic search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline intercepts search results at an intermediate stage and applies processing to normalize and combine document scores. + +There are two types of processors available for hybrid search: + +- [Normalization processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) (Introduced 2.10): A score-based processor that normalizes and combines document scores from multiple query clauses, rescoring the documents using the selected normalization and combination techniques. +- [Score ranker processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/score-ranker-processor/) (Introduced 2.19): A rank-based processor that uses rank fusion to combine and rerank documents from multiple query clauses. + +**PREREQUISITE**
+To follow this example, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). If you have already generated text embeddings, ingest the embeddings into an index and skip to [Step 4](#step-4-configure-a-search-pipeline). +{: .note} + +## Using hybrid search + +To use hybrid search, follow these steps: + +1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline). +1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). +1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). +1. [Configure a search pipeline](#step-4-configure-a-search-pipeline). +1. [Search the index using hybrid search](#step-5-search-the-index-using-hybrid-search). + +## Step 1: Create an ingest pipeline + +To generate vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`text_embedding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/text-embedding/), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings. + +The following example request creates an ingest pipeline that converts the text from `passage_text` to text embeddings and stores the embeddings in `passage_embedding`: + +```json +PUT /_ingest/pipeline/nlp-ingest-pipeline +{ + "description": "A text embedding pipeline", + "processors": [ + { + "text_embedding": { + "model_id": "bQ1J8ooBpBj3wT4HVUsb", + "field_map": { + "passage_text": "passage_embedding" + } + } + } + ] +} +``` +{% include copy-curl.html %} + +## Step 2: Create an index for ingestion + +In order to use the text embedding processor defined in your pipeline, create a vector index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `passage_text` field should be mapped as `text`. + +The following example request creates a vector index that is set up with a default ingest pipeline: + +```json +PUT /my-nlp-index +{ + "settings": { + "index.knn": true, + "default_pipeline": "nlp-ingest-pipeline" + }, + "mappings": { + "properties": { + "id": { + "type": "text" + }, + "passage_embedding": { + "type": "knn_vector", + "dimension": 768, + "method": { + "engine": "lucene", + "space_type": "l2", + "name": "hnsw", + "parameters": {} + } + }, + "passage_text": { + "type": "text" + } + } + } +} +``` +{% include copy-curl.html %} + +For more information about creating a vector index and using supported methods, see [Creating a vector index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). 
+ +## Step 3: Ingest documents into the index + +To ingest documents into the index created in the previous step, send the following requests: + +```json +PUT /my-nlp-index/_doc/1 +{ + "passage_text": "Hello world", + "id": "s1" +} +``` +{% include copy-curl.html %} + +```json +PUT /my-nlp-index/_doc/2 +{ + "passage_text": "Hi planet", + "id": "s2" +} +``` +{% include copy-curl.html %} + +Before the document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on the document, generating text embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings. + +## Step 4: Configure a search pipeline + +To configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/), use the following request. The normalization technique in the processor is set to `min_max`, and the combination technique is set to `arithmetic_mean`. The `weights` array specifies the weights assigned to each query clause as decimal percentages: + +```json +PUT /_search/pipeline/nlp-search-pipeline +{ + "description": "Post processor for hybrid search", + "phase_results_processors": [ + { + "normalization-processor": { + "normalization": { + "technique": "min_max" + }, + "combination": { + "technique": "arithmetic_mean", + "parameters": { + "weights": [ + 0.3, + 0.7 + ] + } + } + } + } + ] +} +``` +{% include copy-curl.html %} + +## Step 5: Search the index using hybrid search + +To perform hybrid search on your index, use the [`hybrid` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/), which combines the results of keyword and semantic search. + +#### Example: Combining a neural query and a match query + +The following example request combines two query clauses---a `neural` query and a `match` query. It specifies the search pipeline created in the previous step as a query parameter: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "_source": { + "exclude": [ + "passage_embedding" + ] + }, + "query": { + "hybrid": { + "queries": [ + { + "match": { + "passage_text": { + "query": "Hi world" + } + } + }, + { + "neural": { + "passage_embedding": { + "query_text": "Hi world", + "model_id": "aVeif4oB5Vm0Tdw8zYO2", + "k": 5 + } + } + } + ] + } + } +} +``` +{% include copy-curl.html %} + +Alternatively, you can set a default search pipeline for the `my-nlp-index` index. For more information, see [Default search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/#default-search-pipeline). + +The response contains the matching document: + +```json +{ + "took" : 36, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 1.2251667, + "hits" : [ + { + "_index" : "my-nlp-index", + "_id" : "1", + "_score" : 1.2251667, + "_source" : { + "passage_text" : "Hello world", + "id" : "s1" + } + } + ] + } +} +``` +{% include copy-curl.html %} + +#### Example: Combining a match query and a term query + +The following example request combines two query clauses---a `match` query and a `term` query. 
It specifies the search pipeline created in the previous step as a query parameter: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "_source": { + "exclude": [ + "passage_embedding" + ] + }, + "query": { + "hybrid": { + "queries": [ + { + "match":{ + "passage_text": "hello" + } + }, + { + "term":{ + "passage_text":{ + "value":"planet" + } + } + } + ] + } + } +} +``` +{% include copy-curl.html %} + +The response contains the matching documents: + +```json +{ + "took": 11, + "timed_out": false, + "_shards": { + "total": 2, + "successful": 2, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 0.7, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "2", + "_score": 0.7, + "_source": { + "id": "s2", + "passage_text": "Hi planet" + } + }, + { + "_index": "my-nlp-index", + "_id": "1", + "_score": 0.3, + "_source": { + "id": "s1", + "passage_text": "Hello world" + } + } + ] + } +} +``` +{% include copy-curl.html %} + +## Next steps + +- Explore our [tutorials]({{site.url}}{{site.baseurl}}/vector-search/tutorials/) to learn how to build AI search applications. \ No newline at end of file diff --git a/_vector-search/ai-search/hybrid-search/pagination.md b/_vector-search/ai-search/hybrid-search/pagination.md new file mode 100644 index 0000000000..a68008c9e1 --- /dev/null +++ b/_vector-search/ai-search/hybrid-search/pagination.md @@ -0,0 +1,205 @@ +--- +layout: default +title: Paginating hybrid query results +parent: Hybrid search +grand_parent: AI search +has_children: true +nav_order: 60 +--- + +## Paginating hybrid query results +**Introduced 2.19** +{: .label .label-purple } + +You can apply pagination to hybrid query results by using the `pagination_depth` parameter in the hybrid query clause, along with the standard `from` and `size` parameters. The `pagination_depth` parameter defines the maximum number of search results that can be retrieved from each shard per subquery. For example, setting `pagination_depth` to `50` allows up to 50 results per subquery to be maintained in memory from each shard. + +To navigate through the results, use the `from` and `size` parameters: + +- `from`: Specifies the document number from which you want to start showing the results. Default is `0`. +- `size`: Specifies the number of results to return on each page. Default is `10`. + +For example, to show 10 documents starting from the 20th document, specify `from: 20` and `size: 10`. For more information about pagination, see [Paginate results]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#the-from-and-size-parameters). + +### The impact of pagination_depth on hybrid search results + +Changing `pagination_depth` affects the underlying set of search results retrieved before any ranking, filtering, or pagination adjustments are applied. This is because `pagination_depth` determines the number of results retrieved per subquery from each shard, which can ultimately change the result order after normalization. To ensure consistent pagination, keep the `pagination_depth` value the same while navigating between pages. + +By default, hybrid search without pagination retrieves results using the `from + size` formula, where `from` is always `0`. +{: .note} + +To enable deeper pagination, increase the `pagination_depth` value. You can then navigate through results using the `from` and `size` parameters. 
Note that deeper pagination can impact search performance because retrieving and processing more results requires additional computational resources. + +The following example shows a search request configured with `from: 0`, `size: 5`, and `pagination_depth: 10`. This means that up to 10 search results per shard will be retrieved for both the `bool` and `term` queries before pagination is applied: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "size": 5, + "query": { + "hybrid": { + "pagination_depth":10, + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + } +} +``` +{% include copy-curl.html %} + +The response contains the first five results: + +```json +{ + "hits": { + "total": { + "value": 6, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "d3eXlZQBJkWerFzHv4eV", + "_score": 0.5, + "_source": { + "category": "permission", + "doc_keyword": "workable", + "doc_index": 4976, + "doc_price": 100 + } + }, + { + "_index": "my-nlp-index", + "_id": "eneXlZQBJkWerFzHv4eW", + "_score": 0.5, + "_source": { + "category": "editor", + "doc_index": 9871, + "doc_price": 30 + } + }, + { + "_index": "my-nlp-index", + "_id": "e3eXlZQBJkWerFzHv4eW", + "_score": 0.5, + "_source": { + "category": "statement", + "doc_keyword": "entire", + "doc_index": 8242, + "doc_price": 350 + } + }, + { + "_index": "my-nlp-index", + "_id": "fHeXlZQBJkWerFzHv4eW", + "_score": 0.24999997, + "_source": { + "category": "statement", + "doc_keyword": "idea", + "doc_index": 5212, + "doc_price": 200 + } + }, + { + "_index": "index-test", + "_id": "fXeXlZQBJkWerFzHv4eW", + "_score": 5.0E-4, + "_source": { + "category": "editor", + "doc_keyword": "bubble", + "doc_index": 1298, + "doc_price": 130 + } + } + ] + } +} +``` + +The following search request is configured with `from: 6`, `size: 5`, and `pagination_depth: 10`. 
The `pagination_depth` remains unchanged to ensure that pagination is based on the same set of search results: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "size":5, + "from":6, + "query": { + "hybrid": { + "pagination_depth":10, + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + } +} +``` +{% include copy-curl.html %} + +The response excludes the first five entries and displays the remaining results: + +```json +{ + "hits": { + "total": { + "value": 6, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "index-test", + "_id": "fneXlZQBJkWerFzHv4eW", + "_score": 5.0E-4, + "_source": { + "category": "editor", + "doc_keyword": "bubble", + "doc_index": 521, + "doc_price": 75 + } + } + ] + } +} +``` + diff --git a/_vector-search/ai-search/hybrid-search/post-filtering.md b/_vector-search/ai-search/hybrid-search/post-filtering.md new file mode 100644 index 0000000000..a458cccdaa --- /dev/null +++ b/_vector-search/ai-search/hybrid-search/post-filtering.md @@ -0,0 +1,85 @@ +--- +layout: default +title: Hybrid search with post-filtering +parent: Hybrid search +grand_parent: AI search +has_children: true +nav_order: 10 +--- + +# Hybrid search with post-filtering +**Introduced 2.13** +{: .label .label-purple } + +You can perform post-filtering on hybrid search results by providing the `post_filter` parameter in your query. + +The `post_filter` clause is applied after the search results have been retrieved. Post-filtering is useful for applying additional filters to the search results without impacting the scoring or the order of the results. + +Post-filtering does not impact document relevance scores or aggregation results. +{: .note} + +## Example + +The following example request combines two query clauses---a `term` query and a `match` query---and contains a `post_filter`: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "query": { + "hybrid":{ + "queries":[ + { + "match":{ + "passage_text": "hello" + } + }, + { + "term":{ + "passage_text":{ + "value":"planet" + } + } + } + ] + } + + }, + "post_filter":{ + "match": { "passage_text": "world" } + } +} +``` +{% include copy-curl.html %} + +Compare the results to the results in the [example without post-filtering]({{site.url}}{{site.baseurl}}/vector-search/ai-search/hybrid-search/#example-combining-a-match-query-and-a-term-query). In the example without post-filtering, the response contains two documents. 
In this example, the response contains one document because the second document is filtered out: + +```json +{ + "took": 18, + "timed_out": false, + "_shards": { + "total": 2, + "successful": 2, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 0.3, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "1", + "_score": 0.3, + "_source": { + "id": "s1", + "passage_text": "Hello world" + } + } + ] + } +} +``` diff --git a/_vector-search/ai-search/hybrid-search/search-after.md b/_vector-search/ai-search/hybrid-search/search-after.md new file mode 100644 index 0000000000..eb8250ef56 --- /dev/null +++ b/_vector-search/ai-search/hybrid-search/search-after.md @@ -0,0 +1,206 @@ +--- +layout: default +title: Hybrid search with search_after +parent: Hybrid search +grand_parent: AI search +has_children: true +nav_order: 40 +--- + +# Hybrid search with search_after +**Introduced 2.16** +{: .label .label-purple } + +You can control sorting results by applying a `search_after` condition that provides a live cursor and uses the previous page's results to obtain the next page's results. For more information about `search_after`, see [The search_after parameter]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#the-search_after-parameter). + +You can paginate the sorted results by applying a `search_after` condition in the sort queries. + +In the following example, sorting is applied by `doc_price` with a `search_after` condition: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "query": { + "hybrid": { + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + }, + "sort":[ + { + "_id": { + "order": "desc" + } + } + ], + "search_after":[200] +} +``` +{% include copy-curl.html %} + +The response contains the matching documents that are listed after the `200` sort value, sorted by `doc_price` in descending order: + +```json +{ + "took": 8, + "timed_out": false, + "_shards": { + "total": 3, + "successful": 3, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "6yaM4JABZkI1FQv8AwoM", + "_score": null, + "_source": { + "category": "permission", + "doc_keyword": "workable", + "doc_index": 4976, + "doc_price": 100 + }, + "sort": [ + 100 + ] + }, + { + "_index": "my-nlp-index", + "_id": "7iaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "editor", + "doc_index": 9871, + "doc_price": 30 + }, + "sort": [ + 30 + ] + } + ] + } +} +``` + +In the following example, sorting is applied by `id` with a `search_after` condition: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "query": { + "hybrid": { + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + }, + "sort":[ + { + "_id": { + "order": "desc" + } + } + ], + "search_after":["7yaM4JABZkI1FQv8AwoN"] +} +``` +{% include copy-curl.html %} + +The response contains the matching documents that are listed after the `7yaM4JABZkI1FQv8AwoN` sort value, sorted by `id` in descending order: + +```json +{ + "took": 17, + "timed_out": false, + "_shards": { + "total": 3, + "successful": 3, + 
"skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "7iaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "editor", + "doc_index": 9871, + "doc_price": 30 + }, + "sort": [ + "7iaM4JABZkI1FQv8AwoN" + ] + }, + { + "_index": "my-nlp-index", + "_id": "6yaM4JABZkI1FQv8AwoM", + "_score": null, + "_source": { + "category": "permission", + "doc_keyword": "workable", + "doc_index": 4976, + "doc_price": 100 + }, + "sort": [ + "6yaM4JABZkI1FQv8AwoM" + ] + } + ] + } +} +``` \ No newline at end of file diff --git a/_vector-search/ai-search/hybrid-search/sorting.md b/_vector-search/ai-search/hybrid-search/sorting.md new file mode 100644 index 0000000000..f626966c4a --- /dev/null +++ b/_vector-search/ai-search/hybrid-search/sorting.md @@ -0,0 +1,259 @@ +--- +layout: default +title: Using sorting with a hybrid query +parent: Hybrid search +grand_parent: AI search +has_children: true +nav_order: 30 +--- + +# Using sorting with a hybrid query +**Introduced 2.16** +{: .label .label-purple } + +By default, hybrid search returns results ordered by scores in descending order. You can apply sorting to hybrid query results by providing the `sort` criteria in the search request. For more information about sort criteria, see [Sort results]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/sort/). +When sorting is applied to a hybrid search, results are fetched from the shards based on the specified sort criteria. As a result, the search results are sorted accordingly, and the document scores are `null`. Scores are only present in the hybrid search sorting results if documents are sorted by `_score`. + +In the following example, sorting is applied by `doc_price` in the hybrid query search request: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "query": { + "hybrid": { + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + }, + "sort":[ + { + "doc_price": { + "order": "desc" + } + } + ] +} +``` +{% include copy-curl.html %} + +The response contains the matching documents sorted by `doc_price` in descending order: + +```json +{ + "took": 35, + "timed_out": false, + "_shards": { + "total": 3, + "successful": 3, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "7yaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "statement", + "doc_keyword": "entire", + "doc_index": 8242, + "doc_price": 350 + }, + "sort": [ + 350 + ] + }, + { + "_index": "my-nlp-index", + "_id": "8CaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "statement", + "doc_keyword": "idea", + "doc_index": 5212, + "doc_price": 200 + }, + "sort": [ + 200 + ] + }, + { + "_index": "my-nlp-index", + "_id": "6yaM4JABZkI1FQv8AwoM", + "_score": null, + "_source": { + "category": "permission", + "doc_keyword": "workable", + "doc_index": 4976, + "doc_price": 100 + }, + "sort": [ + 100 + ] + }, + { + "_index": "my-nlp-index", + "_id": "7iaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "editor", + "doc_index": 9871, + "doc_price": 30 + }, + "sort": [ + 30 + ] + } + ] + } +} +``` + +In the following example, sorting is applied by `_id`: + +```json +GET 
/my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "query": { + "hybrid": { + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + }, + "sort":[ + { + "_id": { + "order": "desc" + } + } + ] +} +``` +{% include copy-curl.html %} + +The response contains the matching documents sorted by `_id` in descending order: + +```json +{ + "took": 33, + "timed_out": false, + "_shards": { + "total": 3, + "successful": 3, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "8CaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "statement", + "doc_keyword": "idea", + "doc_index": 5212, + "doc_price": 200 + }, + "sort": [ + "8CaM4JABZkI1FQv8AwoN" + ] + }, + { + "_index": "my-nlp-index", + "_id": "7yaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "statement", + "doc_keyword": "entire", + "doc_index": 8242, + "doc_price": 350 + }, + "sort": [ + "7yaM4JABZkI1FQv8AwoN" + ] + }, + { + "_index": "my-nlp-index", + "_id": "7iaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "editor", + "doc_index": 9871, + "doc_price": 30 + }, + "sort": [ + "7iaM4JABZkI1FQv8AwoN" + ] + }, + { + "_index": "my-nlp-index", + "_id": "6yaM4JABZkI1FQv8AwoM", + "_score": null, + "_source": { + "category": "permission", + "doc_keyword": "workable", + "doc_index": 4976, + "doc_price": 100 + }, + "sort": [ + "6yaM4JABZkI1FQv8AwoM" + ] + } + ] + } +} +``` diff --git a/_vector-search/ai-search/index.md b/_vector-search/ai-search/index.md new file mode 100644 index 0000000000..56e9887325 --- /dev/null +++ b/_vector-search/ai-search/index.md @@ -0,0 +1,66 @@ +--- +layout: default +title: AI search +nav_order: 45 +has_children: true +has_toc: false +redirect_from: + - /neural-search-plugin/index/ + - /search-plugins/neural-search/ + - /vector-search/ai-search/ +model_cards: + - heading: "Use a pretrained model provided by OpenSearch" + link: "/ml-commons-plugin/pretrained-models/" + - heading: "Upload your own model to OpenSearch" + link: "/ml-commons-plugin/custom-local-models/" + - heading: "Connect to a model hosted on an external platform" + link: "/ml-commons-plugin/remote-models/index/" +tutorial_cards: + - heading: "Getting started with semantic and hybrid search" + description: "Learn how to implement semantic and hybrid search" + link: "/vector-search/tutorials/neural-search-tutorial/" +search_method_cards: + - heading: "Semantic search" + description: "Uses dense retrieval based on text embedding models to search text data." + link: "/vector-search/ai-search/semantic-search/" + - heading: "Hybrid search" + description: "Combines keyword and semantic search to improve search relevance." + link: "/vector-search/ai-search/hybrid-search/" + - heading: "Multimodal search" + description: "Uses multimodal embedding models to search text and image data." + link: "/vector-search/ai-search/multimodal-search/" + - heading: "Neural sparse search" + description: "Uses sparse retrieval based on sparse embedding models to search text data." + link: "/vector-search/ai-search/neural-sparse-search/" + - heading: "Conversational search with RAG" + description: "Uses retrieval-augmented generation (RAG) and conversational memory to provide context-aware responses." 
+ link: "/vector-search/ai-search/conversational-search/" +--- + +# AI search + +AI search streamlines your workflow by generating embeddings automatically. OpenSearch converts text to vectors during indexing and querying. It creates and indexes vector embeddings for documents and then processes query text into embeddings to find and return the most relevant results. + +## Prerequisite + +Before using AI search, you must set up an ML model for embedding generation. When selecting a model, you have the following options: + +- Use a pretrained model provided by OpenSearch. For more information, see [OpenSearch-provided pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/). + +- Upload your own model to OpenSearch. For more information, see [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/). + +- Connect to a foundation model hosted on an external platform. For more information, see [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/). + +--- + +## Tutorial + +{% include cards.html cards=page.tutorial_cards %} + +--- + +## AI search methods + +Once you set up an ML model, choose one of the following search methods. + +{% include cards.html cards=page.search_method_cards %} diff --git a/_search-plugins/multimodal-search.md b/_vector-search/ai-search/multimodal-search.md similarity index 67% rename from _search-plugins/multimodal-search.md rename to _vector-search/ai-search/multimodal-search.md index 6c7ddeed5b..a68250be80 100644 --- a/_search-plugins/multimodal-search.md +++ b/_vector-search/ai-search/multimodal-search.md @@ -1,17 +1,19 @@ --- layout: default title: Multimodal search +parent: AI search nav_order: 40 has_children: false redirect_from: - /search-plugins/neural-multimodal-search/ + - /search-plugins/multimodal-search/ --- # Multimodal search Introduced 2.11 {: .label .label-purple } -Use multimodal search to search text and image data. In neural search, text search is facilitated by multimodal embedding models. +Use multimodal search to search text and image data using multimodal embedding models. **PREREQUISITE**
Before using text search, you must set up a multimodal embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). @@ -19,12 +21,12 @@ Before using text search, you must set up a multimodal embedding model. For more ## Using multimodal search -To use neural search with text and image embeddings, follow these steps: +To use multimodal search with text and image embeddings, follow these steps: 1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline). 1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). 1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). -1. [Search the index using neural search](#step-4-search-the-index-using-neural-search). +1. [Search the index](#step-4-search-the-index). ## Step 1: Create an ingest pipeline @@ -54,9 +56,9 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline ## Step 2: Create an index for ingestion -In order to use the text embedding processor defined in your pipeline, create a k-NN index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `vector_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `image_description` field should be mapped as `text`, and the `image_binary` should be mapped as `binary`. +In order to use the text embedding processor defined in your pipeline, create a vector index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `vector_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `image_description` field should be mapped as `text`, and the `image_binary` should be mapped as `binary`. -The following example request creates a k-NN index that is set up with a default ingest pipeline: +The following example request creates a vector index that is set up with a default ingest pipeline: ```json PUT /my-nlp-index @@ -89,7 +91,7 @@ PUT /my-nlp-index ``` {% include copy-curl.html %} -For more information about creating a k-NN index and its supported methods, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). +For more information about creating a vector index and its supported methods, see [Creating a vector index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). ## Step 3: Ingest documents into the index @@ -106,9 +108,9 @@ PUT /nlp-index/_doc/1 Before the document is ingested into the index, the ingest pipeline runs the `text_image_embedding` processor on the document, generating vector embeddings for the `image_description` and `image_binary` fields. In addition to the original `image_description` and `image_binary` fields, the indexed document includes the `vector_embedding` field, which contains the combined vector embeddings. -## Step 4: Search the index using neural search +## Step 4: Search the index -To perform vector search on your index, use the `neural` query clause either in the [k-NN plugin API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-for-a-model) or [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. 
You can refine the results by using a [k-NN search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). You can search by text, image, or both text and image. +To perform vector search on your index, use the `neural` query clause either in the [Search for a Model API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-for-a-model) or [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. You can refine the results by using a [vector search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). You can search by text, image, or both text and image. The following example request uses a neural query to search for text and image: @@ -130,4 +132,8 @@ GET /my-nlp-index/_search ``` {% include copy-curl.html %} -To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index or a field. To learn more, see [Setting a default model on an index or field]({{site.url}}{{site.baseurl}}/search-plugins/neural-text-search/##setting-a-default-model-on-an-index-or-field). +To eliminate passing the model ID with each neural query request, you can set a default model on a vector index or a field. To learn more, see [Setting a default model on an index or field]({{site.url}}{{site.baseurl}}/search-plugins/neural-text-search/##setting-a-default-model-on-an-index-or-field). + +## Next steps + +- Explore our [tutorials]({{site.url}}{{site.baseurl}}/vector-search/tutorials/) to learn how to build AI search applications. \ No newline at end of file diff --git a/_search-plugins/neural-sparse-search.md b/_vector-search/ai-search/neural-sparse-search.md similarity index 73% rename from _search-plugins/neural-sparse-search.md rename to _vector-search/ai-search/neural-sparse-search.md index 0beee26ef0..91c55ee858 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_vector-search/ai-search/neural-sparse-search.md @@ -1,11 +1,13 @@ --- layout: default title: Neural sparse search +parent: AI search nav_order: 50 has_children: true redirect_from: - /search-plugins/neural-sparse-search/ - /search-plugins/sparse-search/ + - /search-plugins/neural-sparse-search/ --- # Neural sparse search @@ -18,10 +20,10 @@ To further boost search relevance, you can combine neural sparse search with den You can configure neural sparse search in the following ways: -- Generate vector embeddings within OpenSearch: Configure an ingest pipeline to generate and store sparse vector embeddings from document text at ingestion time. At query time, input plain text, which will be automatically converted into vector embeddings for search. For complete setup steps, see [Configuring ingest pipelines for neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/). -- Ingest raw sparse vectors and search using sparse vectors directly. For complete setup steps, see [Ingesting and searching raw vectors]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-raw-vectors/). +- Generate vector embeddings automatically: Configure an ingest pipeline to generate and store sparse vector embeddings from document text at ingestion time. At query time, input plain text, which will be automatically converted into vector embeddings for search. For complete setup steps, see [Generating sparse vector embeddings automatically]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/). +- Ingest raw sparse vectors and search using sparse vectors directly. 
For complete setup steps, see [Neural sparse search using raw vectors]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-raw-vectors/). -To learn more about splitting long text into passages for neural search, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). +To learn more about splitting long text into passages for neural sparse search, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). ## Accelerating neural sparse search @@ -56,7 +58,12 @@ PUT /my-nlp-index/_settings For information about `two_phase_search_pipeline`, see [Neural sparse query two-phase processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/). +## Text chunking + +For information about splitting large documents into smaller passages before generating embeddings, see [Text chunking]({{site.url}}{{site.baseurl}}/vector-search/ingesting-data/text-chunking/). + ## Further reading - Learn more about how sparse encoding models work and explore OpenSearch neural sparse search benchmarks in [Improving document retrieval with sparse semantic encoders](https://opensearch.org/blog/improving-document-retrieval-with-sparse-semantic-encoders/). - Learn the fundamentals of neural sparse search and its efficiency in [A deep dive into faster semantic sparse retrieval in OpenSearch 2.12](https://opensearch.org/blog/A-deep-dive-into-faster-semantic-sparse-retrieval-in-OS-2.12/). +- Explore our [tutorials]({{site.url}}{{site.baseurl}}/vector-search/tutorials/) to learn how to build AI search applications. diff --git a/_search-plugins/neural-sparse-with-pipelines.md b/_vector-search/ai-search/neural-sparse-with-pipelines.md similarity index 96% rename from _search-plugins/neural-sparse-with-pipelines.md rename to _vector-search/ai-search/neural-sparse-with-pipelines.md index 2e8f01a446..0a6f4db6db 100644 --- a/_search-plugins/neural-sparse-with-pipelines.md +++ b/_vector-search/ai-search/neural-sparse-with-pipelines.md @@ -1,14 +1,17 @@ --- layout: default -title: Configuring ingest pipelines +title: Generating sparse vector embeddings automatically parent: Neural sparse search +grand_parent: AI search nav_order: 10 has_children: false +redirect_from: + - /search-plugins/neural-sparse-with-pipelines/ --- -# Configuring ingest pipelines for neural sparse search +# Generating sparse vector embeddings automatically -Generating sparse vector embeddings within OpenSearch enables neural sparse search to function like lexical search. To take advantage of this encapsulation, set up an ingest pipeline to create and store sparse vector embeddings from document text during ingestion. At query time, input plain text, which will be automatically converted into vector embeddings for search. +Generating sparse vector embeddings automatically enables neural sparse search to function like lexical search. To take advantage of this encapsulation, set up an ingest pipeline to create and store sparse vector embeddings from document text during ingestion. At query time, input plain text, which will be automatically converted into vector embeddings for search. For this tutorial, you'll use neural sparse search with OpenSearch's built-in machine learning (ML) model hosting and ingest pipelines. Because the transformation of text to embeddings is performed within OpenSearch, you'll use text when ingesting and searching documents. 
@@ -38,7 +41,7 @@ This tutorial consists of the following steps: ### Prerequisites -Before you start, complete the [prerequisites]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/#prerequisites) for neural search. +Before you start, complete the [prerequisites]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/#prerequisites). ## Step 1: Configure a sparse encoding model/tokenizer @@ -510,4 +513,8 @@ For OpenSearch versions earlier than 2.15, a throttling exception will be return } ``` -To mitigate throttling exceptions, decrease the maximum number of connections specified in the `max_connection` setting in the connector's [`client_config`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/#configuration-parameters) object. Doing so will prevent the maximum number of concurrent connections from exceeding the threshold of the remote service. You can also modify the retry settings to avoid a request spike during ingestion. \ No newline at end of file +To mitigate throttling exceptions, decrease the maximum number of connections specified in the `max_connection` setting in the connector's [`client_config`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/#configuration-parameters) object. Doing so will prevent the maximum number of concurrent connections from exceeding the threshold of the remote service. You can also modify the retry settings to avoid a request spike during ingestion. + +## Next steps + +- Explore our [tutorials]({{site.url}}{{site.baseurl}}/vector-search/tutorials/) to learn how to build AI search applications. \ No newline at end of file diff --git a/_search-plugins/neural-sparse-with-raw-vectors.md b/_vector-search/ai-search/neural-sparse-with-raw-vectors.md similarity index 87% rename from _search-plugins/neural-sparse-with-raw-vectors.md rename to _vector-search/ai-search/neural-sparse-with-raw-vectors.md index d69a789a1d..3cb0f87268 100644 --- a/_search-plugins/neural-sparse-with-raw-vectors.md +++ b/_vector-search/ai-search/neural-sparse-with-raw-vectors.md @@ -1,12 +1,15 @@ --- layout: default -title: Using raw vectors +title: Neural sparse search using raw vectors parent: Neural sparse search +grand_parent: AI search nav_order: 20 has_children: false +redirect_from: + - /search-plugins/neural-sparse-with-raw-vectors/ --- -# Using raw vectors for neural sparse search +# Neural sparse search using raw vectors If you're using self-hosted sparse embedding models, you can ingest raw sparse vectors and use neural sparse search. @@ -97,3 +100,7 @@ GET my-nlp-index/_search ## Accelerating neural sparse search To learn more about improving retrieval time for neural sparse search, see [Accelerating neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#accelerating-neural-sparse-search). + +## Next steps + +- Explore our [tutorials]({{site.url}}{{site.baseurl}}/vector-search/tutorials/) to learn how to build AI search applications. 
diff --git a/_search-plugins/semantic-search.md b/_vector-search/ai-search/semantic-search.md similarity index 83% rename from _search-plugins/semantic-search.md rename to _vector-search/ai-search/semantic-search.md index 259685fe3d..4abf858e3c 100644 --- a/_search-plugins/semantic-search.md +++ b/_vector-search/ai-search/semantic-search.md @@ -1,15 +1,17 @@ --- layout: default title: Semantic search +parent: AI search nav_order: 35 has_children: false redirect_from: - /search-plugins/neural-text-search/ + - /search-plugins/semantic-search/ --- # Semantic search -Semantic search considers the context and intent of a query. In OpenSearch, semantic search is facilitated by neural search with text embedding models. Semantic search creates a dense vector (a list of floats) and ingests data into a k-NN index. +Semantic search considers the context and intent of a query. In OpenSearch, semantic search is facilitated by text embedding models. Semantic search creates a dense vector (a list of floats) and ingests data into a vector index. **PREREQUISITE**
Before using semantic search, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). @@ -22,7 +24,7 @@ To use semantic search, follow these steps: 1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline). 1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). 1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). -1. [Search the index using neural search](#step-4-search-the-index-using-neural-search). +1. [Search the index](#step-4-search-the-index). ## Step 1: Create an ingest pipeline @@ -52,9 +54,9 @@ To split long text into passages, use the `text_chunking` ingest processor befor ## Step 2: Create an index for ingestion -In order to use the text embedding processor defined in your pipeline, create a k-NN index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `passage_text` field should be mapped as `text`. +In order to use the text embedding processor defined in your pipeline, create a vector index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `passage_text` field should be mapped as `text`. -The following example request creates a k-NN index that is set up with a default ingest pipeline: +The following example request creates a vector index that is set up with a default ingest pipeline: ```json PUT /my-nlp-index @@ -87,7 +89,7 @@ PUT /my-nlp-index ``` {% include copy-curl.html %} -For more information about creating a k-NN index and its supported methods, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). +For more information about creating a vector index and its supported methods, see [Creating a vector index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). ## Step 3: Ingest documents into the index @@ -113,9 +115,9 @@ PUT /my-nlp-index/_doc/2 Before the document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on the document, generating text embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings. -## Step 4: Search the index using neural search +## Step 4: Search the index -To perform vector search on your index, use the `neural` query clause either in the [k-NN plugin API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-for-a-model) or [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. You can refine the results by using a [k-NN search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). +To perform vector search on your index, use the `neural` query clause either in the [Search for a Model API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-for-a-model) or [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. 
You can refine the results by using a [vector search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). The following example request uses a Boolean query to combine a filter clause and two query clauses---a neural query and a `match` query. The `script_score` query assigns custom weights to the query clauses: @@ -203,7 +205,7 @@ The response contains the matching document: ## Setting a default model on an index or field -A [`neural`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/) query requires a model ID for generating vector embeddings. To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index or a field. +A [`neural`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/) query requires a model ID for generating vector embeddings. To eliminate passing the model ID with each neural query request, you can set a default model on a vector index or a field. First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model for an index, provide the model ID in the `default_model_id` parameter. To set a default model for a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence: @@ -297,4 +299,8 @@ The response contains both documents: ] } } -``` \ No newline at end of file +``` + +## Next steps + +- Explore our [semantic search tutorials]({{site.url}}{{site.baseurl}}/vector-search/tutorials/semantic-search/) to learn how to build AI search applications. \ No newline at end of file diff --git a/_search-plugins/knn/api.md b/_vector-search/api.md similarity index 83% rename from _search-plugins/knn/api.md rename to _vector-search/api.md index 83d025c86d..3b87f35bcc 100644 --- a/_search-plugins/knn/api.md +++ b/_vector-search/api.md @@ -1,24 +1,30 @@ --- layout: default -title: k-NN plugin API -nav_order: 30 -parent: k-NN search +title: API +nav_order: 80 has_children: false +redirect_from: + - /search-plugins/knn/api/ --- -# k-NN plugin API +# k-NN API -The k-NN plugin adds several APIs for managing, monitoring, and optimizing your k-NN workload. +OpenSearch provides several k-nearest neighbors (k-NN) APIs for managing, monitoring, and optimizing your vector workload. ## Stats -The k-NN `stats` API provides information about the current status of the k-NN plugin. The plugin keeps track of both cluster-level and node-level statistics. Cluster-level statistics have a single value for the entire cluster. Node-level statistics have a single value for each node in the cluster. You can filter the query by `nodeId` and `statName`, as shown in the following example: +The k-NN `stats` API provides information about the current status of the k-NN plugin, which implements vector search functionality. This includes both cluster-level and node-level statistics. Cluster-level statistics have a single value for the entire cluster. Node-level statistics have a single value for each node in the cluster. 
You can filter the query by `nodeId` and `statName`, as shown in the following example: -``` +```json GET /_plugins/_knn/nodeId1,nodeId2/stats/statName1,statName2 ``` +{% include copy-curl.html %} + +### Response body fields + +The following table lists the available response body fields. -Statistic | Description +Field | Description :--- | :--- `circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search. `total_load_time` | The time in nanoseconds that k-NN has taken to load native library indexes into the cache. This statistic is only relevant to approximate k-NN search. @@ -36,13 +42,13 @@ Statistic | Description `load_success_count` | The number of times k-NN successfully loaded a native library index into the cache. This statistic is only relevant to approximate k-NN search. `load_exception_count` | The number of times an exception occurred when trying to load a native library index into the cache. This statistic is only relevant to approximate k-NN search. `indices_in_cache` | For each OpenSearch index with a `knn_vector` field and approximate k-NN turned on, this statistic provides the number of native library indexes that OpenSearch index has and the total `graph_memory_usage` that the OpenSearch index is using, in kilobytes. -`script_compilations` | The number of times the k-NN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the k-NN script might be recompiled. This statistic is only relevant to k-NN score script search. -`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search. -`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search. -`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search. -`nmslib_initialized` | Boolean value indicating whether the *nmslib* JNI library has been loaded and initialized on the node. -`faiss_initialized` | Boolean value indicating whether the *faiss* JNI library has been loaded and initialized on the node. -`model_index_status` | Status of model system index. Valid values are "red", "yellow", "green". If the index does not exist, this will be null. +`script_compilations` | The number of times the k-NN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the k-NN script might be recompiled. This statistic is only relevant to k-NN scoring script search. +`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN scoring script search. +`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN scoring script search. +`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN scoring script search. +`nmslib_initialized` | A Boolean value indicating whether the `nmslib` JNI library has been loaded and initialized on the node. +`faiss_initialized` | A Boolean value indicating whether the `faiss` JNI library has been loaded and initialized on the node. +`model_index_status` | The status of the model system index. Valid values are `red`, `yellow`, and `green`. If the index does not exist, this value is `null`. 
`indexing_from_model_degraded` | Boolean value indicating if indexing from a model is degraded. This happens if there is not enough JVM memory to cache the models. `ing_requests` | The number of training requests made to the node. `training_errors` | The number of training errors that have occurred on the node. @@ -52,9 +58,11 @@ Statistic | Description Some statistics contain *graph* in the name. In these cases, *graph* is synonymous with *native library index*. The term *graph* is reflective of when the plugin only supported the HNSW algorithm, which consists of hierarchical graphs. {: .note} -#### Usage +#### Example request -The following code examples show how to retrieve statistics related to the k-NN plugin. The first example fetches comprehensive statistics for the k-NN plugin across all nodes in the cluster, while the second example retrieves specific metrics (circuit breaker status and graph memory usage) for a single node. +The following examples demonstrate how to retrieve statistics related to the k-NN plugin. + +The following example fetches comprehensive statistics for the k-NN plugin across all nodes in the cluster: ```json GET /_plugins/_knn/stats?pretty @@ -105,6 +113,9 @@ GET /_plugins/_knn/stats?pretty } } ``` +{% include copy-curl.html %} + +The following example retrieves specific metrics (circuit breaker status and graph memory usage) for a single node: ```json GET /_plugins/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,graph_memory_usage?pretty @@ -123,6 +134,7 @@ GET /_plugins/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,graph_ } } ``` +{% include copy-curl.html %} ## Warmup operation @@ -134,9 +146,9 @@ As an alternative, you can avoid this latency issue by running the k-NN plugin w After the process is finished, you can search against the indexes without initial latency penalties. The warmup API operation is idempotent, so if a segment's native library files are already loaded into memory, this operation has no effect. It only loads files not currently stored in memory. -#### Usage +#### Example request -This request performs a warmup on three indexes: +The following request performs a warmup on three indexes: ```json GET /_plugins/_knn/warmup/index1,index2,index3?pretty @@ -148,14 +160,16 @@ GET /_plugins/_knn/warmup/index1,index2,index3?pretty } } ``` +{% include copy-curl.html %} -`total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up. +The `total` value indicates the number of shards that the k-NN plugin attempted to warm up. The response also includes the number of shards that the plugin successfully warmed up and failed to warm up. The call does not return results until the warmup operation finishes or the request times out. If the request times out, then the operation continues on the cluster. To monitor the warmup operation, use the OpenSearch `_tasks` API: ```json GET /_tasks ``` +{% include copy-curl.html %} After the operation has finished, use the [k-NN `_stats` API operation](#stats) to see what the k-NN plugin loaded into the graph. @@ -180,7 +194,7 @@ The k-NN clear cache API evicts all native library files for all shards (primari This API operation only works with indexes created using the `faiss` and `nmslib` (deprecated) engines. It has no effect on indexes created using the `lucene` engine. 
{: .note} -#### Usage +#### Example request The following request evicts the native library indexes of three indexes from the cache: @@ -194,6 +208,7 @@ POST /_plugins/_knn/clear_cache/index1,index2,index3?pretty } } ``` +{% include copy-curl.html %} The `total` parameter indicates the number of shards that the API attempted to clear from the cache. The response includes both the number of cleared shards and the number of shards that the plugin failed to clear. @@ -209,22 +224,31 @@ POST /_plugins/_knn/clear_cache/index*?pretty } } ``` +{% include copy-curl.html %} The API call does not return results until the operation finishes or the request times out. If the request times out, then the operation continues on the cluster. To monitor the request, use the `_tasks` API, as shown in the following example: ```json GET /_tasks ``` +{% include copy-curl.html %} When the operation finishes, use the [k-NN `_stats` API operation](#stats) to see which indexes have been evicted from the cache. ## Get a model -The GET model operation retrieves information about models present in the cluster. Some native library index configurations require a training step before indexing and querying can begin. The output of training is a model that can be used to initialize native library index files during indexing. The model is serialized in the k-NN model system index. See the following GET example: +The GET model operation retrieves information about models present in the cluster. Some native library index configurations require a training step before indexing and querying can begin. The output of training is a model that can be used to initialize native library index files during indexing. The model is serialized in the k-NN model system index. -``` +#### Example request + +```json GET /_plugins/_knn/models/{model_id} ``` +{% include copy-curl.html %} + +### Response body fields + +The following table lists the available response body fields. Response field | Description :--- | :--- @@ -234,13 +258,15 @@ Response field | Description `timestamp` | The date and time when the model was created. `description` | A user-provided description of the model. `error` | An error message explaining why the model is in a failed state. -`space_type` | The space type for which this model is trained, for example, Euclidean or cosine. Note - this value can be set in the top-level of the request as well +`space_type` | The space type for which the model is trained, for example, Euclidean or cosine. Note: This value can be set at the top level of the request. `dimension` | The dimensionality of the vector space for which this model is designed. `engine` | The native library used to create the model, either `faiss` or `nmslib` (deprecated). -### Usage +#### Example request -The following examples show how to retrieve information about a specific model using the k-NN plugin API. The first example returns all the available information about the model, while the second example shows how to selectively retrieve fields. +The following examples demonstrate how to retrieve information about a specific model using the k-NN plugin API. 
+ +The following example returns all the available information about the model: ```json GET /_plugins/_knn/models/test-model?pretty @@ -256,6 +282,9 @@ GET /_plugins/_knn/models/test-model?pretty "engine" : "faiss" } ``` +{% include copy-curl.html %} + +The following example demonstrates how to selectively retrieve fields: ```json GET /_plugins/_knn/models/test-model?pretty&filter_path=model_id,state @@ -264,12 +293,13 @@ GET /_plugins/_knn/models/test-model?pretty&filter_path=model_id,state "state" : "created" } ``` +{% include copy-curl.html %} ## Search for a model You can use an OpenSearch query to search for a model in the index. See the following usage example. -#### Usage +#### Example request The following example shows how to search for k-NN models in an OpenSearch cluster and how to retrieve the metadata for those models, excluding the potentially large `model_blob` field: @@ -280,7 +310,12 @@ GET/POST /_plugins/_knn/models/_search?pretty&_source_excludes=model_blob ... } } +``` +{% include copy-curl.html %} +The response contains the model information: + +```json { "took" : 0, "timed_out" : false, @@ -321,7 +356,7 @@ GET/POST /_plugins/_knn/models/_search?pretty&_source_excludes=model_blob You can delete a model in the cluster by using the DELETE operation. See the following usage example. -#### Usage +#### Example request The following example shows how to delete a k-NN model: @@ -332,17 +367,26 @@ DELETE /_plugins/_knn/models/{model_id} "acknowledged": true } ``` +{% include copy-curl.html %} ## Train a model You can create and train a model that can be used for initializing k-NN native library indexes during indexing. This API pulls training data from a `knn_vector` field in a training index, creates and trains a model, and then serializes it to the model system index. Training data must match the dimension passed in the request body. This request is returned when training begins. To monitor the model's state, use the [Get model API](#get-a-model). +### Query parameters + +The following table lists the available query parameters. + Query parameter | Description :--- | :--- `model_id` | The unique identifier of the fetched model. If not specified, then a random ID is generated. Optional. `node_id` | Specifies the preferred node on which to execute the training process. If provided, the specified node is used for training if it has the necessary capabilities and resources available. Optional. -Request parameter | Description +### Request body fields + +The following table lists the available request body fields. + +Request field | Description :--- | :--- `training_index` | The index from which the training data is retrieved. `training_field` | The `knn_vector` field in the `training_index` from which the training data is retrieved. The dimension of this field must match the `dimension` passed in this request. @@ -350,10 +394,10 @@ Request parameter | Description `max_training_vector_count` | The maximum number of vectors from the training index to be used for training. Defaults to all the vectors in the index. Optional. `search_size` | The training data is pulled from the training index using scroll queries. This parameter defines the number of results to return per scroll query. Default is `10000`. Optional. `description` | A user-provided description of the model. Optional. -`method` | The configuration of the approximate k-NN method used for search operations. 
For more information about the available methods, see [k-NN index method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions). The method requires training to be valid. +`method` | The configuration of the approximate k-NN method used for search operations. For more information about the available methods, see [Methods and engines]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/). The method requires training in order to be valid. `space_type` | The space type for which this model is trained, for example, Euclidean or cosine. Note: This value can also be set in the `method` parameter. -#### Usage +#### Example request The following examples show how to initiate the training process for a k-NN model: @@ -381,11 +425,9 @@ POST /_plugins/_knn/models/{model_id}/_train?preference={node_id} } } } - -{ - "model_id": "model_x" -} ``` +{% include copy-curl.html %} + ```json POST /_plugins/_knn/models/_train?preference={node_id} @@ -411,7 +453,12 @@ POST /_plugins/_knn/models/_train?preference={node_id} } } } +``` +{% include copy-curl.html %} + +#### Example response +```json { "model_id": "dcdwscddscsad" } diff --git a/_vector-search/creating-vector-index.md b/_vector-search/creating-vector-index.md new file mode 100644 index 0000000000..38af370bce --- /dev/null +++ b/_vector-search/creating-vector-index.md @@ -0,0 +1,160 @@ +--- +layout: default +title: Creating a vector index +nav_order: 20 +redirect_from: + - /vector-search/creating-a-vector-db/ + - /search-plugins/knn/knn-index/ + - /vector-search/creating-vector-index/ +--- + +# Creating a vector index + +Creating a vector index in OpenSearch involves a common core process with some variations depending on the type of vector search. This guide outlines the key elements shared across all vector indexes and the differences specific to supported use cases. + +Before you start, review the options for generating embeddings to help you decide on the option suitable for your use case. For more information, see [Preparing vectors]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-options/). +{: .tip} + +## Overview + +To create a vector index, set the `index.knn` parameter to `true`in the `settings`: + +```json +PUT /test-index +{ + "settings": { + "index.knn": true + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 3, + "space_type": "l2", + "mode": "on_disk", + "method": { + "name": "hnsw" + } + } + } + } +} +``` +{% include copy-curl.html %} + + +Creating a vector index involves the following key steps: + +1. **Enable k-nearest neighbors (k-NN) search**: + Set `index.knn` to `true` in the index settings to enable k-NN search functionality. + +1. **Define a vector field**: + Specify the field that will store the vector data. When defining a `knn_vector` field in OpenSearch, you can select from different data types to balance storage requirements and performance. By default, k-NN vectors are float vectors, but you can also choose byte or binary vectors for more efficient storage. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/). + +1. **Specify the dimension**: + Set the `dimension` property to match the size of the vectors used. + +1. (Optional) **Choose a space type**: + Select a distance metric for similarity comparisons, such as `l2` (Euclidean distance) or `cosinesimil`. 
For more information, see [Spaces]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-spaces/). + +1. (Optional) **Select a workload mode and/or compression level**: + Select a workload mode and/or compression level in order to optimize vector storage. For more information, see [Optimizing vector storage]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/). + +1. (Optional, advanced) **Select a method**: + Configure the indexing method, such as HNSW or IVF, used to optimize vector search performance. For more information, see [Methods and engines]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/). + +## Implementation options + +Based on your vector generation approach, choose one of the following implementation options: + +- [Store raw vectors or embeddings generated outside of OpenSearch](#storing-raw-vectors-or-embeddings-generated-outside-of-opensearch): Ingest pregenerated embeddings or raw vectors into your index for raw vector search. +- [Convert data to embeddings during ingestion](#converting-data-to-embeddings-during-ingestion): Ingest text that will be converted into vector embeddings in OpenSearch in order to perform semantic search using machine learning (ML) models. + +The following table summarizes key index configuration differences for the supported use cases. + +| Feature | Vector field type | Ingest pipeline | Transformation | Use case | +|--------------------------|-----------------------|---------------------|-------------------------|-------------------------| +| **Store raw vectors or embeddings generated outside of OpenSearch** | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) | Not required | Direct ingestion | Raw vector search | +| **Convert data to embeddings during ingestion** | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) | Required | Auto-generated vectors | AI search

Automating embedding generation reduces data preprocessing and provides a more managed vector search experience. | + +## Storing raw vectors or embeddings generated outside of OpenSearch + +To ingest raw vectors into an index, configure a vector field (in this request, `my_vector`) and specify its `dimension`: + +```json +PUT /my-raw-vector-index +{ + "settings": { + "index.knn": true + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 3 + } + } + } +} +``` +{% include copy-curl.html %} + +## Converting data to embeddings during ingestion + +To automatically generate embeddings during ingestion, configure an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) with the model ID of the embedding model. For more information about configuring a model, see [Integrating ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/). + +Specify the `field_map` to define the source field for input text and the target field for storing embeddings. In this example, text from the `text` field is converted into embeddings and stored in `passage_embedding`: + +```json +PUT /_ingest/pipeline/auto-embed-pipeline +{ + "description": "AI search ingest pipeline that automatically converts text to embeddings", + "processors": [ + { + "text_embedding": { + "model_id": "mBGzipQB2gmRjlv_dOoB", + "field_map": { + "input_text": "output_embedding" + } + } + } + ] +} +``` +{% include copy-curl.html %} + +For more information, see [Text embedding processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/text-embedding/). + +When creating an index, specify the pipeline as the `default_pipeline`. Ensure that `dimension` matches the dimensionality of the model configured in the pipeline: + +```json +PUT /my-ai-search-index +{ + "settings": { + "index.knn": true, + "default_pipeline": "auto-embed-pipeline" + }, + "mappings": { + "properties": { + "input_text": { + "type": "text" + }, + "output_embedding": { + "type": "knn_vector", + "dimension": 768 + } + } + } +} +``` +{% include copy-curl.html %} + +## Working with sparse vectors + +OpenSearch also supports sparse vectors. For more information, see [Neural sparse search]({{site.url}}{{site.baseurl}}/vector-search/ai-search/neural-sparse-search/). 
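Sparse embeddings are typically stored in a `rank_features` field rather than a `knn_vector` field. The following minimal sketch shows one possible mapping for such an index; the index and field names are illustrative:

```json
PUT /my-sparse-index
{
  "mappings": {
    "properties": {
      "passage_text": {
        "type": "text"
      },
      "passage_sparse": {
        "type": "rank_features"
      }
    }
  }
}
```
{% include copy-curl.html %}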
+ +## Next steps + +- [Ingesting data into a vector index]({{site.url}}{{site.baseurl}}/vector-search/searching-data/) +- [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) +- [Methods and engines]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/) \ No newline at end of file diff --git a/_search-plugins/knn/filter-search-knn.md b/_vector-search/filter-search-knn/efficient-knn-filtering.md similarity index 59% rename from _search-plugins/knn/filter-search-knn.md rename to _vector-search/filter-search-knn/efficient-knn-filtering.md index b413960ee7..cd2f9e6e2d 100644 --- a/_search-plugins/knn/filter-search-knn.md +++ b/_vector-search/filter-search-knn/efficient-knn-filtering.md @@ -1,61 +1,17 @@ --- layout: default -title: k-NN search with filters -nav_order: 20 -parent: k-NN search -has_children: false -has_math: true +title: Efficient k-NN filtering +parent: Filtering data +nav_order: 10 --- -# k-NN search with filters - -To refine k-NN results, you can filter a k-NN search using one of the following methods: - -- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned (if there are at least `k` results in total). This approach is supported by the following engines: - - Lucene engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later) - - Faiss engine with an HNSW algorithm (k-NN plugin versions 2.9 and later) or IVF algorithm (k-NN plugin versions 2.10 and later) - -- [Post-filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter. You can use the following two filtering strategies for this approach: - - [Boolean post-filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently, and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query. - - [The `post_filter` parameter](#post-filter-parameter): This approach runs an [ANN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search on the full dataset and then applies the filter to the k-NN results. - -- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It may have high latency and does not scale when filtered subsets are large. - -The following table summarizes the preceding filtering use cases. - -Filter | When the filter is applied | Type of search | Supported engines and methods | Where to place the `filter` clause -:--- | :--- | :--- | :--- -Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`)
- `faiss` (`hnsw`, `ivf`) | Inside the k-NN query clause. -Boolean filter | After search (post-filtering) | Approximate | - `lucene`
- `faiss`
- `nmslib` (deprecated) | Outside the k-NN query clause. Must be a leaf clause. -The `post_filter` parameter | After search (post-filtering) | Approximate | - `lucene`
- `nmslib` (deprecated)
- `faiss` | Outside the k-NN query clause. -Scoring script filter | Before search (pre-filtering) | Exact | N/A | Inside the script score query clause. - -## Filtered search optimization - -Depending on your dataset and use case, you might be more interested in maximizing recall or minimizing latency. The following table provides guidance on various k-NN search configurations and the filtering methods used to optimize for higher recall or lower latency. The first three columns of the table provide several example k-NN search configurations. A search configuration consists of: - -- The number of documents in an index, where one OpenSearch document corresponds to one k-NN vector. -- The percentage of documents left in the results after filtering. This value depends on the restrictiveness of the filter that you provide in the query. The most restrictive filter in the table returns 2.5% of documents in the index, while the least restrictive filter returns 80% of documents. -- The desired number of returned results (k). - -Once you've estimated the number of documents in your index, the restrictiveness of your filter, and the desired number of nearest neighbors, use the following table to choose a filtering method that optimizes for recall or latency. - -| Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for higher recall | Filtering method to use for lower latency | -| :-- | :-- | :-- | :-- | :-- | -| 10M | 2.5 | 100 | Efficient k-NN filtering/Scoring script | Scoring script | -| 10M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering | -| 10M | 80 | 100 | Efficient k-NN filtering | Efficient k-NN filtering | -| 1M | 2.5 | 100 | Efficient k-NN filtering/Scoring script | Scoring script | -| 1M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering | -| 1M | 80 | 100 | Efficient k-NN filtering | Efficient k-NN filtering | - -## Efficient k-NN filtering +# Efficient k-NN filtering You can perform efficient k-NN filtering with the `lucene` or `faiss` engines. -### Lucene k-NN filter implementation +## Lucene k-NN filter implementation -k-NN plugin version 2.2 introduced support for running k-NN searches with the Lucene engine using HNSW graphs. Starting with version 2.4, which is based on Lucene version 9.4, you can use Lucene filters for k-NN searches. +OpenSearch version 2.2 introduced support for running k-NN searches with the Lucene engine using HNSW graphs. Starting with version 2.4, which is based on Lucene version 9.4, you can use Lucene filters for k-NN searches. When you specify a Lucene filter for a k-NN search, the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. The algorithm uses the following variables: @@ -69,7 +25,7 @@ The following flow chart outlines the Lucene algorithm. For more information about the Lucene filtering implementation and the underlying `KnnVectorQuery`, see the [Apache Lucene documentation](https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/KnnVectorQuery.html). -### Using a Lucene k-NN filter +## Using a Lucene k-NN filter Consider a dataset that includes 12 documents containing hotel information. The following image shows all hotels on an xy coordinate plane by location. Additionally, the points for hotels that have a rating between 8 and 10, inclusive, are depicted with orange dots, and hotels that provide parking are depicted with green circles. 
The search point is colored in red: @@ -77,7 +33,7 @@ Consider a dataset that includes 12 documents containing hotel information. The In this example, you will create an index and search for the three hotels with high ratings and parking that are the closest to the search location. -**Step 1: Create a new index** +### Step 1: Create a new index Before you can run a k-NN search with a filter, you need to create an index with a `knn_vector` field. For this field, you need to specify `lucene` as the engine and `hnsw` as the `method` in the mapping. @@ -115,7 +71,7 @@ PUT /hotels-index ``` {% include copy-curl.html %} -**Step 2: Add data to your index** +### Step 2: Add data to your index Next, add data to your index. @@ -150,7 +106,7 @@ POST /_bulk ``` {% include copy-curl.html %} -**Step 3: Search your data with a filter** +### Step 3: Search your data with a filter Now you can create a k-NN search with filters. In the k-NN query clause, include the point of interest that is used to search for nearest neighbors, the number of nearest neighbors to return (`k`), and a filter with the restriction criteria. Depending on how restrictive you want your filter to be, you can add multiple query clauses to a single request. @@ -259,9 +215,9 @@ The response returns the three hotels that are nearest to the search point and h For more ways to construct a filter, see [Constructing a filter](#constructing-a-filter). -### Faiss k-NN filter implementation +## Faiss k-NN filter implementation -For k-NN searches, you can use `faiss` filters with an HNSW algorithm (k-NN plugin versions 2.9 and later) or IVF algorithm (k-NN plugin versions 2.10 and later). +For k-NN searches, you can use `faiss` filters with an HNSW algorithm (OpenSearch version 2.9 and later) or IVF algorithm (OpenSearch version 2.10 and later). When you specify a Faiss filter for a k-NN search, the Faiss algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. The algorithm uses the following variables: @@ -276,13 +232,13 @@ The following flow chart outlines the Faiss algorithm. ![Faiss algorithm for filtering]({{site.url}}{{site.baseurl}}/images/faiss-algorithm.jpg) -### Using a Faiss efficient filter +## Using a Faiss efficient filter Consider an index that contains information about different shirts for an e-commerce application. You want to find the top-rated shirts that are similar to the one you already have but would like to restrict the results by shirt size. In this example, you will create an index and search for shirts that are similar to the shirt you provide. -**Step 1: Create a new index** +### Step 1: Create a new index Before you can run a k-NN search with a filter, you need to create an index with a `knn_vector` field. For this field, you need to specify `faiss` and `hnsw` as the `method` in the mapping. @@ -313,7 +269,7 @@ PUT /products-shirts ``` {% include copy-curl.html %} -**Step 2: Add data to your index** +### Step 2: Add data to your index Next, add data to your index. @@ -349,7 +305,7 @@ POST /_bulk?refresh ``` {% include copy-curl.html %} -**Step 3: Search your data with a filter** +### Step 3: Search your data with a filter Now you can create a k-NN search with filters. In the k-NN query clause, include the vector representation of the shirt that is used to search for similar ones, the number of nearest neighbors to return (`k`), and a filter by size and rating. 
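A minimal sketch of such a query follows. The field names (`item_vector`, `size`, and `rating`) and the query vector are illustrative and assume a three-dimensional vector field in the `products-shirts` index:

```json
POST /products-shirts/_search
{
  "size": 2,
  "query": {
    "knn": {
      "item_vector": {
        "vector": [2.0, 4.0, 3.0],
        "k": 10,
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "rating": {
                    "gte": 8,
                    "lte": 10
                  }
                }
              },
              {
                "term": {
                  "size": "small"
                }
              }
            ]
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Because the `filter` clause is placed inside the `knn` query, the Faiss engine applies it during the search rather than after it, so the returned results are drawn only from shirts that satisfy the size and rating criteria.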
@@ -446,7 +402,7 @@ The response returns the two matching documents: For more ways to construct a filter, see [Constructing a filter](#constructing-a-filter). -### Constructing a filter +## Constructing a filter There are multiple ways to construct a filter for the same condition. For example, you can use the following constructs to create a filter that returns hotels that provide parking: @@ -511,195 +467,3 @@ POST /hotels-index/_search } ``` {% include copy-curl.html %} - -## Post-filtering - -You can achieve post-filtering with a Boolean filter or by providing the `post_filter` parameter. - -### Boolean filter with ANN search - -A Boolean filter consists of a Boolean query that contains a k-NN query and a filter. For example, the following query searches for hotels that are closest to the specified `location` and then filters the results to return hotels with a rating between 8 and 10, inclusive, that provide parking: - -```json -POST /hotels-index/_search -{ - "size": 3, - "query": { - "bool": { - "filter": { - "bool": { - "must": [ - { - "range": { - "rating": { - "gte": 8, - "lte": 10 - } - } - }, - { - "term": { - "parking": "true" - } - } - ] - } - }, - "must": [ - { - "knn": { - "location": { - "vector": [ - 5, - 4 - ], - "k": 20 - } - } - } - ] - } - } -} -``` - -The response includes documents containing the matching hotels: - -```json -{ - "took" : 95, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 5, - "relation" : "eq" - }, - "max_score" : 0.72992706, - "hits" : [ - { - "_index" : "hotels-index", - "_id" : "3", - "_score" : 0.72992706, - "_source" : { - "location" : [ - 4.9, - 3.4 - ], - "parking" : "true", - "rating" : 9 - } - }, - { - "_index" : "hotels-index", - "_id" : "6", - "_score" : 0.3012048, - "_source" : { - "location" : [ - 6.4, - 3.4 - ], - "parking" : "true", - "rating" : 9 - } - }, - { - "_index" : "hotels-index", - "_id" : "5", - "_score" : 0.24154587, - "_source" : { - "location" : [ - 3.3, - 4.5 - ], - "parking" : "true", - "rating" : 8 - } - } - ] - } -} -``` - -### post-filter parameter - -If you use the `knn` query alongside filters or other clauses (for example, `bool`, `must`, `match`), you might receive fewer than `k` results. In this example, `post_filter` reduces the number of results from 2 to 1: - -```json -GET my-knn-index-1/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector2": { - "vector": [2, 3, 5, 6], - "k": 2 - } - } - }, - "post_filter": { - "range": { - "price": { - "gte": 5, - "lte": 10 - } - } - } -} -``` - -## Scoring script filter - -A scoring script filter first filters the documents and then uses a brute-force exact k-NN search on the results. 
For example, the following query searches for hotels with a rating between 8 and 10, inclusive, that provide parking and then performs a k-NN search to return the 3 hotels that are closest to the specified `location`: - -```json -POST /hotels-index/_search -{ - "size": 3, - "query": { - "script_score": { - "query": { - "bool": { - "filter": { - "bool": { - "must": [ - { - "range": { - "rating": { - "gte": 8, - "lte": 10 - } - } - }, - { - "term": { - "parking": "true" - } - } - ] - } - } - } - }, - "script": { - "source": "knn_score", - "lang": "knn", - "params": { - "field": "location", - "query_value": [ - 5.0, - 4.0 - ], - "space_type": "l2" - } - } - } - } -} -``` -{% include copy-curl.html %} diff --git a/_vector-search/filter-search-knn/index.md b/_vector-search/filter-search-knn/index.md new file mode 100644 index 0000000000..afa018c2d8 --- /dev/null +++ b/_vector-search/filter-search-knn/index.md @@ -0,0 +1,51 @@ +--- +layout: default +title: Filtering data +nav_order: 50 +has_children: true +redirect_from: + - /search-plugins/knn/filter-search-knn/ + - /vector-search/filter-search-knn/ +--- + +# Filtering data + +To refine vector search results, you can filter a vector search using one of the following methods: + +- [Efficient k-nearest neighbors (k-NN) filtering]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/efficient-knn-filtering/): This approach applies filtering _during_ the vector search, as opposed to before or after the vector search, which ensures that `k` results are returned (if there are at least `k` results in total). This approach is supported by the following engines: + - Lucene engine with a Hierarchical Navigable Small World (HNSW) algorithm (OpenSearch version 2.4 and later) + - Faiss engine with an HNSW algorithm (OpenSearch version 2.9 and later) or IVF algorithm (OpenSearch version 2.10 and later) + +- [Post-filtering]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/post-filtering/): Because it is performed after the vector search, this approach may return significantly fewer than `k` results for a restrictive filter. You can use the following two filtering strategies for this approach: + - [Boolean post-filter]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/post-filtering/#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently, and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query. + - [The `post_filter` parameter]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/post-filtering/#the-post_filter-parameter): This approach runs an [ANN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search on the full dataset and then applies the filter to the k-NN results. + +- [Scoring script filter]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/scoring-script-filter/): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It may have high latency and does not scale when filtered subsets are large. + +The following table summarizes the preceding filtering use cases. 
+ +Filter | When the filter is applied | Type of search | Supported engines and methods | Where to place the `filter` clause +:--- | :--- | :--- | :--- | :--- +Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`)
- `faiss` (`hnsw`, `ivf`) | Inside the k-NN query clause. +Boolean filter | After search (post-filtering) | Approximate | - `lucene`
- `faiss`
- `nmslib` (deprecated) | Outside the k-NN query clause. Must be a leaf clause. +The `post_filter` parameter | After search (post-filtering) | Approximate | - `lucene`
- `faiss`
- `nmslib` (deprecated) | Outside the k-NN query clause. +Scoring script filter | Before search (pre-filtering) | Exact | N/A | Inside the script score query clause. + +## Filtered search optimization + +Depending on your dataset and use case, you might be more interested in maximizing recall or minimizing latency. The following table provides guidance on various k-NN search configurations and the filtering methods used to optimize for higher recall or lower latency. The first three columns of the table provide several example k-NN search configurations. A search configuration consists of: + +- The number of documents in an index, where one OpenSearch document corresponds to one k-NN vector. +- The percentage of documents left in the results after filtering. This value depends on the restrictiveness of the filter that you provide in the query. The most restrictive filter in the table returns 2.5% of documents in the index, while the least restrictive filter returns 80% of documents. +- The desired number of returned results (k). + +Once you've estimated the number of documents in your index, the restrictiveness of your filter, and the desired number of nearest neighbors, use the following table to choose a filtering method that optimizes for recall or latency. + +| Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for higher recall | Filtering method to use for lower latency | +| :-- | :-- | :-- | :-- | :-- | +| 10M | 2.5 | 100 | Efficient k-NN filtering/Scoring script | Scoring script | +| 10M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering | +| 10M | 80 | 100 | Efficient k-NN filtering | Efficient k-NN filtering | +| 1M | 2.5 | 100 | Efficient k-NN filtering/Scoring script | Scoring script | +| 1M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering | +| 1M | 80 | 100 | Efficient k-NN filtering | Efficient k-NN filtering | diff --git a/_vector-search/filter-search-knn/post-filtering.md b/_vector-search/filter-search-knn/post-filtering.md new file mode 100644 index 0000000000..3525b0fc8c --- /dev/null +++ b/_vector-search/filter-search-knn/post-filtering.md @@ -0,0 +1,149 @@ +--- +layout: default +title: Post-filtering +parent: Filtering data +nav_order: 20 +--- + +## Post-filtering + +You can achieve post-filtering with a [Boolean filter](#boolean-filter-with-ann-search) or by providing [the `post_filter` parameter](#the-post_filter-parameter). + +### Boolean filter with ANN search + +A Boolean filter consists of a Boolean query that contains a k-NN query and a filter. 
For example, the following query searches for hotels that are closest to the specified `location` and then filters the results to return hotels with a rating between 8 and 10, inclusive, that provide parking: + +```json +POST /hotels-index/_search +{ + "size": 3, + "query": { + "bool": { + "filter": { + "bool": { + "must": [ + { + "range": { + "rating": { + "gte": 8, + "lte": 10 + } + } + }, + { + "term": { + "parking": "true" + } + } + ] + } + }, + "must": [ + { + "knn": { + "location": { + "vector": [ + 5, + 4 + ], + "k": 20 + } + } + } + ] + } + } +} +``` +{% include copy-curl.html %} + +The response includes documents containing the matching hotels: + +```json +{ + "took" : 95, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 5, + "relation" : "eq" + }, + "max_score" : 0.72992706, + "hits" : [ + { + "_index" : "hotels-index", + "_id" : "3", + "_score" : 0.72992706, + "_source" : { + "location" : [ + 4.9, + 3.4 + ], + "parking" : "true", + "rating" : 9 + } + }, + { + "_index" : "hotels-index", + "_id" : "6", + "_score" : 0.3012048, + "_source" : { + "location" : [ + 6.4, + 3.4 + ], + "parking" : "true", + "rating" : 9 + } + }, + { + "_index" : "hotels-index", + "_id" : "5", + "_score" : 0.24154587, + "_source" : { + "location" : [ + 3.3, + 4.5 + ], + "parking" : "true", + "rating" : 8 + } + } + ] + } +} +``` + +### The post_filter parameter + +If you use the `knn` query alongside filters or other clauses (for example, `bool`, `must`, `match`), you might receive fewer than `k` results. In this example, `post_filter` reduces the number of results from 2 to 1: + +```json +GET my-knn-index-1/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector2": { + "vector": [2, 3, 5, 6], + "k": 2 + } + } + }, + "post_filter": { + "range": { + "price": { + "gte": 5, + "lte": 10 + } + } + } +} +``` +{% include copy-curl.html %} \ No newline at end of file diff --git a/_vector-search/filter-search-knn/scoring-script-filter.md b/_vector-search/filter-search-knn/scoring-script-filter.md new file mode 100644 index 0000000000..aa928ec42e --- /dev/null +++ b/_vector-search/filter-search-knn/scoring-script-filter.md @@ -0,0 +1,57 @@ +--- +layout: default +title: Scoring script filter +parent: Filtering data +nav_order: 30 +--- + +# Scoring script filter + +A scoring script filter first filters the documents and then uses a brute-force exact k-NN search on the results. 
For example, the following query searches for hotels with a rating between 8 and 10, inclusive, that provide parking and then performs a k-NN search to return the 3 hotels that are closest to the specified `location`: + +```json +POST /hotels-index/_search +{ + "size": 3, + "query": { + "script_score": { + "query": { + "bool": { + "filter": { + "bool": { + "must": [ + { + "range": { + "rating": { + "gte": 8, + "lte": 10 + } + } + }, + { + "term": { + "parking": "true" + } + } + ] + } + } + } + }, + "script": { + "source": "knn_score", + "lang": "knn", + "params": { + "field": "location", + "query_value": [ + 5.0, + 4.0 + ], + "space_type": "l2" + } + } + } + } +} +``` +{% include copy-curl.html %} diff --git a/_vector-search/getting-started/auto-generated-embeddings.md b/_vector-search/getting-started/auto-generated-embeddings.md new file mode 100644 index 0000000000..84761f8992 --- /dev/null +++ b/_vector-search/getting-started/auto-generated-embeddings.md @@ -0,0 +1,323 @@ +--- +layout: default +title: Generating embeddings automatically +parent: Getting started +nav_order: 30 +--- + +# Generating embeddings automatically + +You can generate embeddings dynamically during ingestion within OpenSearch. This method provides a simplified workflow by converting data to vectors automatically. + +OpenSearch can automatically generate embeddings from your text data using two approaches: + +- [**Manual setup**](#manual-setup) (Recommended for custom configurations): Configure each component individually for full control over the implementation. +- [**Automated workflow**](#using-automated-workflows) (Recommended for quick setup): Use defaults and workflows for quick implementation with minimal configuration. + +## Prerequisites + +For this simple setup, you'll use an OpenSearch-provided machine learning (ML) model and a cluster with no dedicated ML nodes. To ensure that this basic local setup works, send the following request to update ML-related cluster settings: + +```json +PUT _cluster/settings +{ + "persistent": { + "plugins.ml_commons.only_run_on_ml_node": "false", + "plugins.ml_commons.model_access_control_enabled": "true", + "plugins.ml_commons.native_memory_threshold": "99" + } +} +``` +{% include copy-curl.html %} + +### Choose an ML model + +Generating embeddings automatically requires configuring a language model that will convert text to embeddings both at ingestion time and query time. + +When selecting a model, you have the following options: + +- Use a pretrained model provided by OpenSearch. For more information, see [OpenSearch-provided pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/). + +- Upload your own model to OpenSearch. For more information, see [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/). + +- Connect to a foundation model hosted on an external platform. For more information, see [Connecting to remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/). + +In this example, you'll use the [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert) model from Hugging Face, which is one of the [pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) available in OpenSearch. For more information, see [Integrating ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/). + +Take note of the dimensionality of the model because you'll need it when you set up a vector index. 
+{: .important} + +## Manual setup + +For more control over the configuration, you can set up each component manually using the following steps. + +### Step 1: Register and deploy the model + +To register and deploy the model, send the following request: + +```json +POST /_plugins/_ml/models/_register?deploy=true +{ + "name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b", + "version": "1.0.1", + "model_format": "TORCH_SCRIPT" +} +``` +{% include copy-curl.html %} + +Registering a model is an asynchronous task. OpenSearch returns a task ID for this task: + +```json +{ + "task_id": "aFeif4oB5Vm0Tdw8yoN7", + "status": "CREATED" +} +``` + +You can check the status of the task by using the Tasks API: + +```json +GET /_plugins/_ml/tasks/aFeif4oB5Vm0Tdw8yoN7 +``` +{% include copy-curl.html %} + +Once the task is complete, the task state will change to `COMPLETED` and the Tasks API response will contain a model ID for the registered model: + +```json +{ + "model_id": "aVeif4oB5Vm0Tdw8zYO2", + "task_type": "REGISTER_MODEL", + "function_name": "TEXT_EMBEDDING", + "state": "COMPLETED", + "worker_node": [ + "4p6FVOmJRtu3wehDD74hzQ" + ], + "create_time": 1694358489722, + "last_update_time": 1694358499139, + "is_async": true +} +``` + +You'll need the model ID in order to use this model for several of the following steps. + +### Step 2: Create an ingest pipeline + +First, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains one processor: a task that transforms document fields before documents are ingested into an index. You'll set up a `text_embedding` processor that creates vector embeddings from text. You'll need the `model_id` of the model you set up in the previous section and a `field_map`, which specifies the name of the field from which to take the text (`text`) and the name of the field in which to record embeddings (`passage_embedding`): + +```json +PUT /_ingest/pipeline/nlp-ingest-pipeline +{ + "description": "An NLP ingest pipeline", + "processors": [ + { + "text_embedding": { + "model_id": "aVeif4oB5Vm0Tdw8zYO2", + "field_map": { + "text": "passage_embedding" + } + } + } + ] +} +``` +{% include copy-curl.html %} + +### Step 3: Create a vector index + +Now you'll create a vector index by setting `index.knn` to `true`. In the index, the field named `text` contains an image description, and a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field named `passage_embedding` contains the vector embedding of the text. The vector field `dimension` must match the dimensionality of the model you configured in Step 2. Additionally, set the default ingest pipeline to the `nlp-ingest-pipeline` you created in the previous step: + + +```json +PUT /my-nlp-index +{ + "settings": { + "index.knn": true, + "default_pipeline": "nlp-ingest-pipeline" + }, + "mappings": { + "properties": { + "passage_embedding": { + "type": "knn_vector", + "dimension": 768, + "space_type": "l2" + }, + "text": { + "type": "text" + } + } + } +} +``` +{% include copy-curl.html %} + +Setting up a vector index allows you to later perform a vector search on the `passage_embedding` field. + +### Step 4: Ingest documents into the index + +In this step, you'll ingest several sample documents into the index. The sample data is taken from the [Flickr image dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset). 
Each document contains a `text` field corresponding to the image description and an `id` field corresponding to the image ID: + +```json +PUT /my-nlp-index/_doc/1 +{ + "text": "A man who is riding a wild horse in the rodeo is very near to falling off ." +} +``` +{% include copy-curl.html %} + +```json +PUT /my-nlp-index/_doc/2 +{ + "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ." +} +``` +{% include copy-curl.html %} + +```json +PUT /my-nlp-index/_doc/3 +{ + "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ." +} +``` +{% include copy-curl.html %} + +### Step 5: Search the data + +Now you'll search the index using semantic search. To automatically generate vector embeddings from query text, use a `neural` query and provide the model ID of the model you set up earlier so that vector embeddings for the query text are generated with the model used at ingestion time: + +```json +GET /my-nlp-index/_search +{ + "_source": { + "excludes": [ + "passage_embedding" + ] + }, + "query": { + "neural": { + "passage_embedding": { + "query_text": "wild west", + "model_id": "aVeif4oB5Vm0Tdw8zYO2", + "k": 3 + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the matching documents: + +```json +{ + "took": 127, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 3, + "relation": "eq" + }, + "max_score": 0.015851952, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "1", + "_score": 0.015851952, + "_source": { + "text": "A man who is riding a wild horse in the rodeo is very near to falling off ." + } + }, + { + "_index": "my-nlp-index", + "_id": "2", + "_score": 0.015177963, + "_source": { + "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ." + } + }, + { + "_index": "my-nlp-index", + "_id": "3", + "_score": 0.011347729, + "_source": { + "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ." + } + } + ] + } +} +``` + +## Using automated workflows + +You can quickly set up automatic embedding generation using [_automated workflows_]({{site.url}}{{site.baseurl}}/automating-configurations/). This approach automatically creates and provisions all necessary resources. For more information, see [Workflow templates]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-templates/). + +You can use automated workflows to create and deploy externally hosted models and create resources for various AI search types. In this example, you'll create the same search you've already created following manual steps. + +### Step 1: Register and deploy the model + +To register and deploy a model, select the built-in workflow template for the model provider. For more information, see [Supported workflow templates]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-templates/#supported-workflow-templates). Alternatively, to configure a custom model, use [Step 1 of the manual setup](#step-1-register-and-deploy-the-model). + +### Step 2: Configure a workflow + +Create and provision a semantic search workflow. You must provide the model ID for the configured model. 
Review your selected workflow template [defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/semantic-search-defaults.json) to determine whether you need to update any of the parameters. For example, if the model dimensionality is different from the default (`1024`), specify the dimensionality of your model in the `output_dimension` parameter. Change the workflow template default text field from `passage_text` to `text` in order to match the manual example: + +```json +POST /_plugins/_flow_framework/workflow?use_case=semantic_search&provision=true +{ + "create_ingest_pipeline.model_id" : "mBGzipQB2gmRjlv_dOoB", + "text_embedding.field_map.output.dimension": "768", + "text_embedding.field_map.input": "text" +} +``` +{% include copy-curl.html %} + +OpenSearch responds with a workflow ID for the created workflow: + +```json +{ + "workflow_id" : "U_nMXJUBq_4FYQzMOS4B" +} +``` + +To check the workflow status, send the following request: + +```json +GET /_plugins/_flow_framework/workflow/U_nMXJUBq_4FYQzMOS4B/_status +``` +{% include copy-curl.html %} + +Once the workflow completes, the `state` changes to `COMPLETED`. The workflow has created an ingest pipeline and an index called `my-nlp-index`: + +```json +{ + "workflow_id": "U_nMXJUBq_4FYQzMOS4B", + "state": "COMPLETED", + "resources_created": [ + { + "workflow_step_id": "create_ingest_pipeline", + "workflow_step_name": "create_ingest_pipeline", + "resource_id": "nlp-ingest-pipeline", + "resource_type": "pipeline_id" + }, + { + "workflow_step_name": "create_index", + "workflow_step_id": "create_index", + "resource_id": "my-nlp-index", + "resource_type": "index_name" + } + ] +} +``` + +You can now continue with [steps 4 and 5](#step-4-ingest-documents-into-the-index-ingest-documents-into-the-index) to ingest documents into the index and search the index. + +## Next steps + +- See [Getting started with semantic and hybrid search]({{site.url}}{{site.baseurl}}/vector-search/tutorials/neural-search-tutorial/) to learn about configuring semantic and hybrid search. +- See [AI search]({{site.url}}{{site.baseurl}}/vector-search/ai-search/) to learn about the supported types of AI search. \ No newline at end of file diff --git a/_vector-search/getting-started/concepts.md b/_vector-search/getting-started/concepts.md new file mode 100644 index 0000000000..a19134a747 --- /dev/null +++ b/_vector-search/getting-started/concepts.md @@ -0,0 +1,75 @@ +--- +layout: default +title: Concepts +parent: Getting started +nav_order: 40 +--- + +# Concepts + +This page defines key terms and techniques related to vector search in OpenSearch. + +## Vector representations + +- [**_Vector embeddings_**]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-basics/#vector-embeddings) are numerical representations of data—such as text, images, or audio—that encode meaning or features into a high-dimensional space. These embeddings enable similarity-based comparisons for search and machine learning (ML) tasks. + +- **_Dense vectors_** are high-dimensional numerical representations where most elements have nonzero values. They are typically produced by deep learning models and are used in semantic search and ML applications. + +- **_Sparse vectors_** contain mostly zero values and are often used in techniques like neural sparse search to efficiently represent and retrieve information. 
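To make the distinction concrete, the following illustrative values (not produced by any real model) show how the same passage might be represented densely and sparsely:

```json
{
  "dense_embedding": [0.12, -0.03, 0.57, 0.44, -0.21],
  "sparse_embedding": {
    "rodeo": 1.8,
    "horse": 1.2,
    "cowboy": 0.9
  }
}
```

The dense form assigns a value to every dimension, while the sparse form records weights only for the tokens that carry meaning.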
+ +## Vector search fundamentals + +- [**_Vector search_**]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-basics/), also known as _similarity search_ or _nearest neighbor search_, is a technique for finding items that are most similar to a given input vector. It is widely used in applications such as recommendation systems, image retrieval, and natural language processing. + +- A [**_space_**]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-basics/#calculating-similarity) defines how similarity or distance between two vectors is measured. Different spaces use different distance metrics, such as Euclidean distance or cosine similarity, to determine how closely vectors resemble each other. + +- A [**_method_**]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/) refers to the algorithm used to organize vector data during indexing and retrieve relevant results during search in approximate k-NN search. Different methods balance trade-offs between accuracy, speed, and memory usage. + +- An [**_engine_**]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/) is the underlying library that implements vector search methods. It determines how vectors are indexed, stored, and retrieved during similarity search operations. + +## k-NN search + +- **_k-nearest neighbors (k-NN) search_** finds the k most similar vectors to a given query vector in an index. The similarity is determined based on a specified distance metric. + +- [**_Exact k-NN search_**]({{site.url}}{{site.baseurl}}/vector-search/vector-search-techniques/knn-score-script/) performs a brute-force comparison between a query vector and all vectors in an index, computing the exact nearest neighbors. This approach provides high accuracy but can be computationally expensive for large datasets. + +- [**_Approximate k-NN search_**]({{site.url}}{{site.baseurl}}/vector-search/vector-search-techniques/approximate-knn/) reduces computational complexity by using indexing techniques that speed up search operations while maintaining high accuracy. These methods restructure the index or reduce the dimensionality of vectors to improve performance. + +## Query types + +- A [**_k-NN query_**]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) searches vector fields using a query vector. + +- A [**_neural query_**]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/) searches vector fields using text or image data. + +- A [**_neural sparse query_**]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural-sparse/) searches vector fields using raw text or sparse vector tokens. + +## Search techniques + +- [**_Semantic search_**]({{site.url}}{{site.baseurl}}/vector-search/ai-search/semantic-search/) interprets the intent and contextual meaning of a query rather than relying solely on exact keyword matches. This approach improves the relevance of search results, especially for natural language queries. + +- [**_Hybrid search_**]({{site.url}}{{site.baseurl}}/vector-search/ai-search/hybrid-search/) combines lexical (keyword-based) search with semantic (vector-based) search to improve search relevance. This approach ensures that results include both exact keyword matches and conceptually similar content. + +- [**_Multimodal search_**]({{site.url}}{{site.baseurl}}/vector-search/ai-search/multimodal-search/) enables you to search across multiple types of data, such as text and images. 
It allows queries in one format (for example, text) to retrieve results in another (for example, images). + +- [**_Radial search_**]({{site.url}}{{site.baseurl}}/vector-search/specialized-operations/radial-search-knn/) retrieves all vectors within a specified distance or similarity threshold from a query vector. It is useful for tasks that require finding all relevant matches within a given range rather than retrieving a fixed number of nearest neighbors. + +- [**_Neural sparse search_**]({{site.url}}{{site.baseurl}}/vector-search/ai-search/neural-sparse-search/) uses an inverted index, similar to BM25, to efficiently retrieve relevant documents based on sparse vector representations. This approach maintains the efficiency of traditional lexical search while incorporating semantic understanding. + +- [**_Conversational search_**]({{site.url}}{{site.baseurl}}/vector-search/ai-search/conversational-search/) allows you to interact with a search system using natural language queries and refine results through follow-up questions. This approach enhances the user experience by making search more intuitive and interactive. + +- [**_Retrieval-augmented generation (RAG)_**]({{site.url}}{{site.baseurl}}/vector-search/ai-search/conversational-search/#rag) enhances large language models (LLMs) by retrieving relevant information from an index and incorporating it into the model's response. This approach improves the accuracy and relevance of generated text. + +## Indexing and storage techniques + +- [**_Text chunking_**]({{site.url}}{{site.baseurl}}/vector-search/ingesting-data/text-chunking/) involves splitting long documents or text passages into smaller segments to improve search retrieval and relevance. Chunking helps vector search models process large amounts of text more effectively. + +- [**_Vector quantization_**]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/) is a technique for reducing the storage size of vector embeddings by approximating them using a smaller set of representative vectors. This process enables efficient storage and retrieval in large-scale vector search applications. + +- **_Scalar quantization (SQ)_** reduces vector precision by mapping floating-point values to a limited set of discrete values, decreasing memory requirements while preserving search accuracy. + +- **_Product quantization (PQ)_** divides high-dimensional vectors into smaller subspaces and quantizes each subspace separately, enabling efficient approximate nearest neighbor search with reduced memory usage. + +- **_Binary quantization_** compresses vector representations by converting numerical values to binary formats. This technique reduces storage requirements and accelerates similarity computations. + +- [**_Disk-based vector search_**]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/) stores vector embeddings on disk rather than in memory, using binary quantization to reduce memory consumption while maintaining search efficiency. + diff --git a/_vector-search/getting-started/index.md b/_vector-search/getting-started/index.md new file mode 100644 index 0000000000..6d6b05051d --- /dev/null +++ b/_vector-search/getting-started/index.md @@ -0,0 +1,200 @@ +--- +layout: default +title: Getting started +nav_order: 10 +has_children: true +has_toc: false +redirect_from: + - /vector-search/getting-started/ +--- + +# Getting started with vector search + +This guide shows you how to use your own vectors in OpenSearch. 
You'll learn to create a vector index, add location data, and run a vector search to find the nearest hotels on a coordinate plane. While this example uses two-dimensional vectors for simplicity, the same approach applies to higher-dimensional vectors used in semantic search and recommendation systems. + + +## Prerequisite: Install OpenSearch + + +
+ + If you don't have OpenSearch installed, follow these steps to create a cluster. + + +Before you start, ensure that [Docker](https://docs.docker.com/get-docker/) is installed and running in your environment.
+This demo configuration is insecure and should not be used in production environments. +{: .note} + +Download and run OpenSearch: + +```bash +docker pull opensearchproject/opensearch:latest && docker run -it -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:latest +``` +{% include copy.html %} + +OpenSearch is now running on port 9200. Because the security plugin is disabled in this demo configuration, the cluster accepts plain HTTP requests. To verify that OpenSearch is running, send the following request: + +```bash +curl http://localhost:9200 +``` +{% include copy.html %} + +You should get a response that looks like this: + +```json +{ + "name" : "a937e018cee5", + "cluster_name" : "docker-cluster", + "cluster_uuid" : "GLAjAG6bTeWErFUy_d-CLw", + "version" : { + "distribution" : "opensearch", + "number" : , + "build_type" : , + "build_hash" : , + "build_date" : , + "build_snapshot" : false, + "lucene_version" : , + "minimum_wire_compatibility_version" : "7.10.0", + "minimum_index_compatibility_version" : "7.0.0" + }, + "tagline" : "The OpenSearch Project: https://opensearch.org/" +} +``` + +For more information, see [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/) and [Install and upgrade OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/). + +
+ +## Step 1: Create a vector index + +First, create an index that will store sample hotel data. To signal to OpenSearch that this is a vector index, set `index.knn` to `true`. You'll store the vectors in a vector field named `location`. The vectors you'll ingest will be two-dimensional, and the distance between vectors will be calculated using the [Euclidean `l2` similarity metric]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-basics/#calculating-similarity): + +```json +PUT /hotels-index +{ + "settings": { + "index.knn": true + }, + "mappings": { + "properties": { + "location": { + "type": "knn_vector", + "dimension": 2, + "space_type": "l2" + } + } + } +} +``` +{% include copy-curl.html %} + +## Step 2: Add data to your index + +Next, add data to your index. Each document represents a hotel. The `location` field in each document contains a two-dimensional vector specifying the hotel's location: + +```json +POST /_bulk +{ "index": { "_index": "hotels-index", "_id": "1" } } +{ "location": [5.2, 4.4] } +{ "index": { "_index": "hotels-index", "_id": "2" } } +{ "location": [5.2, 3.9] } +{ "index": { "_index": "hotels-index", "_id": "3" } } +{ "location": [4.9, 3.4] } +{ "index": { "_index": "hotels-index", "_id": "4" } } +{ "location": [4.2, 4.6] } +{ "index": { "_index": "hotels-index", "_id": "5" } } +{ "location": [3.3, 4.5] } +``` +{% include copy-curl.html %} + +## Step 3: Search your data + +Now search for hotels closest to the pin location `[5, 4]`. To search for the top three closest hotels, set `k` to `3`: + +```json +POST /hotels-index/_search +{ + "size": 3, + "query": { + "knn": { + "location": { + "vector": [5, 4], + "k": 3 + } + } + } +} +``` +{% include copy-curl.html %} + +The following image shows the hotels on the coordinate plane. The query point is labeled `Pin`, and each hotel is labeled with its document number. + +![Hotels on a coordinate plane]({{site.url}}{{site.baseurl}}/images/k-nn-search-hotels.png){:style="width: 400px;" class="img-centered"} + +The response contains the hotels closest to the specified pin location: + +```json +{ + "took": 1093, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 3, + "relation": "eq" + }, + "max_score": 0.952381, + "hits": [ + { + "_index": "hotels-index", + "_id": "2", + "_score": 0.952381, + "_source": { + "location": [ + 5.2, + 3.9 + ] + } + }, + { + "_index": "hotels-index", + "_id": "1", + "_score": 0.8333333, + "_source": { + "location": [ + 5.2, + 4.4 + ] + } + }, + { + "_index": "hotels-index", + "_id": "3", + "_score": 0.72992706, + "_source": { + "location": [ + 4.9, + 3.4 + ] + } + } + ] + } +} +``` + +## Generating vector embeddings automatically + +If your data isn't already in vector format, you can generate vector embeddings directly within OpenSearch. This allows you to transform text or images into their numerical representations for similarity search. For more information, see [Generating vector embeddings automatically]({{site.url}}{{site.baseurl}}/vector-search/getting-started/auto-generated-embeddings/). 
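As a minimal sketch of how this works, the following request creates an ingest pipeline that uses the `text_embedding` processor to convert a text field into an embedding at ingestion time. The pipeline name, field names, and model ID are illustrative placeholders; you must first register and deploy a text embedding model and supply its model ID:

```json
PUT /_ingest/pipeline/my-embedding-pipeline
{
  "description": "Generates vector embeddings from the text field",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your-model-id>",
        "field_map": {
          "text": "passage_embedding"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}

You can then set this pipeline as the `default_pipeline` for a vector index so that documents ingested into that index are embedded automatically.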
+ +## Next steps + +- [Vector search basics]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-basics/) +- [Preparing vectors]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-options/) +- [Vector search with filters]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/) +- [Generating vector embeddings automatically]({{site.url}}{{site.baseurl}}/vector-search/getting-started/auto-generated-embeddings/) \ No newline at end of file diff --git a/_vector-search/getting-started/vector-search-basics.md b/_vector-search/getting-started/vector-search-basics.md new file mode 100644 index 0000000000..cf3b6f2c45 --- /dev/null +++ b/_vector-search/getting-started/vector-search-basics.md @@ -0,0 +1,44 @@ +--- +layout: default +title: Vector search basics +parent: Getting started +nav_order: 10 +--- + +# Vector search basics + +_Vector search_, also known as _similarity search_ or _nearest neighbor search_, is a powerful technique for finding items that are most similar to a given input. Use cases include semantic search to understand user intent, recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection. For more background information about vector search, see [Nearest neighbor search](https://en.wikipedia.org/wiki/Nearest_neighbor_search). + +## Vector embeddings + +Unlike traditional search methods that rely on exact keyword matches, vector search uses _vector embeddings_---numerical representations of data such as text, images, or audio. These embeddings are stored as multi-dimensional vectors, capturing deeper patterns and similarities in meaning, context, or structure. For example, a large language model (LLM) can create vector embeddings from input text, as shown in the following image. + +![Generating embeddings from text]({{site.url}}{{site.baseurl}}/images/vector-search/embeddings.png) + +## Similarity search + +A vector embedding is a vector in a high-dimensional space. Its position and orientation capture meaningful relationships between objects. Vector search finds the most similar results by comparing a query vector to stored vectors and returning the closest matches. OpenSearch uses the [k-nearest neighbors (k-NN) algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) to efficiently identify the most similar vectors. Unlike keyword search, which relies on exact word matches, vector search measures similarity based on distance in this high-dimensional space. + +In the following image, the vectors for `Wild West` and `Broncos` are closer to each other, while both are far from `Basketball`, reflecting their semantic differences. + +![Similarity search]({{site.url}}{{site.baseurl}}/images/vector-search/vector-similarity.jpg){: width="400px"} + +To learn more about the types of vector search that OpenSearch supports, see [Vector search techniques]({{site.url}}{{site.baseurl}}/vector-search/vector-search-techniques/). + +## Calculating similarity + +Vector similarity measures how close two vectors are in a multi-dimensional space, facilitating tasks like nearest neighbor search and ranking results by relevance. OpenSearch supports multiple distance metrics (_spaces_) for calculating vector similarity: + +- **L1 (Manhattan distance):** Sums the absolute differences between vector components. +- **L2 (Euclidean distance):** Calculates the square root of the sum of squared differences, making it sensitive to magnitude. 
+- **L∞ (Chebyshev distance):** Considers only the maximum absolute difference between corresponding vector elements. +- **Cosine similarity:** Measures the angle between vectors, focusing on direction rather than magnitude. +- **Inner product:** Determines similarity based on vector dot products, which can be useful for ranking. +- **Hamming distance:** Counts differing elements in binary vectors. +- **Hamming bit:** Applies the same principle as Hamming distance but is optimized for binary-encoded data. + +To learn more about the distance metrics, see [Spaces]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-spaces/). + +## Next steps + +- [Preparing vectors]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-options/) \ No newline at end of file diff --git a/_vector-search/getting-started/vector-search-options.md b/_vector-search/getting-started/vector-search-options.md new file mode 100644 index 0000000000..f630ca96a7 --- /dev/null +++ b/_vector-search/getting-started/vector-search-options.md @@ -0,0 +1,94 @@ +--- +layout: default +title: Preparing vectors +parent: Getting started +nav_order: 20 +quickstart_cards: + - heading: "Getting started with vector search" + description: "Use raw vectors or embeddings generated outside of OpenSearch" + link: "/vector-search/getting-started/" +tutorial_cards: + - heading: "Generating embeddings automatically" + description: "Automatically convert data to embeddings within OpenSearch" + link: "/vector-search/getting-started/auto-generated-embeddings/" + - heading: "Getting started with semantic and hybrid search" + description: "Learn how to implement semantic and hybrid search" + link: "/vector-search/tutorials/neural-search-tutorial/" +pre_items: + - heading: "Generate embeddings" + description: "Generate embeddings outside of OpenSearch using your favorite embedding utility." + - heading: "Create an OpenSearch index" + description: "Create an OpenSearch index to store your embeddings." + link: "/vector-search/creating-vector-index/#storing-raw-vectors-or-embeddings-generated-outside-of-opensearch" + - heading: "Ingest embeddings" + description: "Ingest your embeddings into the index." + link: "/vector-search/ingesting-data/#raw-vector-ingestion" + - heading: "Search embeddings" + description: "Search your embeddings using vector search." + link: "/vector-search/searching-data/#searching-raw-vectors" +auto_items: + - heading: "Configure an embedding model" + description: "Configure a machine learning model that will automatically generate embeddings from your text at ingestion time and query time." + link: "/ml-commons-plugin/integrating-ml-models/" + - heading: "Create an OpenSearch index" + description: "Create an OpenSearch index to store your text." + link: "/vector-search/creating-vector-index/#converting-data-to-embeddings-during-ingestion" + - heading: "Ingest text" + description: "Ingest your text into the index." + link: "/vector-search/ingesting-data/#converting-data-to-embeddings-during-ingestion" + - heading: "Search text" + description: "Search your text using vector search. Query text is automatically converted to vector embeddings and compared to document embeddings." + link: "/vector-search/searching-data/#searching-auto-generated-embeddings" +--- + +# Preparing vectors + +In OpenSearch, you can either bring your own vectors or let OpenSearch generate them automatically from your data. 
Letting OpenSearch automatically generate your embeddings reduces data preprocessing effort at ingestion and search time. + +### Option 1: Bring your own raw vectors or generated embeddings + +Use this option if you already have pre-computed embeddings or raw vectors from external tools or services. + - **Ingestion**: Ingest pregenerated embeddings directly into OpenSearch. + + ![Pre-generated embeddings ingestion]({{site.url}}{{site.baseurl}}/images/vector-search/raw-vector-ingest.png) + - **Search**: Perform vector search to find the vectors that are closest to a query vector, as shown in the example that follows. + + ![Pre-generated embeddings search]({{site.url}}{{site.baseurl}}/images/vector-search/raw-vector-search.png) + +
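For example, assuming an index with a `knn_vector` field named `my_vector` that already contains pregenerated embeddings (the index name, field name, and query vector are illustrative), a search for the three closest vectors looks like this:

```json
GET /my-raw-vector-index/_search
{
  "size": 3,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, 0.3],
        "k": 3
      }
    }
  }
}
```
{% include copy-curl.html %}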
+ + Steps + + {: .fs-5 .fw-700} + +Working with embeddings generated outside of OpenSearch involves the following steps: + +{% include list.html list_items=page.pre_items%} + +
+ +{% include cards.html cards=page.quickstart_cards %} + +### Option 2: Generate embeddings within OpenSearch + +Use this option to let OpenSearch automatically generate vector embeddings from your data using a machine learning (ML) model. + - **Ingestion**: You ingest plain data, and OpenSearch uses an ML model to generate embeddings dynamically. + + ![Auto-generated embeddings ingestion]({{site.url}}{{site.baseurl}}/images/vector-search/auto-vector-ingest.png) + - **Search**: At query time, OpenSearch uses the same ML model to convert your input data to embeddings, and these embeddings are used for vector search. + + ![Auto-generated embeddings search]({{site.url}}{{site.baseurl}}/images/vector-search/auto-vector-search.png) + +
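For example, assuming an index whose embeddings are generated by a configured ML model (the index name, vector field name, and model ID are illustrative), a `neural` query converts the query text to an embedding automatically at search time:

```json
GET /my-ai-search-index/_search
{
  "size": 5,
  "query": {
    "neural": {
      "output_embedding": {
        "query_text": "wild west",
        "model_id": "<your-model-id>",
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}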
+ + Steps + + {: .fs-5 .fw-700} + +Working with text that is automatically converted to embeddings within OpenSearch involves the following steps: + +{% include list.html list_items=page.auto_items%} + +
+ +{% include cards.html cards=page.tutorial_cards %} \ No newline at end of file diff --git a/_vector-search/index.md b/_vector-search/index.md new file mode 100644 index 0000000000..49ab572177 --- /dev/null +++ b/_vector-search/index.md @@ -0,0 +1,69 @@ +--- +layout: default +title: Vector search +nav_order: 1 +has_children: false +has_toc: false +nav_exclude: true +permalink: /vector-search/ +redirect_from: + - /vector-search/index/ + - /search-plugins/vector-search/ +tutorial_cards: + - heading: "Get started with vector search" + description: "Build powerful similarity search applications using your existing vectors or embeddings" + link: "/vector-search/getting-started/" + - heading: "Generate embeddings automatically" + description: "Streamline your vector search using OpenSearch's built-in embedding generation" + link: "/vector-search/getting-started/auto-generated-embeddings/" +more_cards: + - heading: "AI search" + description: "Discover AI search, from semantic, hybrid, and multimodal search to RAG" + link: "/vector-search/ai-search/" + - heading: "Tutorials" + description: "Follow step-by-step tutorials to build AI-powered search for your applications" + link: "/vector-search/tutorials/" + - heading: "Advanced filtering" + description: "Refine search results while maintaining semantic relevance" + link: "/vector-search/filter-search-knn/" + - heading: "Memory-efficient search" + description: "Reduce memory footprint using vector compression methods" + link: "/vector-search/optimizing-storage/" + - heading: "Sparse vector support" + description: "Combine semantic understanding with traditional search efficiency using neural sparse search" + link: "/vector-search/filter-search-knn/" + - heading: "Multi-vector support" + description: "Store and search multiple vectors per document using nested fields" + link: "/vector-search/specialized-operations/nested-search-knn/" +items: + - heading: "Create an index" + description: "Create a vector index for storing your embeddings." + link: "/vector-search/creating-vector-index/" + - heading: "Ingest data" + description: "Ingest your data into the index." + link: "/vector-search/ingesting-data/" + - heading: "Search data" + description: "Use raw vector search or AI-powered methods like semantic, hybrid, multimodal, or neural sparse search. Add RAG to build conversational search." + link: "/vector-search/searching-data/" +--- + +# Vector search + +OpenSearch [vector search]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-basics/) provides a complete vector database solution for building efficient AI applications. Store and search vector embeddings alongside your existing data, making it easy to implement semantic search, retrieval-augmented generation (RAG), recommendation systems, and other AI-powered applications. + +{% include cards.html cards=page.tutorial_cards %} + +## Overview + +You can bring your own vectors or let OpenSearch generate embeddings automatically from your data. See [Preparing vectors]({{site.url}}{{site.baseurl}}/vector-search/getting-started/vector-search-options/). 
+{: .info } + +{% include list.html list_items=page.items%} + + +[Get started]({{site.url}}{{site.baseurl}}/vector-search/getting-started/){: .btn-dark-blue} + + +## Build your solution + +{% include cards.html cards=page.more_cards %} \ No newline at end of file diff --git a/_vector-search/ingesting-data/index.md b/_vector-search/ingesting-data/index.md new file mode 100644 index 0000000000..2a978de626 --- /dev/null +++ b/_vector-search/ingesting-data/index.md @@ -0,0 +1,86 @@ +--- +layout: default +title: Ingesting data +nav_order: 30 +has_children: true +has_toc: false +redirect_from: + - /vector-search/ingesting-data/ +--- + +# Ingesting data into a vector index + +After creating a vector index, you need to either ingest raw vector data or convert data to embeddings while ingesting it. + +## Comparison of ingestion methods + +The following table compares the two ingestion methods. + +| Feature | Data format | Ingest pipeline | Vector generation | Additional fields | +|-------------------------------|----------------------------|---------------------|---------------------------------|-----------------------------------| +| **Raw vector ingestion** | Pre-generated vectors | Not required | External | Optional metadata | +| **Converting data to embeddings during ingestion** | Text or image data | Required | Internal (during ingestion) | Original data + embeddings | + +## Raw vector ingestion + +When working with raw vectors or embeddings generated outside of OpenSearch, you directly ingest vector data into the `knn_vector` field. No pipeline is required because the vectors are already generated: + +```json +PUT /my-raw-vector-index/_doc/1 +{ + "my_vector": [0.1, 0.2, 0.3], + "metadata": "Optional additional information" +} +``` +{% include copy-curl.html %} + +You can also use the [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) to ingest multiple vectors efficiently: + +```json +PUT /_bulk +{"index": {"_index": "my-raw-vector-index", "_id": 1}} +{"my_vector": [0.1, 0.2, 0.3], "metadata": "First item"} +{"index": {"_index": "my-raw-vector-index", "_id": 2}} +{"my_vector": [0.2, 0.3, 0.4], "metadata": "Second item"} +``` +{% include copy-curl.html %} + +## Converting data to embeddings during ingestion + +After you have [configured an ingest pipeline]({{site.url}}{{site.baseurl}}/vector-search/creating-vector-index/#converting-data-to-embeddings-during-ingestion) that automatically generates embeddings, you can ingest text data directly into your index: + +```json +PUT /my-ai-search-index/_doc/1 +{ + "input_text": "Example: AI search description" +} +``` +{% include copy-curl.html %} + +The pipeline automatically generates and stores the embeddings in the `output_embedding` field. + +You can also use the [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) to ingest multiple documents efficiently: + +```json +PUT /_bulk +{"index": {"_index": "my-ai-search-index", "_id": 1}} +{"input_text": "Example AI search description"} +{"index": {"_index": "my-ai-search-index", "_id": 2}} +{"input_text": "Bulk API operation description"} +``` +{% include copy-curl.html %} + +## Working with sparse vectors + +OpenSearch also supports sparse vectors. For more information, see [Neural sparse search]({{site.url}}{{site.baseurl}}/vector-search/ai-search/neural-sparse-search/). 
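As a brief sketch of how sparse vectors are stored (the index name, field name, and token weights are illustrative; in practice, a `sparse_encoding` ingest processor typically generates the tokens for you), a sparse vector is kept in a `rank_features` field as a map of tokens to weights:

```json
PUT /my-sparse-vector-index
{
  "mappings": {
    "properties": {
      "passage_tokens": {
        "type": "rank_features"
      }
    }
  }
}
```
{% include copy-curl.html %}

A document then supplies its token weights directly:

```json
PUT /my-sparse-vector-index/_doc/1
{
  "passage_tokens": {
    "vector": 1.6,
    "search": 1.2,
    "engine": 0.4
  }
}
```
{% include copy-curl.html %}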
+ +## Text chunking + +For information about splitting large documents into smaller passages before generating embeddings during dense or sparse AI search, see [Text chunking]({{site.url}}{{site.baseurl}}/vector-search/ingesting-data/text-chunking/). + +## Next steps + +- [Searching vector data]({{site.url}}{{site.baseurl}}/vector-search/searching-data/) +- [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) +- [Ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) +- [Text embedding processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/text-embedding/) \ No newline at end of file diff --git a/_search-plugins/text-chunking.md b/_vector-search/ingesting-data/text-chunking.md similarity index 78% rename from _search-plugins/text-chunking.md rename to _vector-search/ingesting-data/text-chunking.md index b66cfeda61..c011ee26f2 100644 --- a/_search-plugins/text-chunking.md +++ b/_vector-search/ingesting-data/text-chunking.md @@ -1,13 +1,18 @@ --- layout: default title: Text chunking -nav_order: 65 +parent: Ingesting data +nav_order: 80 +redirect_from: + - /search-plugins/text-chunking/ --- # Text chunking Introduced 2.13 {: .label .label-purple } +When working with large text documents in [AI search]({{site.url}}{{site.baseurl}}/vector-search/ai-search/), it's often necessary to split them into smaller passages because most embedding models have token length limitations. This process, called _text chunking_, helps maintain the quality and relevance of vector search results by ensuring that each embedding represents a focused piece of content that fits within model constraints. + To split long text into passages, you can use a `text_chunking` processor as a preprocessing step for a `text_embedding` or `sparse_encoding` processor in order to obtain embeddings for each chunked passage. For more information about the processor parameters, see [Text chunking processor]({{site.url}}{{site.baseurl}}/ingest-pipelines/processors/text-chunking/). Before you start, follow the steps outlined in the [pretrained model documentation]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/) to register an embedding model. The following example preprocesses text by splitting it into passages and then produces embeddings using the `text_embedding` processor. ## Step 1: Create a pipeline @@ -48,7 +53,7 @@ PUT _ingest/pipeline/text-chunking-embedding-ingest-pipeline ## Step 2: Create an index for ingestion -In order to use the ingest pipeline, you need to create a k-NN index. The `passage_chunk_embedding` field must be of the `nested` type. The `knn.dimension` field must contain the number of dimensions for your model: +In order to use the ingest pipeline, you need to create a vector index. The `passage_chunk_embedding` field must be of the `nested` type. The `knn.dimension` field must contain the number of dimensions for your model: ```json PUT testindex @@ -90,7 +95,7 @@ POST testindex/_doc?pipeline=text-chunking-embedding-ingest-pipeline ``` {% include copy-curl.html %} -## Step 4: Search the index using neural search +## Step 4: Search the index You can use a `nested` query to perform vector search on your index. 
We recommend setting `score_mode` to `max`, where the document score is set to the highest score out of all passage embeddings: @@ -114,3 +119,7 @@ GET testindex/_search } ``` {% include copy-curl.html %} + +## Next steps + +- Explore our [tutorials]({{site.url}}{{site.baseurl}}/vector-search/tutorials/) to learn how to build AI search applications. diff --git a/_vector-search/optimizing-storage/binary-quantization.md b/_vector-search/optimizing-storage/binary-quantization.md new file mode 100644 index 0000000000..514003cd01 --- /dev/null +++ b/_vector-search/optimizing-storage/binary-quantization.md @@ -0,0 +1,204 @@ +--- +layout: default +title: Binary quantization +parent: Vector quantization +grand_parent: Optimizing vector storage +nav_order: 40 +has_children: false +has_math: true +--- + +# Binary quantization + +Starting with version 2.17, OpenSearch supports binary quantization (BQ) with binary vector support for the Faiss engine. BQ compresses vectors into a binary format (0s and 1s), making it highly efficient in terms of memory usage. You can choose to represent each vector dimension using 1, 2, or 4 bits, depending on the desired precision. One of the advantages of using BQ is that the training process is handled automatically during indexing. This means that no separate training step is required, unlike other quantization techniques such as PQ. + +## Using BQ + +To configure BQ for the Faiss engine, define a `knn_vector` field and specify the `mode` as `on_disk`. This configuration defaults to 1-bit BQ and both `ef_search` and `ef_construction` set to `100`: + +```json +PUT my-vector-index +{ + "settings" : { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector_field": { + "type": "knn_vector", + "dimension": 8, + "space_type": "l2", + "data_type": "float", + "mode": "on_disk" + } + } + } +} +``` +{% include copy-curl.html %} + +To further optimize the configuration, you can specify additional parameters, such as the compression level, and fine-tune the search parameters. For example, you can override the `ef_construction` value or define the compression level, which corresponds to the number of bits used for quantization: + +- **32x compression** for 1-bit quantization +- **16x compression** for 2-bit quantization +- **8x compression** for 4-bit quantization + +This allows for greater control over memory usage and recall performance, providing flexibility to balance between precision and storage efficiency. 
+ +To specify the compression level, set the `compression_level` parameter: + +```json +PUT my-vector-index +{ + "settings" : { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector_field": { + "type": "knn_vector", + "dimension": 8, + "space_type": "l2", + "data_type": "float", + "mode": "on_disk", + "compression_level": "16x", + "method": { + "name": "hnsw", + "engine": "faiss", + "parameters": { + "ef_construction": 16 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +The following example further fine-tunes the configuration by defining `ef_construction`, `encoder`, and the number of `bits` (which can be `1`, `2`, or `4`): + +```json +PUT my-vector-index +{ + "settings" : { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector_field": { + "type": "knn_vector", + "dimension": 8, + "method": { + "name": "hnsw", + "engine": "faiss", + "space_type": "l2", + "parameters": { + "m": 16, + "ef_construction": 512, + "encoder": { + "name": "binary", + "parameters": { + "bits": 1 + } + } + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Search using binary quantized vectors + +You can perform a vector search on your index by providing a vector and specifying the number of nearest neighbors (k) to return: + +```json +GET my-vector-index/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector_field": { + "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5], + "k": 10 + } + } + } +} +``` +{% include copy-curl.html %} + +You can also fine-tune search by providing the `ef_search` and `oversample_factor` parameters. +The `oversample_factor` parameter controls the factor by which the search oversamples the candidate vectors before ranking them. Using a higher oversample factor means that more candidates will be considered before ranking, improving accuracy but also increasing search time. When selecting the `oversample_factor` value, consider the trade-off between accuracy and efficiency. For example, setting the `oversample_factor` to `2.0` will double the number of candidates considered during the ranking phase, which may help achieve better results. + +The following request specifies the `ef_search` and `oversample_factor` parameters: + +```json +GET my-vector-index/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector_field": { + "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5], + "k": 10, + "method_parameters": { + "ef_search": 10 + }, + "rescore": { + "oversample_factor": 10.0 + } + } + } + } +} +``` +{% include copy-curl.html %} + + +## HNSW memory estimation + +The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph. + +As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16. The following sections provide memory requirement estimations for various compression values. + +### 1-bit quantization (32x compression) + +In 1-bit quantization, each dimension is represented using 1 bit, equivalent to a 32x compression factor. The memory requirement can be estimated as follows: + +```r +Memory = 1.1 * ((256 * 1 / 8) + 8 * 16) * 1,000,000 + ~= 0.176 GB +``` + +### 2-bit quantization (16x compression) + +In 2-bit quantization, each dimension is represented using 2 bits, equivalent to a 16x compression factor. 
The memory requirement can be estimated as follows: + +```r +Memory = 1.1 * ((256 * 2 / 8) + 8 * 16) * 1,000,000 + ~= 0.211 GB +``` + +### 4-bit quantization (8x compression) + +In 4-bit quantization, each dimension is represented using 4 bits, equivalent to an 8x compression factor. The memory requirement can be estimated as follows: + +```r +Memory = 1.1 * ((256 * 4 / 8) + 8 * 16) * 1,000,000 + ~= 0.282 GB +``` + +## Next steps + +- [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/) +- [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) \ No newline at end of file diff --git a/_search-plugins/knn/disk-based-vector-search.md b/_vector-search/optimizing-storage/disk-based-vector-search.md similarity index 69% rename from _search-plugins/knn/disk-based-vector-search.md rename to _vector-search/optimizing-storage/disk-based-vector-search.md index 8fe794f44c..d97de64e8a 100644 --- a/_search-plugins/knn/disk-based-vector-search.md +++ b/_vector-search/optimizing-storage/disk-based-vector-search.md @@ -1,18 +1,20 @@ --- layout: default title: Disk-based vector search -nav_order: 16 -parent: k-NN search +nav_order: 20 +parent: Optimizing vector storage has_children: false +redirect_from: + - /search-plugins/knn/disk-based-vector-search/ --- # Disk-based vector search **Introduced 2.17** {: .label .label-purple} -For low-memory environments, OpenSearch provides _disk-based vector search_, which significantly reduces the operational costs for vector workloads. Disk-based vector search uses [binary quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#binary-quantization), compressing vectors and thereby reducing the memory requirements. This memory optimization provides large memory savings at the cost of slightly increased search latency while still maintaining strong recall. +For low-memory environments, OpenSearch provides _disk-based vector search_, which significantly reduces the operational costs for vector workloads. Disk-based vector search uses [binary quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/binary-quantization/), compressing vectors and thereby reducing the memory requirements. This memory optimization provides large memory savings at the cost of slightly increased search latency while still maintaining strong recall. -To use disk-based vector search, set the [`mode`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#vector-workload-modes) parameter to `on_disk` for your vector field type. This parameter will configure your index to use secondary storage. +To use disk-based vector search, set the [`mode`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#vector-workload-modes) parameter to `on_disk` for your vector field type. This parameter will configure your index to use secondary storage. For more information about disk-based search parameters, see [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/). ## Creating an index for disk-based vector search @@ -41,7 +43,7 @@ PUT my-vector-index ``` {% include copy-curl.html %} -By default, the `on_disk` mode configures the index to use the `faiss` engine and `hnsw` method. The default [`compression_level`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#compression-levels) of `32x` reduces the amount of memory the vectors require by a factor of 32. 
To preserve the search recall, rescoring is enabled by default. A search on a disk-optimized index runs in two phases: The compressed index is searched first, and then the results are rescored using full-precision vectors loaded from disk. +By default, the `on_disk` mode configures the index to use the `faiss` engine and `hnsw` method. The default [`compression_level`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#compression-levels) of `32x` reduces the amount of memory the vectors require by a factor of 32. To preserve the search recall, rescoring is enabled by default. A search on a disk-optimized index runs in two phases: The compressed index is searched first, and then the results are rescored using full-precision vectors loaded from disk. To reduce the compression level, provide the `compression_level` parameter when creating the index mapping: @@ -69,7 +71,7 @@ PUT my-vector-index ``` {% include copy-curl.html %} -For more information about the `compression_level` parameter, see [Compression levels]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#compression-levels). Note that for `4x` compression, the `lucene` engine will be used. +For more information about the `compression_level` parameter, see [Compression levels]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#compression-levels). Note that for `4x` compression, the `lucene` engine will be used. {: .note} If you need more granular fine-tuning, you can override additional k-NN parameters in the method definition. For example, to improve recall, increase the `ef_construction` parameter value: @@ -134,7 +136,7 @@ POST _bulk ## Search -Search is also performed in the same way as in other index configurations. The key difference is that, by default, the `oversample_factor` of the rescore parameter is set to `3.0` (unless you override the `compression_level`). For more information, see [Rescoring quantized results using full precision]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#rescoring-quantized-results-using-full-precision). To perform vector search on a disk-optimized index, provide the search vector: +Search is also performed in the same way as in other index configurations. The key difference is that, by default, the `oversample_factor` of the rescore parameter is set to `3.0` (unless you override the `compression_level`). For more information, see [Rescoring quantized results to full precision]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#rescoring-quantized-results-to-full-precision). To perform vector search on a disk-optimized index, provide the search vector: ```json GET my-vector-index/_search @@ -179,7 +181,7 @@ GET my-vector-index/_search ## Model-based indexes -For [model-based indexes]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model), you can specify the `on_disk` parameter in the training request in the same way that you would specify it during index creation. By default, `on_disk` mode will use the [Faiss IVF method]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#supported-faiss-methods) and a compression level of `32x`. 
To run the training API, send the following request: +For [model-based indexes]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-vector-index-from-a-model), you can specify the `on_disk` parameter in the training request in the same way that you would specify it during index creation. By default, `on_disk` mode will use the [Faiss IVF method]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#ivf-parameters) and a compression level of `32x`. To run the training API, send the following request: ```json POST /_plugins/_knn/models/test-model/_train @@ -196,13 +198,14 @@ POST /_plugins/_knn/models/test-model/_train ``` {% include copy-curl.html %} -This command assumes that training data has been ingested into the `train-index-name` index. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). +This command assumes that training data has been ingested into the `train-index-name` index. For more information, see [Building a vector index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-vector-index-from-a-model). {: .note} -You can override the `compression_level` for disk-optimized indexes in the same way as for regular k-NN indexes. +You can override the `compression_level` for disk-optimized indexes in the same way as for regular vector indexes. ## Next steps -- For more information about binary quantization, see [Binary quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#binary-quantization). -- For more information about k-NN vector workload modes, see [Vector workload modes]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#vector-workload-modes). \ No newline at end of file +- [Binary quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/binary-quantization/). +- [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/) +- [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) \ No newline at end of file diff --git a/_vector-search/optimizing-storage/faiss-16-bit-quantization.md b/_vector-search/optimizing-storage/faiss-16-bit-quantization.md new file mode 100644 index 0000000000..de386192d5 --- /dev/null +++ b/_vector-search/optimizing-storage/faiss-16-bit-quantization.md @@ -0,0 +1,159 @@ +--- +layout: default +title: Faiss 16-bit scalar quantization +parent: Vector quantization +grand_parent: Optimizing vector storage +nav_order: 20 +has_children: false +has_math: true +--- + +# Faiss 16-bit scalar quantization + +Starting with version 2.13, OpenSearch supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a vector index. + +At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. 
When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#simd-optimization), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. + +SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies. +{: .warning} + +## Using Faiss scalar quantization + +To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a vector index: + +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 3, + "space_type": "l2", + "method": { + "name": "hnsw", + "engine": "faiss", + "parameters": { + "encoder": { + "name": "sq" + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#sq-parameters). + +The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. + +When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as a 16-bit vector `[65504.0, -65504.0]`. + +We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall. +{: .note} + +The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default): + +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 3, + "space_type": "l2", + "method": { + "name": "hnsw", + "engine": "faiss", + "parameters": { + "encoder": { + "name": "sq", + "parameters": { + "type": "fp16" + } + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +During ingestion, make sure each vector dimension is in the supported range ([-65504.0, 65504.0]). + +```json +PUT test-index/_doc/1 +{ + "my_vector1": [-65504.0, 65503.845, 55.82] +} +``` +{% include copy-curl.html %} + +During querying, the query vector has no range limitation: + +```json +GET test-index/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector1": { + "vector": [265436.876, -120906.256, 99.84], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +## Memory estimation + +In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require. 
+ +### HNSW memory estimation + +The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (2 * dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph. + +As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16. The memory requirement can be estimated as follows: + +```r +1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB +``` + +### IVF memory estimation + +The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * dimension))` bytes/vector, where `nlist` is the number of buckets to partition vectors into. + +As an example, assume that you have 1 million vectors with a dimension of 256 and an `nlist` of 128. The memory requirement can be estimated as follows: + +```r +1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB +``` + +## Next steps + +- [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/) +- [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) \ No newline at end of file diff --git a/_vector-search/optimizing-storage/faiss-product-quantization.md b/_vector-search/optimizing-storage/faiss-product-quantization.md new file mode 100644 index 0000000000..7c27a1bad4 --- /dev/null +++ b/_vector-search/optimizing-storage/faiss-product-quantization.md @@ -0,0 +1,57 @@ +--- +layout: default +title: Faiss product quantization +parent: Vector quantization +grand_parent: Optimizing vector storage +nav_order: 30 +has_children: false +has_math: true +--- + +# Faiss product quantization + +Product quantization (PQ) is a technique used to represent a vector using a configurable number of bits. In general, it can be used to achieve a higher level of compression as compared to byte or scalar quantization. PQ works by separating vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector is `m*code_size` bits, plus overhead. For details about the parameters, see [PQ parameters]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#pq-parameters). PQ is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or _IVF_ approximate nearest neighbor (ANN) algorithms. + +## Using Faiss product quantization + +To minimize loss in accuracy, PQ requires a _training_ step that builds a model based on the distribution of the data that will be searched. + +The product quantizer is trained by running k-means clustering on a set of training vectors for each subvector space and extracts the centroids to be used for encoding. The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. + +In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm is used and how much data will be stored in the index. For IVF-based indexes, a recommended number of training vectors is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a recommended number is `2^code_size*1000`. See the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more information about the methodology used to calculate these figures. 
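For example, for an IVF-based index with an `nlist` of `128` and a `code_size` of `8` (illustrative values), the recommended number of training vectors works out as follows:

```r
max(1000 * nlist, 2^code_size * 1000) = max(1000 * 128, 2^8 * 1000)
                                      = max(128,000, 256,000)
                                      = 256,000 training vectors
```

For an HNSW-based index with the same `code_size`, the recommendation is likewise `2^8 * 1000 = 256,000` training vectors.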
+ +For PQ, both _m_ and _code_size_ need to be selected. _m_ determines the number of subvectors into which vectors should be split for separate encoding. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines the number of bits used to encode each subvector. In general, we recommend a setting of `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall. + +For an example of setting up an index with PQ, see the [Building a vector index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-vector-index-from-a-model) tutorial. + +## Memory estimation + +While PQ is meant to represent individual vectors with `m*code_size` bits, in reality, the indexes consume more space. This is mainly because of the overhead of storing certain code tables and auxiliary data structures. + +Some of the memory formulas depend on the number of segments present. This is not typically known beforehand, but a recommended default value is 300. +{: .note} + +### HNSW memory estimation + +The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m) * num_vectors + num_segments * (2^pq_code_size * 4 * d))` bytes. + +As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: + +```r +1.1 * ((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB +``` + +### IVF memory estimation + +The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors + num_segments * (2^code_size * 4 * d + 4 * ivf_nlist * d))` bytes. + +For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 64, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: + +```r +1.1 * ((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB +``` + +## Next steps + +- [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/) +- [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) \ No newline at end of file diff --git a/_vector-search/optimizing-storage/index.md b/_vector-search/optimizing-storage/index.md new file mode 100644 index 0000000000..4b04024e71 --- /dev/null +++ b/_vector-search/optimizing-storage/index.md @@ -0,0 +1,22 @@ +--- +layout: default +title: Optimizing vector storage +nav_order: 60 +has_children: true +has_toc: false +redirect_from: + - /vector-search/optimizing-storage/ +storage_cards: + - heading: "Vector quantization" + description: "Reduce vector storage space by quantizing vectors" + link: "/vector-search/optimizing-storage/knn-vector-quantization/" + - heading: "Disk-based vector search" + description: "Uses binary quantization to reduce the operational costs of vector workloads" + link: "/vector-search/optimizing-storage/disk-based-vector-search/" +--- + +# Optimizing vector storage + +Vector search operations can be resource intensive, especially when dealing with large-scale vector datasets. OpenSearch provides several optimization techniques for reducing memory usage.
+ +{% include cards.html cards=page.storage_cards %} \ No newline at end of file diff --git a/_vector-search/optimizing-storage/knn-vector-quantization.md b/_vector-search/optimizing-storage/knn-vector-quantization.md new file mode 100644 index 0000000000..598d9d7eed --- /dev/null +++ b/_vector-search/optimizing-storage/knn-vector-quantization.md @@ -0,0 +1,48 @@ +--- +layout: default +title: Vector quantization +parent: Optimizing vector storage +nav_order: 10 +has_children: true +has_toc: false +redirect_from: + - /search-plugins/knn/knn-vector-quantization/ +outside_cards: + - heading: "Byte vectors" + description: "Quantize vectors into byte vectors" + link: "/field-types/supported-field-types/knn-memory-optimized/#byte-vectors" + - heading: "Binary vectors" + description: "Quantize vectors into binary vector" + link: "/field-types/supported-field-types/knn-memory-optimized/#binary-vectors" +inside_cards: + - heading: "Lucene scalar quantization" + description: "Use built-in scalar quantization for the Lucene engine" + link: "/vector-search/optimizing-storage/lucene-scalar-quantization/" + - heading: "Faiss 16-bit scalar quantization" + description: "Use built-in scalar quantization for the Faiss engine" + link: "/vector-search/optimizing-storage/faiss-16-bit-quantization/" + - heading: "Faiss product quantization" + description: "Use built-in product quantization for the Faiss engine" + link: "/vector-search/optimizing-storage/faiss-product-quantization/" + - heading: "Binary quantization" + description: "Use built-in binary quantization for the Faiss engine" + link: "/vector-search/optimizing-storage/binary-quantization/" +--- + +# Vector quantization + +By default, OpenSearch supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for the native `faiss` and `nmslib` [deprecated] engines). To reduce the memory footprint, you can use vector quantization. + +OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. + +## Quantize vectors outside of OpenSearch + +Quantize vectors outside of OpenSearch before ingesting them into an OpenSearch index. + +{% include cards.html cards=page.outside_cards %} + +## Quantize vectors within OpenSearch + +Use OpenSearch built-in quantization to quantize vectors. + +{% include cards.html cards=page.inside_cards %} \ No newline at end of file diff --git a/_vector-search/optimizing-storage/lucene-scalar-quantization.md b/_vector-search/optimizing-storage/lucene-scalar-quantization.md new file mode 100644 index 0000000000..021f1a8537 --- /dev/null +++ b/_vector-search/optimizing-storage/lucene-scalar-quantization.md @@ -0,0 +1,120 @@ +--- +layout: default +title: Lucene scalar quantization +parent: Vector quantization +grand_parent: Optimizing vector storage +nav_order: 10 +has_children: false +has_math: true +--- + +# Lucene scalar quantization + +Starting with version 2.16, OpenSearch supports built-in scalar quantization for the Lucene engine. 
Unlike [byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#byte-vectors), which require you to quantize vectors before ingesting documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors. + +Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. + +## Using Lucene scalar quantization + +To use the Lucene scalar quantizer, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a vector index: + +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2, + "space_type": "l2", + "method": { + "name": "hnsw", + "engine": "lucene", + "parameters": { + "encoder": { + "name": "sq" + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Confidence interval + +Optionally, you can specify the `confidence_interval` parameter in the `method.parameters.encoder` object. +The `confidence_interval` is used to compute the minimum and maximum quantiles in order to quantize the vectors: +- If you set the `confidence_interval` to a value in the `0.9` to `1.0` range, inclusive, then the quantiles are calculated statically. For example, setting the `confidence_interval` to `0.9` specifies to compute the minimum and maximum quantiles based on the middle 90% of the vector values, excluding the minimum 5% and maximum 5% of the values. +- Setting `confidence_interval` to `0` specifies to compute the quantiles dynamically, which involves oversampling and additional computations performed on the input data. +- When `confidence_interval` is not set, it is computed based on the vector dimension $$d$$ using the formula $$max(0.9, 1 - \frac{1}{1 + d})$$. + +Lucene scalar quantization is applied only to `float` vectors. If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), then the request is rejected. +{: .warning} + +The following example method definition specifies the Lucene `sq` encoder with the `confidence_interval` set to `1.0`. This `confidence_interval` specifies to consider all the input vectors when computing the minimum and maximum quantiles. 
Vectors are quantized to 7 bits by default: + +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2, + "space_type": "l2", + "method": { + "name": "hnsw", + "engine": "lucene", + "parameters": { + "encoder": { + "name": "sq", + "parameters": { + "confidence_interval": 1.0 + } + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +There are no changes to ingestion or query mapping and no range limitations for the input vectors. + +# Memory estimation + +In the ideal scenario, 7-bit vectors created by the Lucene scalar quantizer use only 25% of the memory required by 32-bit vectors. + +### HNSW memory estimation + +The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * m)` bytes/vector, where `m` is the maximum number of bidirectional links created for each element during the construction of the graph. + +As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: + +```r +1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.4 GB +``` + +## Next steps + +- [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/) +- [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) \ No newline at end of file diff --git a/_vector-search/performance-tuning-indexing.md b/_vector-search/performance-tuning-indexing.md new file mode 100644 index 0000000000..d518fa2536 --- /dev/null +++ b/_vector-search/performance-tuning-indexing.md @@ -0,0 +1,148 @@ +--- +layout: default +title: Indexing performance tuning +nav_order: 10 +parent: Performance tuning +--- + +# Indexing performance tuning + +Take any of the following steps to improve indexing performance, especially when you plan to index a large number of vectors at once. + +## Disable the refresh interval + +Either disable the refresh interval (default = 1 sec) or set a long duration for the refresh interval to avoid creating multiple small segments: + +```json +PUT //_settings +{ + "index" : { + "refresh_interval" : "-1" + } +} +``` +{% include copy-curl.html %} + +Make sure to reenable `refresh_interval` after indexing is complete. + +## Disable replicas (no OpenSearch replica shard) + + Set replicas to `0` to prevent duplicate construction of native library indexes in both primary and replica shards. When you enable replicas after indexing completes, the serialized native library indexes are copied directly. If you have no replicas, losing nodes might cause data loss, so it's important that the data be stored elsewhere so that this initial load can be retried in the event of an issue. + +## Increase the number of indexing threads + +If your hardware has multiple cores, you can allow multiple threads in native library index construction by speeding up the indexing process. Determine the number of threads to allot with the [knn.algo_param.index_thread_qty]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings#cluster-settings) setting. + +Monitor CPU utilization and choose the correct number of threads. Because native library index construction is costly, choosing more threads than you need can cause additional CPU load. + + +## (Expert level) Disable vector field storage in the source field + +The `_source` field contains the original JSON document body that was passed at index time. 
This field is not indexed and is not searchable but is stored so that it can be returned when executing fetch requests such as `get` and `search`. When using vector fields within the source, you can remove the vector field to save disk space, as shown in the following example where the `location` vector is excluded: + +```json +PUT //_mappings +{ + "_source": { + "excludes": ["location"] + }, + "properties": { + "location": { + "type": "knn_vector", + "dimension": 2, + "space_type": "l2" + } + } +} +``` +{% include copy-curl.html %} + +Disabling the `_source` field can cause certain features to become unavailable, such as the `update`, `update_by_query`, and `reindex` APIs and the ability to debug queries or aggregations by using the original document at index time. + +In OpenSearch 2.15 or later, you can further improve indexing speed and reduce disk space by removing the vector field from the `_recovery_source`, as shown in the following example: + +```json +PUT //_mappings +{ + "_source": { + "excludes": ["location"], + "recovery_source_excludes": ["location"] + }, + "properties": { + "location": { + "type": "knn_vector", + "dimension": 2, + "space_type": "l2" + } + } +} +``` +{% include copy-curl.html %} + +This is an expert-level setting. Disabling the `_recovery_source` may lead to failures during peer-to-peer recovery. Before disabling the `_recovery_source`, check with your OpenSearch cluster admin to determine whether your cluster performs regular flushes before starting the peer-to-peer recovery of shards prior to disabling the `_recovery_source`. +{: .warning} + +## (Expert level) Build vector data structures on demand + +This approach is recommended only for workloads that involve a single initial bulk upload and will be used exclusively for search after force merging to a single segment. + +During indexing, vector search builds a specialized data structure for a `knn_vector` field to enable efficient approximate k-nearest neighbors (k-NN) search. However, these structures are rebuilt during [force merge]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) on vector indexes. To optimize indexing speed, follow these steps: + +1. **Disable vector data structure creation**: Disable vector data structure creation for new segments by setting [`index.knn.advanced.approximate_threshold`]({{site.url}}{{site.baseurl}}/vector-search/settings/#index-settings) to `-1`. + + To specify the setting at index creation, send the following request: + + ```json + PUT /test-index/ + { + "settings": { + "index.knn.advanced.approximate_threshold": "-1" + } + } + ``` + {% include copy-curl.html %} + + To specify the setting after index creation, send the following request: + + ```json + PUT /test-index/_settings + { + "index.knn.advanced.approximate_threshold": "-1" + } + ``` + {% include copy-curl.html %} + +1. **Perform bulk indexing**: Index data in [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) without performing any searches during ingestion: + + ```json + POST _bulk + { "index": { "_index": "test-index", "_id": "1" } } + { "my_vector1": [1.5, 2.5], "price": 12.2 } + { "index": { "_index": "test-index", "_id": "2" } } + { "my_vector1": [2.5, 3.5], "price": 7.1 } + ``` + {% include copy-curl.html %} + + If searches are performed while vector data structures are disabled, they will run using exact k-NN search. + +1. 
**Reenable vector data structure creation**: Once indexing is complete, enable vector data structure creation by setting `index.knn.advanced.approximate_threshold` to `0`: + + ```json + PUT /test-index/_settings + { + "index.knn.advanced.approximate_threshold": "0" + } + ``` + {% include copy-curl.html %} + + If you do not reset the setting to `0` before the force merge, you will need to reindex your data. + {: .note} + +1. **Force merge segments into one segment**: Perform a force merge and specify `max_num_segments=1` to create the vector data structures only once: + + ```json + POST test-index/_forcemerge?max_num_segments=1 + ``` + {% include copy-curl.html %} + + After the force merge, new search requests will execute approximate k-NN search using the newly created data structures. \ No newline at end of file diff --git a/_vector-search/performance-tuning-search.md b/_vector-search/performance-tuning-search.md new file mode 100644 index 0000000000..1e83a31bd0 --- /dev/null +++ b/_vector-search/performance-tuning-search.md @@ -0,0 +1,49 @@ +--- +layout: default +title: Search performance tuning +nav_order: 20 +parent: Performance tuning +--- + +# Search performance tuning + +Take the following steps to improve search performance. + +## Reduce segment count + +To improve search performance, you must keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results. + +Having one segment per shard provides optimal performance with respect to search latency. You can configure an index to have multiple shards in order to avoid very large shards and achieve more parallelism. + +You can control the number of segments by choosing a larger refresh interval or during indexing by asking OpenSearch to slow down segment creation by disabling the refresh interval. + +## Warm up the index + +Native library indexes are constructed during indexing, but they're loaded into memory during the first search. In Lucene, each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point). The top `size` results, ranked by score, are returned from all segment-level results within a shard (a higher score indicates a better result). + +Once a native library index is loaded (native library indexes are loaded outside of the OpenSearch JVM), OpenSearch caches them in memory. Initial queries are expensive and complete in a few seconds, while subsequent queries are faster and complete in milliseconds (assuming that the k-NN circuit breaker isn't triggered). + +To avoid this latency penalty during your first queries, you can use the warmup API operation on the indexes you want to search: + +```json +GET /_plugins/_knn/warmup/index1,index2,index3?pretty +{ + "_shards" : { + "total" : 6, + "successful" : 6, + "failed" : 0 + } +} +``` +{% include copy-curl.html %} + +The warmup API operation loads all native library indexes for all shards (primaries and replicas) for the specified indexes into the cache, so there's no penalty for loading native library indexes during initial searches. + +This API operation only loads the segments of active indexes into the cache. If a merge or refresh operation finishes after the API runs, or if you add new documents, you need to rerun the API to load those native library indexes into memory. 
+{: .warning} + + +## Avoid reading stored fields + +If your use case only involves reading the IDs and scores of the nearest neighbors, you can disable the reading of stored fields, which saves time that would otherwise be spent retrieving the vectors from stored fields. + diff --git a/_vector-search/performance-tuning.md b/_vector-search/performance-tuning.md new file mode 100644 index 0000000000..f4c04edb1c --- /dev/null +++ b/_vector-search/performance-tuning.md @@ -0,0 +1,37 @@ +--- +layout: default +title: Performance tuning +nav_order: 70 +has_children: true +redirect_from: + - /search-plugins/knn/performance-tuning/ +--- + +# Performance tuning + +This topic provides performance tuning recommendations for improving indexing and search performance for approximate k-NN (ANN) search. At a high level, k-NN works according to these principles: +* Vector indexes are created per `knn_vector` field/Lucene segment pair. +* Queries execute sequentially on segments in the shard (as with any other OpenSearch query). +* The coordinator node selects the final `size` neighbors from the neighbors returned by each shard. + +The following sections provide recommendations regarding comparing ANN to exact k-NN with a scoring script. + +## Recommendations for engines and cluster node sizing + +Each of the three engines used for ANN search has attributes that make it more sensible to use than the others in a given situation. Use the following information to help determine which engine will best meet your requirements. + +To optimize for indexing throughput, Faiss is a good option. For relatively smaller datasets (up to a few million vectors), the Lucene engine demonstrates better latencies and recall. At the same time, the size of the index is smallest compared to the other engines, which allows it to use smaller AWS instances for data nodes. For further considerations, see [Choosing the right method]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#choosing-the-right-method) and [Memory estimation]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#memory-estimation). + +When considering cluster node sizing, a general approach is to first establish an even distribution of the index across the cluster. However, there are other considerations. To help make these choices, you can refer to the OpenSearch managed service guidance in the [Sizing domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html) section. + +## Improving recall + +Recall depends on multiple factors, such as the number of vectors, dimensions, segments, and so on. Searching a large number of small segments and aggregating the results leads to better recall than searching a small number of large segments and aggregating the results. Larger native library indexes are more likely to lose recall if you're using smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. It's important to understand your system's requirements for latency and accuracy and then choose the number of segments based on experimentation. + +The default parameters work for a broader set of use cases, but make sure to run your own experiments on your datasets and choose the appropriate values. For index-level settings, see [Index settings]({{site.url}}{{site.baseurl}}/vector-search/settings/#index-settings). 
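+
+For example, if recall is lower than you need with the default parameters, one dynamic setting you can raise is `index.knn.algo_param.ef_search`. The following request is a minimal sketch that assumes a hypothetical index named `my-vector-index`; larger values generally improve recall at the cost of higher search latency:
+
+```json
+PUT /my-vector-index/_settings
+{
+  "index.knn.algo_param.ef_search": 200
+}
+```
+{% include copy-curl.html %}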
+ +## ANN compared to scoring script + +The standard k-NN query and custom scoring options perform differently. Run tests with a representative set of documents to see if the search results and latencies match your expectations. + +Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing the shard count can improve latency, but be sure to keep the shard size within the [recommended guidelines]({{site.url}}{{site.baseurl}}/intro/#primary-and-replica-shards). \ No newline at end of file diff --git a/_vector-search/searching-data.md b/_vector-search/searching-data.md new file mode 100644 index 0000000000..8a821500b0 --- /dev/null +++ b/_vector-search/searching-data.md @@ -0,0 +1,75 @@ +--- +layout: default +title: Searching data +nav_order: 35 +--- + +# Searching vector data + +OpenSearch supports various methods for searching vector data, tailored to how the vectors were created and indexed. This guide explains the query syntax and options for raw vector search and auto-generated embedding search. + +## Search type comparison + +The following table compares the search syntax and typical use cases for each vector search method. + +| Feature | Query type | Input format | Model required | Use case | +|----------------------------------|------------------|------------------|---------------------|----------------------------| +| **Raw vectors** | [`knn`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) | Vector array | No | Raw vector search | +| **Auto-generated embeddings** | [`neural`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/) | Text or image data | Yes | [AI search]({{site.url}}{{site.baseurl}}/vector-search/ai-search/) | + +## Searching raw vectors + +To search raw vectors, use the `knn` query type, provide the `vector` array as input, and specify the number of returned results `k`: + +```json +GET /my-raw-vector-index/_search +{ + "query": { + "knn": { + "my_vector": { + "vector": [0.1, 0.2, 0.3], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +## Searching auto-generated embeddings + +OpenSearch supports [AI-powered search methods]({{site.url}}{{site.baseurl}}/vector-search/ai-search/), including semantic, hybrid, multimodal, and conversational search with retrieval-augmented generation (RAG). These methods automatically generate embeddings from query input. + +To run an AI-powered search, use the `neural` query type. Specify the `query_text` input, the model ID of the embedding model you [configured in the ingest pipeline]({{site.url}}{{site.baseurl}}/vector-search/creating-vector-index/#converting-data-to-embeddings-during-ingestion), and the number of returned results `k`. To exclude embeddings from being returned in search results, specify the embedding field in the `_source.excludes` parameter: + +```json +GET /my-ai-search-index/_search +{ + "_source": { + "excludes": [ + "output_embedding" + ] + }, + "query": { + "neural": { + "output_embedding": { + "query_text": "What is AI search?", + "model_id": "mBGzipQB2gmRjlv_dOoB", + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +## Working with sparse vectors + +OpenSearch also supports sparse vectors. For more information, see [Neural sparse search]({{site.url}}{{site.baseurl}}/vector-search/ai-search/neural-sparse-search/). 
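+
+As a brief illustration, sparse retrieval uses the `neural_sparse` query type. The following sketch assumes a hypothetical index (`my-sparse-index`) with a sparse vector field named `passage_embedding` and a placeholder ID for a deployed sparse encoding model:
+
+```json
+GET /my-sparse-index/_search
+{
+  "query": {
+    "neural_sparse": {
+      "passage_embedding": {
+        "query_text": "What is AI search?",
+        "model_id": "<sparse encoding model ID>"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}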
+ +## Next steps + +- [Getting started with semantic and hybrid search]({{site.url}}{{site.baseurl}}/vector-search/tutorials/neural-search-tutorial/) +- [Filtering data]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/) +- [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/) +- [Neural query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural/) diff --git a/_vector-search/settings.md b/_vector-search/settings.md new file mode 100644 index 0000000000..9636e9a591 --- /dev/null +++ b/_vector-search/settings.md @@ -0,0 +1,50 @@ +--- +layout: default +title: Settings +nav_order: 90 +redirect_from: + - /search-plugins/knn/settings/ +--- + +# Vector search settings + +OpenSearch supports the following vector search settings. To learn more about static and dynamic settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/). + +## Cluster settings + +The following table lists all available cluster-level vector search settings. For more information about cluster settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#updating-cluster-settings-using-the-api) and [Updating cluster settings using the API]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#updating-cluster-settings-using-the-api). + +Setting | Static/Dynamic | Default | Description +:--- | :--- | :--- | :--- +`knn.plugin.enabled`| Dynamic | `true` | Enables or disables the k-NN plugin. +`knn.algo_param.index_thread_qty` | Dynamic | `1` | The number of threads used for native library and Lucene library (for OpenSearch version 2.19 and later) index creation. Keeping this value low reduces the CPU impact of the k-NN plugin but also reduces indexing performance. +`knn.cache.item.expiry.enabled` | Dynamic | `false` | Whether to remove native library indexes from memory that have not been accessed in a specified period of time. +`knn.cache.item.expiry.minutes` | Dynamic | `3h` | If enabled, the amount of idle time before a native library index is removed from memory. +`knn.circuit_breaker.unset.percentage` | Dynamic | `75` | The native memory usage threshold for the circuit breaker. Memory usage must be lower than this percentage of `knn.memory.circuit_breaker.limit` in order for `knn.circuit_breaker.triggered` to remain `false`. +`knn.circuit_breaker.triggered` | Dynamic | `false` | `true` when memory usage exceeds the `knn.circuit_breaker.unset.percentage` value. +`knn.memory.circuit_breaker.limit` | Dynamic | `50%` | The native memory limit for native library indexes. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, then the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, then the plugin removes the native library indexes used least recently. +`knn.memory.circuit_breaker.enabled` | Dynamic | `true` | Whether to enable the k-NN memory circuit breaker. +`knn.model.index.number_of_shards`| Dynamic | `1` | The number of shards to use for the model system index, which is the OpenSearch index that stores the models used for approximate nearest neighbor (ANN) search. +`knn.model.index.number_of_replicas`| Dynamic | `1` | The number of replica shards to use for the model system index. Generally, in a multi-node cluster, this value should be at least 1 in order to increase stability. +`knn.model.cache.size.limit` | Dynamic | `10%` | The model cache limit cannot exceed 25% of the JVM heap. 
+`knn.faiss.avx2.disabled` | Static | `false` | A static setting that specifies whether to disable the SIMD-based `libopensearchknn_faiss_avx2.so` library and load the non-optimized `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [Single Instruction Multiple Data (SIMD) optimization]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#simd-optimization). +`knn.faiss.avx512_spr.disabled` | Static | `false` | A static setting that specifies whether to disable the SIMD-based `libopensearchknn_faiss_avx512_spr.so` library and load either the `libopensearchknn_faiss_avx512.so` , `libopensearchknn_faiss_avx2.so`, or the non-optimized `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). + +## Index settings + +The following table lists all available index-level k-NN settings. For information about updating these settings, see [Index-level index settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/#index-level-index-settings). + +Several parameters defined in the settings are currently in the deprecation process. Those parameters should be set in the mapping instead of in the index settings. Parameters set in the mapping will override the parameters set in the index settings. Setting the parameters in the mapping allows an index to have multiple `knn_vector` fields with different parameters. + +Setting | Static/Dynamic | Default | Description +:--- | :--- | :--- | :--- +`index.knn` | Static | `false` | Whether the index should build native library indexes for the `knn_vector` fields. If set to `false`, the `knn_vector` fields will be stored in doc values, but approximate k-NN search functionality will be disabled. +`index.knn.algo_param.ef_search` | Dynamic | `100` | `ef` (or `efSearch`) represents the size of the dynamic list for the nearest neighbors used during a search. Higher `ef` values lead to a more accurate but slower search. `ef` cannot be set to a value lower than the number of queried nearest neighbors, `k`. `ef` can take any value between `k` and the size of the dataset. +`index.knn.advanced.approximate_threshold` | Dynamic | `15000` | The number of vectors that a segment must have before creating specialized data structures for ANN search. Set to `-1` to disable building vector data structures and to `0` to always build them. +`index.knn.advanced.filtered_exact_search_threshold`| Dynamic | None | The filtered ID threshold value used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is lower than this setting's value, then exact search will be performed on the filtered IDs. +`index.knn.algo_param.ef_construction` | Static | `100` | Deprecated in 1.0.0. Use the [mapping parameters]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/) to set this value instead. +`index.knn.algo_param.m` | Static | `16` | Deprecated in 1.0.0. Use the [mapping parameters]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/) to set this value instead. +`index.knn.space_type` | Static | `l2` | Deprecated in 1.0.0. 
Use the [mapping parameters]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/) to set this value instead. + +An index created in OpenSearch version 2.11 or earlier will still use the previous `ef_construction` and `ef_search` values (`512`). +{: .note} \ No newline at end of file diff --git a/_vector-search/specialized-operations/index.md b/_vector-search/specialized-operations/index.md new file mode 100644 index 0000000000..4a97327fd4 --- /dev/null +++ b/_vector-search/specialized-operations/index.md @@ -0,0 +1,22 @@ +--- +layout: default +title: Specialized vector search +nav_order: 50 +has_children: true +has_toc: false +redirect_from: + - /vector-search/specialized-operations/ +cards: + - heading: "Nested field vector search" + description: "Use vector search to search nested fields" + link: "/vector-search/specialized-operations/nested-search-knn/" + - heading: "Radial search" + description: "Search all points in a vector space that reside within a specified maximum distance or minimum score threshold from a query point" + link: "/vector-search/specialized-operations/radial-search-knn/" +--- + +# Specialized vector search + +OpenSearch supports the following specialized vector search applications. + +{% include cards.html cards=page.cards %} \ No newline at end of file diff --git a/_search-plugins/knn/nested-search-knn.md b/_vector-search/specialized-operations/nested-search-knn.md similarity index 88% rename from _search-plugins/knn/nested-search-knn.md rename to _vector-search/specialized-operations/nested-search-knn.md index ba3df48bdf..f703b70323 100644 --- a/_search-plugins/knn/nested-search-knn.md +++ b/_vector-search/specialized-operations/nested-search-knn.md @@ -1,26 +1,28 @@ --- layout: default -title: k-NN search with nested fields -nav_order: 21 -parent: k-NN search +title: Nested field search +nav_order: 40 +parent: Specialized vector search has_children: false has_math: true +redirect_from: + - /search-plugins/knn/nested-search-knn/ --- -# k-NN search with nested fields +# Nested field search -Using [nested fields]({{site.url}}{{site.baseurl}}/field-types/nested/) in a k-nearest neighbors (k-NN) index, you can store multiple vectors in a single document. For example, if your document consists of various components, you can generate a vector value for each component and store each vector in a nested field. +Using [nested fields]({{site.url}}{{site.baseurl}}/field-types/nested/) in a vector index, you can store multiple vectors in a single document. For example, if your document consists of various components, you can generate a vector value for each component and store each vector in a nested field. -A k-NN document search operates at the field level. For a document with nested fields, OpenSearch examines only the vector nearest to the query vector to decide whether to include the document in the results. For example, consider an index containing documents `A` and `B`. Document `A` is represented by vectors `A1` and `A2`, and document `B` is represented by vector `B1`. Further, the similarity order for a query Q is `A1`, `A2`, `B1`. If you search using query Q with a k value of 2, the search will return both documents `A` and `B` instead of only document `A`. +A vector search operates at the field level. For a document with nested fields, OpenSearch examines only the vector nearest to the query vector to decide whether to include the document in the results. For example, consider an index containing documents `A` and `B`. 
Document `A` is represented by vectors `A1` and `A2`, and document `B` is represented by vector `B1`. Further, the similarity order for a query Q is `A1`, `A2`, `B1`. If you search using query Q with a k value of 2, the search will return both documents `A` and `B` instead of only document `A`. Note that in the case of an approximate search, the results are approximations and not exact matches. -k-NN search with nested fields is supported by the HNSW algorithm for the Lucene and Faiss engines. +Vector search with nested fields is supported by the HNSW algorithm for the Lucene and Faiss engines. ## Indexing and searching nested fields -To use k-NN search with nested fields, you must create a k-NN index by setting `index.knn` to `true`. Create a nested field by setting its `type` to `nested` and specify one or more fields of the `knn_vector` data type within the nested field. In this example, the `knn_vector` field `my_vector` is nested inside the `nested_field` field: +To use vector search with nested fields, you must create a vector index by setting `index.knn` to `true`. Create a nested field by setting its `type` to `nested` and specify one or more fields of the `knn_vector` data type within the nested field. In this example, the `knn_vector` field `my_vector` is nested inside the `nested_field` field: ```json PUT my-knn-index-1 @@ -71,7 +73,7 @@ PUT _bulk?refresh=true ``` {% include copy-curl.html %} -Then run a k-NN search on the data by using the `knn` query type: +Then run a vector search on the data by using the `knn` query type: ```json GET my-knn-index-1/_search @@ -475,13 +477,13 @@ The response contains all matching documents: } ``` -## k-NN search with filtering on nested fields +## Vector search with filtering on nested fields -You can apply a filter to a k-NN search with nested fields. A filter can be applied to either a top-level field or a field inside a nested field. +You can apply a filter to a vector search with nested fields. A filter can be applied to either a top-level field or a field inside a nested field. The following example applies a filter to a top-level field. -First, create a k-NN index with a nested field: +First, create a vector index with a nested field: ```json PUT my-knn-index-1 @@ -530,7 +532,7 @@ PUT _bulk?refresh=true ``` {% include copy-curl.html %} -Then run a k-NN search on the data using the `knn` query type with a filter. The following query returns documents whose `parking` field is set to `true`: +Then run a vector search on the data using the `knn` query type with a filter. The following query returns documents whose `parking` field is set to `true`: ```json GET my-knn-index-1/_search diff --git a/_search-plugins/knn/radial-search-knn.md b/_vector-search/specialized-operations/radial-search-knn.md similarity index 84% rename from _search-plugins/knn/radial-search-knn.md rename to _vector-search/specialized-operations/radial-search-knn.md index e5449a0993..6aecc44607 100644 --- a/_search-plugins/knn/radial-search-knn.md +++ b/_vector-search/specialized-operations/radial-search-knn.md @@ -1,36 +1,40 @@ --- layout: default title: Radial search -nav_order: 28 -parent: k-NN search +nav_order: 50 +parent: Specialized vector search has_children: false has_math: true +redirect_from: + - /search-plugins/knn/radial-search-knn/ --- # Radial search -Radial search enhances the k-NN plugin's capabilities beyond approximate top-`k` searches. 
With radial search, you can search all points within a vector space that reside within a specified maximum distance or minimum score threshold from a query point. This provides increased flexibility and utility in search operations. +Radial search enhances the vector search capabilities beyond approximate top-k searches. With radial search, you can search all points within a vector space that reside within a specified maximum distance or minimum score threshold from a query point. This provides increased flexibility and utility in search operations. -## Parameter type +## Parameters -`max_distance` allows users to specify a physical distance within the vector space, identifying all points that are within this distance from the query point. This approach is particularly useful for applications requiring spatial proximity or absolute distance measurements. +Radial search supports the following parameters: -`min_score` enables the specification of a similarity score, facilitating the retrieval of points that meet or exceed this score in relation to the query point. This method is ideal in scenarios where relative similarity, based on a specific metric, is more critical than physical proximity. +- `max_distance`: Specifies a physical distance within the vector space, identifying all points that are within this distance from the query point. This approach is particularly useful for applications requiring spatial proximity or absolute distance measurements. -Only one query variable, either `k`, `max_distance`, or `min_score`, is required to be specified during radial search. For more information about the vector spaces, see [Spaces](#spaces). +- `min_score`: Specifies a similarity score, facilitating the retrieval of points that meet or exceed this score in relation to the query point. This method is ideal in scenarios where relative similarity, based on a specific metric, is more critical than physical proximity. + +Only one query variable, either `k`, `max_distance`, or `min_score`, is required to be specified during radial search. ## Supported cases -You can perform radial search with either the Lucene or Faiss engines. The following table summarizes radial search use cases by engine. +You can perform radial search with either the Lucene or Faiss engine. The following table summarizes radial search use cases by engine. | Engine supported | Filter supported | Nested field supported | Search type | | :--- | :--- | :--- | :--- | -| Lucene | true | false | approximate | -| Faiss | true | true | approximate | +| Lucene | Yes | No | Approximate | +| Faiss | Yes | Yes | Approximate | ## Spaces -For supported spaces, see [Spaces]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#spaces). +For supported spaces, see [Spaces]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-spaces/). ## Examples @@ -38,7 +42,7 @@ The following examples can help you to get started with radial search. ### Prerequisites -To use a k-NN index with radial search, create a k-NN index by setting `index.knn` to `true`. +To use a vector index with radial search, create a vector index by setting `index.knn` to `true`.
Specify one or more fields of the `knn_vector` data type, as shown in the following example: ```json PUT knn-index-test diff --git a/_ml-commons-plugin/tutorials/chatbots/build-chatbot.md b/_vector-search/tutorials/chatbots/build-chatbot.md similarity index 100% rename from _ml-commons-plugin/tutorials/chatbots/build-chatbot.md rename to _vector-search/tutorials/chatbots/build-chatbot.md diff --git a/_vector-search/tutorials/chatbots/index.md b/_vector-search/tutorials/chatbots/index.md new file mode 100644 index 0000000000..1c39a6b710 --- /dev/null +++ b/_vector-search/tutorials/chatbots/index.md @@ -0,0 +1,36 @@ +--- +layout: default +title: Chatbots and agents +parent: Tutorials +has_children: true +has_toc: false +nav_order: 140 +redirect_from: + - /vector-search/tutorials/chatbots/ +chatbots: + - heading: RAG chatbot + link: vector-search/tutorials/chatbots/rag-chatbot/ + list: + - "Platform: OpenSearch" + - "Model: Anthropic Claude" + - "Deployment: Amazon Bedrock" + - heading: RAG with a conversational flow agent + link: /vector-search/tutorials/chatbots/rag-conversational-agent/ + list: + - "Platform: OpenSearch" + - "Model: Anthropic Claude" + - "Deployment: Amazon Bedrock" + - heading: Build your own chatbot + link: /vector-search/tutorials/chatbots/build-chatbot/ + list: + - "Platform: OpenSearch" + - "Model: Anthropic Claude" + - "Deployment: Amazon Bedrock" +--- + +# Chatbots and agents tutorials + +The following machine learning (ML) tutorials show you how to implement chatbots and agents. + +{% include cards.html cards=page.chatbots %} + \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/chatbots/rag-chatbot.md b/_vector-search/tutorials/chatbots/rag-chatbot.md similarity index 100% rename from _ml-commons-plugin/tutorials/chatbots/rag-chatbot.md rename to _vector-search/tutorials/chatbots/rag-chatbot.md diff --git a/_ml-commons-plugin/tutorials/chatbots/rag-conversational-agent.md b/_vector-search/tutorials/chatbots/rag-conversational-agent.md similarity index 99% rename from _ml-commons-plugin/tutorials/chatbots/rag-conversational-agent.md rename to _vector-search/tutorials/chatbots/rag-conversational-agent.md index f151b492d5..f746262b81 100644 --- a/_ml-commons-plugin/tutorials/chatbots/rag-conversational-agent.md +++ b/_vector-search/tutorials/chatbots/rag-conversational-agent.md @@ -136,7 +136,7 @@ PUT test_population_data ``` {% include copy-curl.html %} -For more information about vector indexes, see [vector index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). +For more information about vector indexes, see [Creating a vector index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). 
### Step 1.4: Ingest data diff --git a/_ml-commons-plugin/tutorials/conversational-search/conversational-search-cohere.md b/_vector-search/tutorials/conversational-search/conversational-search-cohere.md similarity index 100% rename from _ml-commons-plugin/tutorials/conversational-search/conversational-search-cohere.md rename to _vector-search/tutorials/conversational-search/conversational-search-cohere.md diff --git a/_vector-search/tutorials/conversational-search/index.md b/_vector-search/tutorials/conversational-search/index.md new file mode 100644 index 0000000000..4ed9731a8a --- /dev/null +++ b/_vector-search/tutorials/conversational-search/index.md @@ -0,0 +1,23 @@ +--- +layout: default +title: Conversational search +parent: Tutorials +has_children: true +has_toc: false +nav_order: 80 +redirect_from: + - /vector-search/tutorials/conversational-search/ +conversational_search: + - heading: Conversational search using Cohere Command + link: /vector-search/tutorials/conversational-search/conversational-search-cohere/ + list: + - "Platform: OpenSearch" + - "Model: Cohere Command" + - "Deployment: Provider API" +--- + +# Conversational search tutorials + +The following tutorials show you how to implement conversational search. + +{% include cards.html cards=page.conversational_search %} \ No newline at end of file diff --git a/_vector-search/tutorials/index.md b/_vector-search/tutorials/index.md new file mode 100644 index 0000000000..c419bb55b8 --- /dev/null +++ b/_vector-search/tutorials/index.md @@ -0,0 +1,243 @@ +--- +layout: default +title: Tutorials +has_children: true +has_toc: false +nav_order: 47 +redirect_from: + - /vector-search/tutorials/ + - /ml-commons-plugin/tutorials/ + - /ml-commons-plugin/tutorials/index/ +vector_search_101: + - heading: "Getting started with vector search" + link: "/vector-search/getting-started/" + - heading: "Getting started with semantic and hybrid search" + link: "/vector-search/tutorials/neural-search-tutorial/" +vector_operations: + - heading: "Generating embeddings from arrays of objects" + list: + - "Platform: OpenSearch" + - "Model: Amazon Titan" + - "Deployment: Amazon Bedrock" + link: "/vector-search/tutorials/vector-operations/generate-embeddings/" + - heading: "Semantic search using byte-quantized vectors" + list: + - "Platform: OpenSearch" + - "Model: Cohere Embed" + - "Deployment: Provider API" + link: "/vector-search/tutorials/vector-operations/semantic-search-byte-vectors/" +semantic_search: + - heading: "Semantic search using the OpenAI embedding model" + link: "/vector-search/tutorials/semantic-search/semantic-search-openai/" + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: OpenAI embedding" + - "Deployment: Provider API" + - heading: "Semantic search using Cohere Embed" + link: "/vector-search/tutorials/semantic-search/semantic-search-cohere/" + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Cohere Embed" + - "Deployment: Provider API" + - heading: "Semantic search using Cohere Embed on Amazon Bedrock" + link: "/vector-search/tutorials/semantic-search/semantic-search-bedrock-cohere/" + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Cohere Embed" + - "Deployment: Amazon Bedrock" + - heading: Semantic search using Amazon Bedrock Titan + link: "/vector-search/tutorials/semantic-search/semantic-search-bedrock-titan/" + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Amazon Titan" + - "Deployment: Amazon Bedrock" + - heading: "Semantic search 
using Amazon Bedrock Titan in another account" + link: /vector-search/tutorials/semantic-search/semantic-search-bedrock-titan-other/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Amazon Titan" + - "Deployment: Amazon Bedrock" + - heading: Semantic search using a model in Amazon SageMaker + link: /vector-search/tutorials/semantic-search/semantic-search-sagemaker/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Custom" + - "Deployment: Amazon SageMaker" + - heading: Semantic search using AWS CloudFormation and Amazon SageMaker + link: /vector-search/tutorials/semantic-search/semantic-search-cfn-sagemaker/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Custom" + - "Deployment: Amazon SageMaker + CloudFormation" +conversational_search: + - heading: Conversational search using Cohere Command + link: /vector-search/tutorials/conversational-search/conversational-search-cohere/ + list: + - "Platform: OpenSearch" + - "Model: Cohere Command" + - "Deployment: Provider API" +reranking: + - heading: Reranking search results using Cohere Rerank + link: /vector-search/tutorials/reranking/reranking-cohere/ + list: + - "Platform: OpenSearch" + - "Model: Cohere Rerank" + - "Deployment: Provider API" + - heading: Reranking search results using Amazon Bedrock models + link: /vector-search/tutorials/reranking/reranking-bedrock/ + list: + - "Platform: OpenSearch" + - "Model: Amazon Bedrock reranker models" + - "Deployment: Amazon Bedrock" + - heading: Reranking search results using a cross-encoder in Amazon SageMaker + link: /vector-search/tutorials/reranking/reranking-cross-encoder/ + list: + - "Platform: OpenSearch" + - "Model: Hugging Face MS MARCO" + - "Deployment: Amazon SageMaker" +rag: + - heading: Retrieval-augmented generation (RAG) using the DeepSeek Chat API + link: /vector-search/tutorials/rag/rag-deepseek-chat/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: DeepSeek Chat" + - 'Deployment: Provider API' + - heading: RAG using DeepSeek-R1 on Amazon Bedrock + link: /vector-search/tutorials/rag/rag-deepseek-r1-bedrock/ + list: + - 'Platform: OpenSearch, Amazon OpenSearch Service' + - 'Model: DeepSeek-R1' + - "Deployment: Amazon Bedrock" + - heading: RAG using DeepSeek-R1 in Amazon SageMaker + link: /vector-search/tutorials/rag/rag-deepseek-r1-sagemaker/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: DeepSeek-R1" + - "Deployment: Amazon SageMaker" +chatbots: + - heading: RAG chatbot + link: vector-search/tutorials/chatbots/rag-chatbot/ + list: + - "Platform: OpenSearch" + - "Model: Anthropic Claude" + - "Deployment: Amazon Bedrock" + - heading: RAG with a conversational flow agent + link: /vector-search/tutorials/chatbots/rag-conversational-agent/ + list: + - "Platform: OpenSearch" + - "Model: Anthropic Claude" + - "Deployment: Amazon Bedrock" + - heading: Build your own chatbot + link: /vector-search/tutorials/chatbots/build-chatbot/ + list: + - "Platform: OpenSearch" + - "Model: Anthropic Claude" + - "Deployment: Amazon Bedrock" +model_controls: + - heading: "Amazon Bedrock guardrails" + link: "/vector-search/tutorials/model-controls/bedrock-guardrails/" + list: + - "Platform: OpenSearch" + - "Model: Anthropic Claude" + - "Deployment: Amazon Bedrock" +--- + +# Tutorials + +Using the OpenSearch machine learning (ML) framework, you can build various applications, from implementing semantic search to building your own chatbot. 
To learn more, explore the following ML tutorials. + +
+ + Vector search 101 + + {: .heading} + +{% include cards.html cards=page.vector_search_101 %} + +
+ +--- + +

+ +
+ + Vector operations + + {: .heading} + +{% include cards.html cards=page.vector_operations %} + +
+ +--- + +
+ + Semantic search + + {: .heading} + +{% include cards.html cards=page.semantic_search %} + +
+ +--- + +
+ + Conversational search + + {: .heading} + +{% include cards.html cards=page.conversational_search %} + +
+ +--- + +
+ + Search result reranking + + {: .heading} + +{% include cards.html cards=page.reranking %} + +
+ +--- + +
+ + RAG + + {: .heading} + +{% include cards.html cards=page.rag %} + +
+ +--- + +
+ + Chatbots and agents + + {: .heading} + +{% include cards.html cards=page.chatbots %} + +
+ +--- + +
+ + Model controls + + {: .heading} + +{% include cards.html cards=page.model_controls %} + +
diff --git a/_ml-commons-plugin/tutorials/model-controls/bedrock-guardrails.md b/_vector-search/tutorials/model-controls/bedrock-guardrails.md similarity index 100% rename from _ml-commons-plugin/tutorials/model-controls/bedrock-guardrails.md rename to _vector-search/tutorials/model-controls/bedrock-guardrails.md diff --git a/_vector-search/tutorials/model-controls/index.md b/_vector-search/tutorials/model-controls/index.md new file mode 100644 index 0000000000..ef424ee222 --- /dev/null +++ b/_vector-search/tutorials/model-controls/index.md @@ -0,0 +1,23 @@ +--- +layout: default +title: Model controls +parent: Tutorials +has_children: true +has_toc: false +nav_order: 80 +redirect_from: + - /vector-search/tutorials/model-controls/ +model_controls: + - heading: Amazon Bedrock guardrails + link: /vector-search/tutorials/model-controls/bedrock-guardrails/ + list: + - "Platform: OpenSearch" + - "Model: Anthropic Claude" + - "Deployment: Amazon Bedrock" +--- + +# Model controls tutorials + +The following tutorials show you how to implement model controls. + +{% include cards.html cards=page.model_controls %} \ No newline at end of file diff --git a/_search-plugins/neural-search-tutorial.md b/_vector-search/tutorials/neural-search-tutorial.md similarity index 69% rename from _search-plugins/neural-search-tutorial.md rename to _vector-search/tutorials/neural-search-tutorial.md index 9c1b224cb8..5f5eac48aa 100644 --- a/_search-plugins/neural-search-tutorial.md +++ b/_vector-search/tutorials/neural-search-tutorial.md @@ -1,46 +1,40 @@ --- layout: default -title: Neural search tutorial +title: Getting started with semantic and hybrid search has_children: false -nav_order: 30 +parent: Tutorials +grand_parent: Getting started +nav_order: 3 redirect_from: - /ml-commons-plugin/semantic-search/ + - /search-plugins/neural-search-tutorial/ +steps: + - heading: "Choose a model for embedding generation" + link: "/vector-search/tutorials/neural-search-tutorial/#step-1-choose-a-model" + - heading: "Register and deploy the model" + link: "/vector-search/tutorials/neural-search-tutorial/#step-2-register-and-deploy-the-model" + - heading: "Ingest data" + link: "/vector-search/tutorials/neural-search-tutorial/#step-3-ingest-data" + - heading: "Search the data" + link: "/vector-search/tutorials/neural-search-tutorial/#step-4-search-the-data" --- -# Neural search tutorial +# Getting started with semantic and hybrid search By default, OpenSearch calculates document scores using the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) algorithm. BM25 is a keyword-based algorithm that performs well on queries containing keywords but fails to capture the semantic meaning of the query terms. Semantic search, unlike keyword-based search, takes into account the meaning of the query in the search context. Thus, semantic search performs well when a query requires natural language understanding. -In this tutorial, you'll learn how to use neural search to: +In this tutorial, you'll learn how to implement the following types of search: -- Implement semantic search in OpenSearch. -- Implement hybrid search by combining semantic and keyword search to improve search relevance. - -## Terminology - -It's helpful to understand the following terms before starting this tutorial: - -- _Neural search_: Facilitates vector search at ingestion time and at search time: - - At ingestion time, neural search uses language models to generate vector embeddings from the text fields in the document. 
The documents containing both the original text field and the vector embedding of the field are then indexed in a k-NN index, as shown in the following diagram. - - ![Neural search at ingestion time diagram]({{site.url}}{{site.baseurl}}/images/neural-search-ingestion.png) - - At search time, when you then use a _neural query_, the query text is passed through a language model, and the resulting vector embeddings are compared with the document text vector embeddings to find the most relevant results, as shown in the following diagram. - - ![Neural search at search time diagram]({{site.url}}{{site.baseurl}}/images/neural-search-query.png) - -- _Semantic search_: Employs neural search in order to determine the intention of the user's query in the search context, thereby improving search relevance. - -- _Hybrid search_: Combines semantic and keyword search to improve search relevance. +- **Semantic search**: Considers semantic meaning in order to determine the intention of the user's query in the search context, thereby improving search relevance. +- **Hybrid search**: Combines semantic and keyword search to improve search relevance. ## OpenSearch components for semantic search -In this tutorial, you'll implement semantic search using the following OpenSearch components: +In this tutorial, you'll use the following OpenSearch components: -- [Model group]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#model-groups) - [Pretrained language models provided by OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/) - [Ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) - [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) -- [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) - [Search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) - [Normalization processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) - [Hybrid query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/) @@ -76,41 +70,27 @@ For more information about ML-related cluster settings, see [ML Commons cluster This tutorial consists of the following steps: -1. [**Set up an ML language model**](#step-1-set-up-an-ml-language-model). - 1. [Choose a language model](#step-1a-choose-a-language-model). - 1. [Register a model group](#step-1b-register-a-model-group). - 1. [Register the model to the model group](#step-1c-register-the-model-to-the-model-group). - 1. [Deploy the model](#step-1d-deploy-the-model). -1. [**Ingest data with neural search**](#step-2-ingest-data-with-neural-search). - 1. [Create an ingest pipeline for neural search](#step-2a-create-an-ingest-pipeline-for-neural-search). - 1. [Create a k-NN index](#step-2b-create-a-k-nn-index). - 1. [Ingest documents into the index](#step-2c-ingest-documents-into-the-index). -1. [**Search the data**](#step-3-search-the-data). - - [Search using a keyword search](#search-using-a-keyword-search). - - [Search using a neural search](#search-using-a-neural-search). - - [Search using a hybrid search](#search-using-a-hybrid-search). - -Some steps in the tutorial contain optional `Test it` sections. You can ensure that the step was successful by running requests in these sections. +{% include list.html list_items=page.steps%} + +Some steps in the tutorial contain optional `Test it` sections. You can ensure that the step completed successfully by running requests in these sections. 
After you're done, follow the steps in the [Clean up](#clean-up) section to delete all created components. ## Tutorial -You can follow this tutorial using your command line or the OpenSearch Dashboards [Dev Tools console]({{site.url}}{{site.baseurl}}/dashboards/dev-tools/run-queries/). - -## Step 1: Set up an ML language model +You can follow this tutorial by using your command line or the OpenSearch Dashboards [Dev Tools console]({{site.url}}{{site.baseurl}}/dashboards/dev-tools/run-queries/). -Neural search requires a language model in order to generate vector embeddings from text fields, both at ingestion time and query time. +## Step 1: Choose a model -### Step 1(a): Choose a language model +First, you'll need to choose a language model in order to generate vector embeddings from text fields, both at ingestion time and query time. -For this tutorial, you'll use the [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert) model from Hugging Face. It is one of the pretrained sentence transformer models available in OpenSearch that has shown some of the best results in benchmarking tests (for details, see [this blog post](https://opensearch.org/blog/semantic-science-benchmarks/)). You'll need the name, version, and dimension of the model to register it. You can find this information in the [pretrained model table]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) by selecting the `config_url` link corresponding to the model's TorchScript artifact: +For this tutorial, you'll use the [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert) model from Hugging Face. It is one of the pretrained sentence transformer models available in OpenSearch that has shown some of the best results in benchmarking tests (for more information, see [this blog post](https://opensearch.org/blog/semantic-science-benchmarks/)). You'll need the name, version, and dimension of the model to register it. You can find this information in the [pretrained model table]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) by selecting the `config_url` link corresponding to the model's TorchScript artifact: - The model name is `huggingface/sentence-transformers/msmarco-distilbert-base-tas-b`. - The model version is `1.0.1`. - The number of dimensions for this model is `768`. -Take note of the dimensionality of the model because you'll need it when you set up a k-NN index. +Take note of the dimensionality of the model because you'll need it when you set up a vector index. {: .important} #### Advanced: Using a different model @@ -125,108 +105,15 @@ Alternatively, you can choose one of the following options for your model: For information about choosing a model, see [Further reading](#further-reading). -### Step 1(b): Register a model group - -For access control, models are organized into model groups (collections of versions of a particular model). Each model group name in the cluster must be globally unique. Registering a model group ensures the uniqueness of the model group name. - -If you are registering the first version of a model without first registering the model group, a new model group is created automatically. For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/). 
-{: .tip} - -To register a model group with the access mode set to `public`, send the following request: - -```json -POST /_plugins/_ml/model_groups/_register -{ - "name": "NLP_model_group", - "description": "A model group for NLP models", - "access_mode": "public" -} -``` -{% include copy-curl.html %} - -OpenSearch sends back the model group ID: - -```json -{ - "model_group_id": "Z1eQf4oB5Vm0Tdw8EIP2", - "status": "CREATED" -} -``` - -You'll use this ID to register the chosen model to the model group. - -
- - Test it - - {: .text-delta} - -Search for the newly created model group by providing its model group ID in the request: - -```json -POST /_plugins/_ml/model_groups/_search -{ - "query": { - "match": { - "_id": "Z1eQf4oB5Vm0Tdw8EIP2" - } - } -} -``` -{% include copy-curl.html %} - -The response contains the model group: - -```json -{ - "took": 0, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": ".plugins-ml-model-group", - "_id": "Z1eQf4oB5Vm0Tdw8EIP2", - "_version": 1, - "_seq_no": 14, - "_primary_term": 2, - "_score": 1, - "_source": { - "created_time": 1694357262582, - "access": "public", - "latest_version": 0, - "last_updated_time": 1694357262582, - "name": "NLP_model_group", - "description": "A model group for NLP models" - } - } - ] - } -} -``` -
- - -### Step 1(c): Register the model to the model group +## Step 2: Register and deploy the model -To register the model to the model group, provide the model group ID in the register request: +To register the model, provide the model group ID in the register request: ```json POST /_plugins/_ml/models/_register { "name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b", "version": "1.0.1", - "model_group_id": "Z1eQf4oB5Vm0Tdw8EIP2", "model_format": "TORCH_SCRIPT" } ``` @@ -248,7 +135,9 @@ GET /_plugins/_ml/tasks/aFeif4oB5Vm0Tdw8yoN7 ``` {% include copy-curl.html %} -Once the task is complete, the task state will be `COMPLETED` and the Tasks API response will contain a model ID for the registered model: +OpenSearch saves the registered model in the model index. Deploying a model creates a model instance and caches the model in memory. + +Once the task is complete, the task state will be `COMPLETED` and the Tasks API response will contain a model ID for the deployed model: ```json { @@ -338,55 +227,12 @@ POST /_plugins/_ml/models/_register "all_config": "{\"_name_or_path\":\"old_models/msmarco-distilbert-base-tas-b/0_Transformer\",\"activation\":\"gelu\",\"architectures\":[\"DistilBertModel\"],\"attention_dropout\":0.1,\"dim\":768,\"dropout\":0.1,\"hidden_dim\":3072,\"initializer_range\":0.02,\"max_position_embeddings\":512,\"model_type\":\"distilbert\",\"n_heads\":12,\"n_layers\":6,\"pad_token_id\":0,\"qa_dropout\":0.1,\"seq_classif_dropout\":0.2,\"sinusoidal_pos_embds\":false,\"tie_weights_\":true,\"transformers_version\":\"4.7.0\",\"vocab_size\":30522}" }, "created_time": 1676074079195, - "model_group_id": "Z1eQf4oB5Vm0Tdw8EIP2", "url": "https://artifacts.opensearch.org/models/ml-models/huggingface/sentence-transformers/msmarco-distilbert-base-tas-b/1.0.1/onnx/sentence-transformers_msmarco-distilbert-base-tas-b-1.0.1-onnx.zip" } ``` For more information, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/). -### Step 1(d): Deploy the model - -Once the model is registered, it is saved in the model index. Next, you'll need to deploy the model. Deploying a model creates a model instance and caches the model in memory. To deploy the model, provide its model ID to the `_deploy` endpoint: - -```json -POST /_plugins/_ml/models/aVeif4oB5Vm0Tdw8zYO2/_deploy -``` -{% include copy-curl.html %} - -Like the register operation, the deploy operation is asynchronous, so you'll get a task ID in the response: - -```json -{ - "task_id": "ale6f4oB5Vm0Tdw8NINO", - "status": "CREATED" -} -``` - -You can check the status of the task by using the Tasks API: - -```json -GET /_plugins/_ml/tasks/ale6f4oB5Vm0Tdw8NINO -``` -{% include copy-curl.html %} - -Once the task is complete, the task state will be `COMPLETED`: - -```json -{ - "model_id": "aVeif4oB5Vm0Tdw8zYO2", - "task_type": "DEPLOY_MODEL", - "function_name": "TEXT_EMBEDDING", - "state": "COMPLETED", - "worker_node": [ - "4p6FVOmJRtu3wehDD74hzQ" - ], - "create_time": 1694360024141, - "last_update_time": 1694360027940, - "is_async": true -} -``` -
Test it @@ -439,13 +285,13 @@ GET /_plugins/_ml/profile/models ```
-## Step 2: Ingest data with neural search +## Step 3: Ingest data -Neural search uses a language model to transform text into vector embeddings. During ingestion, neural search creates vector embeddings for the text fields in the request. During search, you can generate vector embeddings for the query text by applying the same model, allowing you to perform vector similarity search on the documents. +OpenSearch uses a language model to transform text into vector embeddings. During ingestion, OpenSearch creates vector embeddings for the text fields in the request. During search, you can generate vector embeddings for the query text by applying the same model, allowing you to perform vector similarity search on the documents. -### Step 2(a): Create an ingest pipeline for neural search +### Step 3(a): Create an ingest pipeline -Now that you have deployed a model, you can use this model to configure [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/). First, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains one processor: a task that transforms document fields before documents are ingested into an index. For neural search, you'll set up a `text_embedding` processor that creates vector embeddings from text. You'll need the `model_id` of the model you set up in the previous section and a `field_map`, which specifies the name of the field from which to take the text (`text`) and the name of the field in which to record embeddings (`passage_embedding`): +Now that you have deployed a model, you can use this model to configure an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains one processor: a task that transforms document fields before documents are ingested into an index. In this example, you'll set up a `text_embedding` processor that creates vector embeddings from text. You'll need the `model_id` of the model you set up in the previous section and a `field_map`, which specifies the name of the field from which to take the text (`text`) and the name of the field in which to record embeddings (`passage_embedding`): ```json PUT /_ingest/pipeline/nlp-ingest-pipeline @@ -499,9 +345,9 @@ The response contains the ingest pipeline: ``` -### Step 2(b): Create a k-NN index +### Step 3(b): Create a vector index -Now you'll create a k-NN index with a field named `text`, which contains an image description, and a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field named `passage_embedding`, which contains the vector embedding of the text. Additionally, set the default ingest pipeline to the `nlp-ingest-pipeline` you created in the previous step: +Now you'll create a vector index with a field named `text`, which contains an image description, and a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field named `passage_embedding`, which contains the vector embedding of the text. 
Additionally, set the default ingest pipeline to the `nlp-ingest-pipeline` you created in the previous step: ```json @@ -519,12 +365,7 @@ PUT /my-nlp-index "passage_embedding": { "type": "knn_vector", "dimension": 768, - "method": { - "engine": "lucene", - "space_type": "l2", - "name": "hnsw", - "parameters": {} - } + "space_type": "l2" }, "text": { "type": "text" @@ -535,7 +376,7 @@ PUT /my-nlp-index ``` {% include copy-curl.html %} -Setting up a k-NN index allows you to later perform a vector search on the `passage_embedding` field. +Setting up a vector index allows you to later perform a vector search on the `passage_embedding` field.
@@ -543,7 +384,7 @@ Setting up a k-NN index allows you to later perform a vector search on the `pass {: .text-delta} -Use the following requests to get the settings and the mappings of the created index: +Use the following requests to get the settings and mappings of the created index: ```json GET /my-nlp-index/_settings @@ -557,7 +398,7 @@ GET /my-nlp-index/_mappings
-### Step 2(c): Ingest documents into the index +### Step 3(c): Ingest documents into the index In this step, you'll ingest several sample documents into the index. The sample data is taken from the [Flickr image dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset). Each document contains a `text` field corresponding to the image description and an `id` field corresponding to the image ID: @@ -637,9 +478,9 @@ The response includes the document `_source` containing the original `text` and } ``` -## Step 3: Search the data +## Step 4: Search the data -Now you'll search the index using keyword search, neural search, and a combination of the two. +Now you'll search the index using a keyword search, a semantic search, and a combination of the two. ### Search using a keyword search @@ -664,7 +505,7 @@ GET /my-nlp-index/_search ``` {% include copy-curl.html %} -Document 3 is not returned because it does not contain the specified keywords. Documents containing the words `rodeo` and `cowboy` are scored lower because semantic meaning is not considered: +Document 3 is not returned because it does not contain the specified keywords. Documents containing the words `rodeo` and `cowboy` are scored lower because their semantic meaning is not considered:
@@ -731,9 +572,9 @@ Document 3 is not returned because it does not contain the specified keywords. D ```
-### Search using a neural search +### Search using a semantic search -To search using a neural search, use a `neural` query and provide the model ID of the model you set up earlier so that vector embeddings for the query text are generated with the model used at ingestion time: +To search using a semantic search, use a `neural` query and provide the model ID of the model you set up earlier so that vector embeddings for the query text are generated with the model used at ingestion time: ```json GET /my-nlp-index/_search @@ -756,7 +597,7 @@ GET /my-nlp-index/_search ``` {% include copy-curl.html %} -This time, the response not only contains all five documents, but the document order is also improved because neural search considers semantic meaning: +This time, the response not only contains all five documents, but the document order is also improved because semantic search considers semantic meaning:
@@ -834,7 +675,7 @@ This time, the response not only contains all five documents, but the document o ### Search using a hybrid search -Hybrid search combines keyword and neural search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline you'll configure intercepts search results at an intermediate stage and applies the [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) to them. The `normalization-processor` normalizes and combines the document scores from multiple query clauses, rescoring the documents according to the chosen normalization and combination techniques. +Hybrid search combines keyword and semantic search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline you'll configure intercepts search results at an intermediate stage and applies the [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) to them. The `normalization-processor` normalizes and combines the document scores from multiple query clauses, rescoring the documents according to the chosen normalization and combination techniques. #### Step 1: Configure a search pipeline @@ -1033,4 +874,4 @@ DELETE /_plugins/_ml/model_groups/Z1eQf4oB5Vm0Tdw8EIP2 ## Further reading - Read about the basics of OpenSearch semantic search in [Building a semantic search engine in OpenSearch](https://opensearch.org/blog/semantic-search-solutions/). -- Read about the benefits of combining keyword and neural search, the normalization and combination technique options, and benchmarking tests in [The ABCs of semantic search in OpenSearch: Architectures, benchmarks, and combination strategies](https://opensearch.org/blog/semantic-science-benchmarks/). +- Read about combining keyword and semantic search, the normalization and combination technique options, and benchmarking tests in [The ABCs of semantic search in OpenSearch: Architectures, benchmarks, and combination strategies](https://opensearch.org/blog/semantic-science-benchmarks/). 
\ No newline at end of file diff --git a/_vector-search/tutorials/rag/index.md b/_vector-search/tutorials/rag/index.md new file mode 100644 index 0000000000..41286d7aaa --- /dev/null +++ b/_vector-search/tutorials/rag/index.md @@ -0,0 +1,35 @@ +--- +layout: default +title: RAG +parent: Tutorials +has_children: true +has_toc: false +nav_order: 120 +redirect_from: + - /vector-search/tutorials/rag/ +rag: + - heading: Retrieval-augmented generation (RAG) using the DeepSeek Chat API + link: /vector-search/tutorials/rag/rag-deepseek-chat/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: DeepSeek Chat" + - 'Deployment: Provider API' + - heading: RAG using DeepSeek-R1 on Amazon Bedrock + link: /vector-search/tutorials/rag/rag-deepseek-r1-bedrock/ + list: + - 'Platform: OpenSearch, Amazon OpenSearch Service' + - 'Model: DeepSeek-R1' + - "Deployment: Amazon Bedrock" + - heading: RAG using DeepSeek-R1 in Amazon SageMaker + link: /vector-search/tutorials/rag/rag-deepseek-r1-sagemaker/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: DeepSeek-R1" + - "Deployment: Amazon SageMaker" +--- + +# RAG tutorials + +The following machine learning (ML) tutorials show you how to implement retrieval-augmented generation (RAG). + +{% include cards.html cards=page.rag %} \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/rag/rag-deepseek-chat.md b/_vector-search/tutorials/rag/rag-deepseek-chat.md similarity index 100% rename from _ml-commons-plugin/tutorials/rag/rag-deepseek-chat.md rename to _vector-search/tutorials/rag/rag-deepseek-chat.md diff --git a/_ml-commons-plugin/tutorials/rag/rag-deepseek-r1-bedrock.md b/_vector-search/tutorials/rag/rag-deepseek-r1-bedrock.md similarity index 100% rename from _ml-commons-plugin/tutorials/rag/rag-deepseek-r1-bedrock.md rename to _vector-search/tutorials/rag/rag-deepseek-r1-bedrock.md diff --git a/_ml-commons-plugin/tutorials/rag/rag-deepseek-r1-sagemaker.md b/_vector-search/tutorials/rag/rag-deepseek-r1-sagemaker.md similarity index 100% rename from _ml-commons-plugin/tutorials/rag/rag-deepseek-r1-sagemaker.md rename to _vector-search/tutorials/rag/rag-deepseek-r1-sagemaker.md diff --git a/_vector-search/tutorials/reranking/index.md b/_vector-search/tutorials/reranking/index.md new file mode 100644 index 0000000000..552724b08e --- /dev/null +++ b/_vector-search/tutorials/reranking/index.md @@ -0,0 +1,35 @@ +--- +layout: default +title: Reranking search results +parent: Tutorials +has_children: true +has_toc: false +nav_order: 100 +redirect_from: + - /vector-search/tutorials/reranking/ +reranking: + - heading: Reranking search results using Cohere Rerank + link: /vector-search/tutorials/reranking/reranking-cohere/ + list: + - "Platform: OpenSearch" + - "Model: Cohere Rerank" + - "Deployment: Provider API" + - heading: Reranking search results using Amazon Bedrock models + link: /vector-search/tutorials/reranking/reranking-bedrock/ + list: + - "Platform: OpenSearch" + - "Model: Amazon Bedrock reranker models" + - "Deployment: Amazon Bedrock" + - heading: Reranking search results using a cross-encoder in Amazon SageMaker + link: /vector-search/tutorials/reranking/reranking-cross-encoder/ + list: + - "Platform: OpenSearch" + - "Model: Hugging Face MS MARCO" + - "Deployment: Amazon SageMaker" +--- + +# Reranking search results tutorials + +The following machine learning (ML) tutorials show you how to implement search result reranking. 
+ +{% include cards.html cards=page.reranking %} \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/reranking/reranking-bedrock.md b/_vector-search/tutorials/reranking/reranking-bedrock.md similarity index 100% rename from _ml-commons-plugin/tutorials/reranking/reranking-bedrock.md rename to _vector-search/tutorials/reranking/reranking-bedrock.md diff --git a/_ml-commons-plugin/tutorials/reranking/reranking-cohere.md b/_vector-search/tutorials/reranking/reranking-cohere.md similarity index 100% rename from _ml-commons-plugin/tutorials/reranking/reranking-cohere.md rename to _vector-search/tutorials/reranking/reranking-cohere.md diff --git a/_ml-commons-plugin/tutorials/reranking/reranking-cross-encoder.md b/_vector-search/tutorials/reranking/reranking-cross-encoder.md similarity index 100% rename from _ml-commons-plugin/tutorials/reranking/reranking-cross-encoder.md rename to _vector-search/tutorials/reranking/reranking-cross-encoder.md diff --git a/_vector-search/tutorials/semantic-search/index.md b/_vector-search/tutorials/semantic-search/index.md new file mode 100644 index 0000000000..2761cc907d --- /dev/null +++ b/_vector-search/tutorials/semantic-search/index.md @@ -0,0 +1,59 @@ +--- +layout: default +title: Semantic search +parent: Tutorials +has_children: true +has_toc: false +nav_order: 50 +redirect_from: + - /vector-search/tutorials/semantic-search/ +semantic_search: + - heading: "Semantic search using the OpenAI embedding model" + link: "/vector-search/tutorials/semantic-search/semantic-search-openai/" + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: OpenAI embedding" + - "Deployment: Provider API" + - heading: "Semantic search using Cohere Embed" + link: "/vector-search/tutorials/semantic-search/semantic-search-cohere/" + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Cohere Embed" + - "Deployment: Provider API" + - heading: "Semantic search using Cohere Embed on Amazon Bedrock" + link: "/vector-search/tutorials/semantic-search/semantic-search-bedrock-cohere/" + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Cohere Embed" + - "Deployment: Amazon Bedrock" + - heading: Semantic search using Amazon Bedrock Titan + link: "/vector-search/tutorials/semantic-search/semantic-search-bedrock-titan/" + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Amazon Titan" + - "Deployment: Amazon Bedrock" + - heading: "Semantic search using Amazon Bedrock Titan in another account" + link: /vector-search/tutorials/semantic-search/semantic-search-bedrock-titan-other/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Amazon Titan" + - "Deployment: Amazon Bedrock (in a different account than your Amazon OpenSearch Service account)" + - heading: Semantic search using a model in Amazon SageMaker + link: /vector-search/tutorials/semantic-search/semantic-search-sagemaker/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Custom" + - "Deployment: Amazon SageMaker" + - heading: Semantic search using AWS CloudFormation and Amazon SageMaker + link: /vector-search/tutorials/semantic-search/semantic-search-cfn-sagemaker/ + list: + - "Platform: OpenSearch, Amazon OpenSearch Service" + - "Model: Custom" + - "Deployment: Amazon SageMaker + CloudFormation" +--- + +# Semantic search tutorials + +The following tutorials show you how to implement semantic search. 
+ +{% include cards.html cards=page.semantic_search %} \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-cohere.md b/_vector-search/tutorials/semantic-search/semantic-search-bedrock-cohere.md similarity index 100% rename from _ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-cohere.md rename to _vector-search/tutorials/semantic-search/semantic-search-bedrock-cohere.md diff --git a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-titan-other.md b/_vector-search/tutorials/semantic-search/semantic-search-bedrock-titan-other.md similarity index 100% rename from _ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-titan-other.md rename to _vector-search/tutorials/semantic-search/semantic-search-bedrock-titan-other.md diff --git a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-titan.md b/_vector-search/tutorials/semantic-search/semantic-search-bedrock-titan.md similarity index 100% rename from _ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-titan.md rename to _vector-search/tutorials/semantic-search/semantic-search-bedrock-titan.md diff --git a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-cfn-sagemaker.md b/_vector-search/tutorials/semantic-search/semantic-search-cfn-sagemaker.md similarity index 97% rename from _ml-commons-plugin/tutorials/semantic-search/semantic-search-cfn-sagemaker.md rename to _vector-search/tutorials/semantic-search/semantic-search-cfn-sagemaker.md index 88458495e2..baa88882da 100644 --- a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-cfn-sagemaker.md +++ b/_vector-search/tutorials/semantic-search/semantic-search-cfn-sagemaker.md @@ -12,7 +12,7 @@ This tutorial shows you how to implement semantic search in [Amazon OpenSearch S If you are using self-managed OpenSearch instead of Amazon OpenSearch Service, create a connector to the Amazon SageMaker model using [the blueprint](https://github.com/opensearch-project/ml-commons/blob/main/docs/remote_inference_blueprints/sagemaker_connector_blueprint.md). For more information about creating a connector, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). -The CloudFormation integration automates the steps in the [Semantic Search with SageMaker Embedding Model tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-sagemaker/). The CloudFormation template creates an IAM role and invokes an AWS Lambda function to set up an AI connector and model. +The CloudFormation integration automates the steps in the [Semantic Search with SageMaker Embedding Model tutorial]({{site.url}}{{site.baseurl}}/vector-search/tutorials/semantic-search/semantic-search-sagemaker/). The CloudFormation template creates an IAM role and invokes an AWS Lambda function to set up an AI connector and model. Replace the placeholders beginning with the prefix `your_` with your own values. {: .note} @@ -122,7 +122,7 @@ Note the domain Amazon Resource Name (ARN); you'll use it in the following steps ## Step 1: Map a backend role -The OpenSearch CloudFormation template uses a Lambda function to create an AI connector with an AWS Identity and Access Management (IAM) role. You must map the IAM role to `ml_full_access` to grant the required permissions. 
Follow [Step 2.2 of the Semantic Search with SageMaker Embedding Model tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-sagemaker/#step-22-map-a-backend-role) to map a backend role. +The OpenSearch CloudFormation template uses a Lambda function to create an AI connector with an AWS Identity and Access Management (IAM) role. You must map the IAM role to `ml_full_access` to grant the required permissions. Follow [Step 2.2 of the Semantic Search with SageMaker Embedding Model tutorial]({{site.url}}{{site.baseurl}}/vector-search/tutorials/semantic-search/semantic-search-sagemaker/#step-22-map-a-backend-role) to map a backend role. The IAM role is specified in the **Lambda Invoke OpenSearch ML Commons Role Name** field in the CloudFormation template. The default IAM role is `LambdaInvokeOpenSearchMLCommonsRole`, so you must map the `arn:aws:iam::your_aws_account_id:role/LambdaInvokeOpenSearchMLCommonsRole` backend role to `ml_full_access`. diff --git a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-cohere.md b/_vector-search/tutorials/semantic-search/semantic-search-cohere.md similarity index 99% rename from _ml-commons-plugin/tutorials/semantic-search/semantic-search-cohere.md rename to _vector-search/tutorials/semantic-search/semantic-search-cohere.md index cef161bec2..f23868ff04 100644 --- a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-cohere.md +++ b/_vector-search/tutorials/semantic-search/semantic-search-cohere.md @@ -15,7 +15,7 @@ If you are using self-managed OpenSearch instead of Amazon OpenSearch Service, c The easiest way to set up an embedding model in Amazon OpenSearch Service is by using [AWS CloudFormation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/cfn-template.html). Alternatively, you can set up an embedding model using [the AIConnectorHelper notebook](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/tutorials/aws/AIConnectorHelper.ipynb). {: .tip} -The Cohere Embed model is also available on Amazon Bedrock. To use the model hosted on Amazon Bedrock, see [Semantic search using the Cohere Embed model on Amazon Bedrock]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/semantic-search/semantic-search-bedrock-cohere/). +The Cohere Embed model is also available on Amazon Bedrock. To use the model hosted on Amazon Bedrock, see [Semantic search using the Cohere Embed model on Amazon Bedrock]({{site.url}}{{site.baseurl}}/vector-search/tutorials/semantic-search/semantic-search-bedrock-cohere/). Replace the placeholders beginning with the prefix `your_` with your own values. 
{: .note} diff --git a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-openai.md b/_vector-search/tutorials/semantic-search/semantic-search-openai.md similarity index 100% rename from _ml-commons-plugin/tutorials/semantic-search/semantic-search-openai.md rename to _vector-search/tutorials/semantic-search/semantic-search-openai.md diff --git a/_ml-commons-plugin/tutorials/semantic-search/semantic-search-sagemaker.md b/_vector-search/tutorials/semantic-search/semantic-search-sagemaker.md similarity index 100% rename from _ml-commons-plugin/tutorials/semantic-search/semantic-search-sagemaker.md rename to _vector-search/tutorials/semantic-search/semantic-search-sagemaker.md diff --git a/_ml-commons-plugin/tutorials/vector-operations/generate-embeddings.md b/_vector-search/tutorials/vector-operations/generate-embeddings.md similarity index 100% rename from _ml-commons-plugin/tutorials/vector-operations/generate-embeddings.md rename to _vector-search/tutorials/vector-operations/generate-embeddings.md diff --git a/_vector-search/tutorials/vector-operations/index.md b/_vector-search/tutorials/vector-operations/index.md new file mode 100644 index 0000000000..0a131f82ae --- /dev/null +++ b/_vector-search/tutorials/vector-operations/index.md @@ -0,0 +1,29 @@ +--- +layout: default +title: Vector operations +parent: Tutorials +has_children: true +has_toc: false +nav_order: 10 +redirect_from: + - /vector-search/tutorials/vector-operations/ +vector_operations: + - heading: "Generating embeddings from arrays of objects" + list: + - "Platform: OpenSearch" + - "Model: Amazon Titan" + - "Deployment: Amazon Bedrock" + link: "/vector-search/tutorials/vector-operations/generate-embeddings/" + - heading: "Semantic search using byte-quantized vectors" + list: + - "Platform: OpenSearch" + - "Model: Cohere Embed" + - "Deployment: Provider API" + link: "/vector-search/tutorials/vector-operations/semantic-search-byte-vectors/" +--- + +# Vector operation tutorials + +The following tutorials show you how to implement vector operations. + +{% include cards.html cards=page.vector_operations %} \ No newline at end of file diff --git a/_ml-commons-plugin/tutorials/vector-operations/semantic-search-byte-vectors.md b/_vector-search/tutorials/vector-operations/semantic-search-byte-vectors.md similarity index 99% rename from _ml-commons-plugin/tutorials/vector-operations/semantic-search-byte-vectors.md rename to _vector-search/tutorials/vector-operations/semantic-search-byte-vectors.md index 16c08a538a..bd9e10870b 100644 --- a/_ml-commons-plugin/tutorials/vector-operations/semantic-search-byte-vectors.md +++ b/_vector-search/tutorials/vector-operations/semantic-search-byte-vectors.md @@ -10,7 +10,7 @@ redirect_from: # Semantic search using byte-quantized vectors -This tutorial shows you how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#byte-vectors). +This tutorial shows you how to build a semantic search using the [Cohere Embed model](https://docs.cohere.com/reference/embed) and byte-quantized vectors. For more information about using byte-quantized vectors, see [Byte vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/#byte-vectors). The Cohere Embed v3 model supports several `embedding_types`. 
For this tutorial, you'll use the `INT8` type to encode byte-quantized vectors. diff --git a/_vector-search/vector-search-techniques/approximate-knn.md b/_vector-search/vector-search-techniques/approximate-knn.md new file mode 100644 index 0000000000..96c375f9a3 --- /dev/null +++ b/_vector-search/vector-search-techniques/approximate-knn.md @@ -0,0 +1,255 @@ +--- +layout: default +title: Approximate k-NN search +nav_order: 15 +parent: Vector search techniques +has_children: false +has_math: true +redirect_from: + - /search-plugins/knn/approximate-knn/ +--- + +# Approximate k-NN search + +Standard k-nearest neighbors (k-NN) search methods compute similarity using a brute-force approach that measures the nearest distance between a query and a number of points, which produces exact results. This works well in many applications. However, in the case of extremely large datasets with high dimensionality, this creates a scaling problem that reduces the efficiency of the search. Approximate k-NN search methods can overcome this by employing tools that restructure indexes more efficiently and reduce the dimensionality of searchable vectors. Using this approach requires a sacrifice in accuracy but increases search processing speeds appreciably. + +The approximate k-NN search methods in OpenSearch use approximate nearest neighbor (ANN) algorithms from the [NMSLIB](https://github.com/nmslib/nmslib), [Faiss](https://github.com/facebookresearch/faiss), and [Lucene](https://lucene.apache.org/) libraries to power k-NN search. These search methods employ ANN to improve search latency for large datasets. Of the three search methods OpenSearch provides, this method offers the best search scalability for large datasets. This approach is the preferred method when a dataset reaches hundreds of thousands of vectors. + +For information about the algorithms OpenSearch supports, see [Methods and engines]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/). +{: .note} + +OpenSearch builds a native library index of the vectors for each `knn-vector` field/Lucene segment pair during indexing, which can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These native library indexes are loaded into native memory during search and managed by a cache. To learn more about preloading native library indexes into memory, see [Warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see which native library indexes are already loaded into memory using the [Stats API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats). + +Because the native library indexes are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied to the results produced by the ANN search. + +## Get started with approximate k-NN + +To use the approximate search functionality, you must first create a vector index with `index.knn` set to `true`. This setting tells OpenSearch to create native library indexes for the index. + +Next, you must add one or more fields of the `knn_vector` data type. 
The following example creates an index with two `knn_vector` fields using the `faiss` engine: + +```json +PUT my-knn-index-1 +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2, + "space_type": "l2", + "method": { + "name": "hnsw", + "engine": "faiss", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } + }, + "my_vector2": { + "type": "knn_vector", + "dimension": 4, + "space_type": "innerproduct", + "method": { + "name": "hnsw", + "engine": "faiss", + "parameters": { + "ef_construction": 256, + "m": 48 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +In the preceding example, both `knn_vector` fields are configured using method definitions. Additionally, `knn_vector` fields can be configured using models. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/). + +The `knn_vector` data type supports a vector of floats that can have a dimension count of up to 16,000 for the NMSLIB, Faiss, and Lucene engines, as set by the `dimension` mapping parameter. + +In OpenSearch, codecs handle the storage and retrieval of indexes. OpenSearch uses a custom codec to write vector data to native library indexes so that the underlying k-NN search library can read it. +{: .tip } + +After you create the index, you can add some data to it: + +```json +POST _bulk +{ "index": { "_index": "my-knn-index-1", "_id": "1" } } +{ "my_vector1": [1.5, 2.5], "price": 12.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "2" } } +{ "my_vector1": [2.5, 3.5], "price": 7.1 } +{ "index": { "_index": "my-knn-index-1", "_id": "3" } } +{ "my_vector1": [3.5, 4.5], "price": 12.9 } +{ "index": { "_index": "my-knn-index-1", "_id": "4" } } +{ "my_vector1": [5.5, 6.5], "price": 1.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "5" } } +{ "my_vector1": [4.5, 5.5], "price": 3.7 } +{ "index": { "_index": "my-knn-index-1", "_id": "6" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } +{ "index": { "_index": "my-knn-index-1", "_id": "7" } } +{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } +{ "index": { "_index": "my-knn-index-1", "_id": "8" } } +{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } +{ "index": { "_index": "my-knn-index-1", "_id": "9" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } +``` +{% include copy-curl.html %} + +Then you can run an ANN search on the data using the `knn` query type: + +```json +GET my-knn-index-1/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector2": { + "vector": [2, 3, 5, 6], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +## The number of returned results + +In the preceding query, `k` represents the number of neighbors returned by the search of each graph. You must also include the `size` parameter, indicating the final number of results that you want the query to return. + +For the NMSLIB and Faiss engines, `k` represents the maximum number of documents returned for all segments of a shard. For the Lucene engine, `k` represents the number of documents returned for a shard. The maximum value of `k` is 10,000. + +For any engine, each shard returns `size` results to the coordinator node. Thus, the total number of results that the coordinator node receives is `size * number of shards`. After the coordinator node consolidates the results received from all nodes, the query returns the top `size` results. 
+ +The following table provides examples of the number of results returned by various engines in several scenarios. For these examples, assume that the number of documents contained in the segments and shards is sufficient to return the number of results specified in the table. + +`size` | `k` | Number of primary shards | Number of segments per shard | Number of returned results, Faiss/NMSLIB | Number of returned results, Lucene +:--- | :--- | :--- | :--- | :--- | :--- +10 | 1 | 1 | 4 | 4 | 1 +10 | 10 | 1 | 4 | 10 | 10 +10 | 1 | 2 | 4 | 8 | 2 + +The number of results returned by Faiss/NMSLIB differs from the number of results returned by Lucene only when `k` is smaller than `size`. If `k` and `size` are equal, all engines return the same number of results. + +Starting in OpenSearch 2.14, you can use `k`, `min_score`, or `max_distance` for [radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). + +## Building a vector index from a model + +For some of the algorithms that OpenSearch supports, the native library index needs to be trained before it can be used. It would be expensive to train every newly created segment, so, instead, OpenSearch features the concept of a *model* that initializes the native library index during segment creation. You can create a model by calling the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model) and passing in the source of the training data and the method definition of the model. Once training is complete, the model is serialized to a k-NN model system index. Then, during indexing, the model is pulled from that index to initialize the segments. + +To train a model, you first need an OpenSearch index containing training data. Training data can come from any `knn_vector` field that has a dimension matching the dimension of the model you want to create. Training data can be the same as the data you plan to index or come from a separate dataset. To create a training index, send the following request: + +```json +PUT /train-index +{ + "settings": { + "number_of_shards": 3, + "number_of_replicas": 0 + }, + "mappings": { + "properties": { + "train-field": { + "type": "knn_vector", + "dimension": 4 + } + } + } +} +``` +{% include copy-curl.html %} + +Notice that `index.knn` is not set in the index settings. This ensures that you do not create native library indexes for this index. + +You can now add some data to the index: + +```json +POST _bulk +{ "index": { "_index": "train-index", "_id": "1" } } +{ "train-field": [1.5, 5.5, 4.5, 6.4]} +{ "index": { "_index": "train-index", "_id": "2" } } +{ "train-field": [2.5, 3.5, 5.6, 6.7]} +{ "index": { "_index": "train-index", "_id": "3" } } +{ "train-field": [4.5, 5.5, 6.7, 3.7]} +{ "index": { "_index": "train-index", "_id": "4" } } +{ "train-field": [1.5, 5.5, 4.5, 6.4]} +``` +{% include copy-curl.html %} + +After completing indexing into the training index, you can call the Train API: + +```json +POST /_plugins/_knn/models/my-model/_train +{ + "training_index": "train-index", + "training_field": "train-field", + "dimension": 4, + "description": "My model description", + "space_type": "l2", + "method": { + "name": "ivf", + "engine": "faiss", + "parameters": { + "nlist": 4, + "nprobes": 2 + } + } +} +``` +{% include copy-curl.html %} + +The Train API returns as soon as the training job is started. 
To check the job status, use the Get Model API: + +```json +GET /_plugins/_knn/models/my-model?filter_path=state&pretty +{ + "state": "training" +} +``` +{% include copy-curl.html %} + +Once the model enters the `created` state, you can create an index that will use this model to initialize its native library indexes: + +```json +PUT /target-index +{ + "settings": { + "number_of_shards": 3, + "number_of_replicas": 1, + "index.knn": true + }, + "mappings": { + "properties": { + "target-field": { + "type": "knn_vector", + "model_id": "my-model" + } + } + } +} +``` +{% include copy-curl.html %} + +Lastly, you can add the documents you want to be searched to the index: + +```json +POST _bulk +{ "index": { "_index": "target-index", "_id": "1" } } +{ "target-field": [1.5, 5.5, 4.5, 6.4]} +{ "index": { "_index": "target-index", "_id": "2" } } +{ "target-field": [2.5, 3.5, 5.6, 6.7]} +{ "index": { "_index": "target-index", "_id": "3" } } +{ "target-field": [4.5, 5.5, 6.7, 3.7]} +{ "index": { "_index": "target-index", "_id": "4" } } +{ "target-field": [1.5, 5.5, 4.5, 6.4]} +``` +{% include copy-curl.html %} + +After data is ingested, it can be searched in the same way as any other `knn_vector` field. diff --git a/_vector-search/vector-search-techniques/index.md b/_vector-search/vector-search-techniques/index.md new file mode 100644 index 0000000000..8a3f950069 --- /dev/null +++ b/_vector-search/vector-search-techniques/index.md @@ -0,0 +1,38 @@ +--- +layout: default +title: Vector search techniques +nav_order: 15 +has_children: true +has_toc: false +redirect_from: + - /search-plugins/knn/ + - /search-plugins/knn/index/ + - /vector-search/vector-search-techniques/ +--- + +# Vector search techniques + +OpenSearch implements vector search as *k-nearest neighbors*, or *k-NN*, search. k-NN search finds the k neighbors closest to a query point across an index of vectors. To determine the neighbors, you can specify the space (the distance function) you want to use to measure the distance between points. + +OpenSearch supports three different methods for obtaining the k-nearest neighbors from an index of vectors: + +- [Approximate search]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) (approximate k-NN, or ANN): Returns approximate nearest neighbors to the query vector. Usually, approximate search algorithms sacrifice indexing speed and search accuracy in exchange for performance benefits such as lower latency, smaller memory footprints, and more scalable search. For most use cases, approximate search is the best option. + +- Exact search: A brute-force, exact k-NN search of vector fields. OpenSearch supports the following types of exact search: + - [Exact search with a scoring script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/): Using a scoring script, you can apply a filter to an index before executing the nearest neighbor search. + - [Painless extensions]({{site.url}}{{site.baseurl}}/search-plugins/knn/painless-functions/): Adds the distance functions as Painless extensions that you can use in more complex combinations. You can use this method to perform a brute-force, exact vector search of an index, which also supports pre-filtering. + + +In general, you should choose the ANN method for larger datasets because it scales significantly better. For smaller datasets, where you may want to apply a filter, you should choose the custom scoring approach. 
If you have a more complex use case in which you need to use a distance function as part of the scoring method, you should use the Painless scripting approach. + +## Approximate search + +OpenSearch supports multiple backend algorithms (_methods_) and libraries for implementing these algorithms (_engines_). It automatically selects the optimal configuration based on the chosen mode and available memory. For more information, see [Methods and engines]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/). + +## Using sparse vectors + +_Neural sparse search_ offers an efficient alternative to dense vector search by using sparse embedding models and inverted indexes, providing performance similar to BM25. Unlike dense vector methods that require significant memory and CPU resources, sparse search creates a list of token-weight pairs and stores them in a rank features index. This approach combines the efficiency of traditional search with the semantic understanding of neural networks. OpenSearch supports both automatic embedding generation through ingest pipelines and direct sparse vector ingestion. For more information, see [Neural sparse search]({{site.url}}{{site.baseurl}}/vector-search/ai-search/neural-sparse-search/). + +## Combining multiple search techniques + +_Hybrid search_ enhances search relevance by combining multiple search techniques in OpenSearch. It integrates traditional keyword search with vector-based semantic search. Through a configurable search pipeline, hybrid search normalizes and combines scores from different search methods to provide unified, relevant results. This approach is particularly effective for complex queries where both semantic understanding and exact matching are important. The search pipeline can be further customized with post-filtering operations and aggregations to meet specific search requirements. For more information, see [Hybrid search]({{site.url}}{{site.baseurl}}/vector-search/ai-search/hybrid-search/). diff --git a/_search-plugins/knn/knn-score-script.md b/_vector-search/vector-search-techniques/knn-score-script.md similarity index 58% rename from _search-plugins/knn/knn-score-script.md rename to _vector-search/vector-search-techniques/knn-score-script.md index 45e7d1a67c..da5b159baa 100644 --- a/_search-plugins/knn/knn-score-script.md +++ b/_vector-search/vector-search-techniques/knn-score-script.md @@ -1,23 +1,25 @@ --- layout: default -title: Exact k-NN with scoring script -nav_order: 10 -parent: k-NN search -has_children: false +title: Exact k-NN search with a scoring script +nav_order: 20 +parent: Vector search techniques +has_children: true has_math: true +redirect_from: + - /search-plugins/knn/knn-score-script/ --- -# Exact k-NN with scoring script +# Exact k-NN search with a scoring script -The k-NN plugin implements the OpenSearch score script plugin that you can use to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, you can apply a filter on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions. +You can use exact k-nearest neighbors (k-NN) search with a scoring script to find the exact k-nearest neighbors to a given query point. Using the k-NN scoring script, you can apply a filter on an index before executing the nearest neighbor search. This is useful for dynamic search use cases, where the index body may vary based on other conditions. 
-Because the score script approach executes a brute force search, it doesn't scale as well as the [approximate approach]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn). In some cases, it might be better to think about refactoring your workflow or index structure to use the approximate approach instead of the score script approach. +Because the scoring script approach executes a brute force search, it doesn't scale as efficiently as the [approximate approach]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/). In some cases, it might be better to consider refactoring your workflow or index structure to use the approximate approach instead of the scoring script approach. -## Getting started with the score script for vectors +## Getting started with the scoring script for vectors -Similar to approximate nearest neighbor search, in order to use the score script on a body of vectors, you must first create an index with one or more `knn_vector` fields. +Similarly to approximate nearest neighbor (ANN) search, in order to use the scoring script on a body of vectors, you must first create an index with one or more `knn_vector` fields. -If you intend to just use the score script approach (and not the approximate approach) you can set `index.knn` to `false` and not set `index.knn.space_type`. You can choose the space type during search. See [spaces](#spaces) for the spaces the k-NN score script suppports. +If you intend to only use the scoring script approach (and not the approximate approach), you can set `index.knn` to `false` and not set `index.knn.space_type`. You can choose the space type during search. For the spaces that the k-NN scoring script supports, see [Spaces]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-spaces/). This example creates an index with two `knn_vector` fields: @@ -40,7 +42,7 @@ PUT my-knn-index-1 ``` {% include copy-curl.html %} -If you *only* want to use the score script, you can omit `"index.knn": true`. The benefit of this approach is faster indexing speed and lower memory usage, but you lose the ability to perform standard k-NN queries on the index. +If you want to *only* use the scoring script, you can omit `"index.knn": true`. This approach leads to faster indexing speed and lower memory usage, but you lose the ability to run standard k-NN queries on the index. {: .tip} After you create the index, you can add some data to it: @@ -68,7 +70,8 @@ POST _bulk ``` {% include copy-curl.html %} -Finally, you can execute an exact nearest neighbor search on the data using the `knn` script: +Finally, you can run an exact nearest neighbor search on the data using the `knn` script: + ```json GET my-knn-index-1/_search { @@ -102,11 +105,11 @@ All parameters are required. - `field` is the field that contains your vector data. - `query_value` is the point you want to find the nearest neighbors for. For the Euclidean and cosine similarity spaces, the value must be an array of floats that matches the dimension set in the field's mapping. For Hamming bit distance, this value can be either of type signed long or a base64-encoded string (for the long and binary field types, respectively). -- `space_type` corresponds to the distance function. See the [spaces section](#spaces). +- `space_type` corresponds to the distance function. For more information, see [Spaces]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-spaces/). 
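Because the space type is chosen at query time rather than fixed in the index mapping, you can rerun the same exact search with a different distance function. The following request is an illustrative variation that is not part of the original page; it assumes an index containing a four-dimensional `knn_vector` field named `my_vector2`, so substitute your own field and query vector:

```json
GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "my_vector2",
          "query_value": [2.0, 3.0, 5.0, 6.0],
          "space_type": "cosinesimil"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}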
-The [post filter example in the approximate approach]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn#using-approximate-k-nn-with-filters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents over which to execute the k-nearest neighbor search. +The [post filter example in the approximate approach]({{site.url}}{{site.baseurl}}/vector-search/filter-search-knn/) shows a search that returns fewer than `k` results. If you want to avoid this, the scoring script method lets you essentially invert the order of events. In other words, you can filter the set of documents on which to execute the k-NN search. -This example shows a pre-filter approach to k-NN search with the score script approach. First, create the index: +This example shows a pre-filter approach to k-NN search with the scoring script approach. First, create the index: ```json PUT my-knn-index-2 @@ -177,8 +180,9 @@ GET my-knn-index-2/_search ``` {% include copy-curl.html %} -## Getting started with the score script for binary data -The k-NN score script also allows you to run k-NN search on your binary data with the Hamming distance space. +## Getting started with the scoring script for binary data + +The k-NN scoring script also allows you to run k-NN search on your binary data with the Hamming distance space. In order to use Hamming distance, the field of interest must have either a `binary` or `long` field type. If you're using `binary` type, the data must be a base64-encoded string. This example shows how to use the Hamming distance space with a `binary` field type: @@ -284,23 +288,3 @@ GET my-long-index/_search ``` {% include copy-curl.html %} -## Spaces - -A _space_ corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a higher score equates to a better result. The following table illustrates how OpenSearch converts spaces to scores. - -| Space type | Distance function ($$d$$ ) | OpenSearch score | -| :--- | :--- | :--- | -| `l1` | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n \lvert x_i - y_i \rvert $$ | $$ score = {1 \over {1 + d} } $$ | -| `l2` | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2 $$ | $$ score = {1 \over 1 + d } $$ | -| `linf` | $$ d(\mathbf{x}, \mathbf{y}) = max(\lvert x_i - y_i \rvert) $$ | $$ score = {1 \over 1 + d } $$ | -| `cosinesimil` | $$ d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} \cdot \mathbf{y} \over \lVert \mathbf{x}\rVert \cdot \lVert \mathbf{y}\rVert}$$$$ = 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}}$$,
where $$\lVert \mathbf{x}\rVert$$ and $$\lVert \mathbf{y}\rVert$$ represent the norms of vectors $$\mathbf{x}$$ and $$\mathbf{y}$$, respectively. | $$ score = {2 - d \over 2 } $$ | -| `innerproduct` (supported for Lucene in OpenSearch version 2.13 and later) | $$ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i $$ | $$ \text{If} d \ge 0, score = {1 \over 1 + d }$$
$$\text{If} d < 0, score = −d + 1$$ | -| `hammingbit` (supported for binary and long vectors)

`hamming` (supported for binary vectors in OpenSearch version 2.16 and later) | $$ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})$$ | $$ score = {1 \over 1 + d } $$ | - -Cosine similarity returns a number between -1 and 1, and because OpenSearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score. - -With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ... ]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown. -{: .note } - -The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-vectors). -{: .note} diff --git a/_vector-search/vector-search-techniques/painless-functions.md b/_vector-search/vector-search-techniques/painless-functions.md new file mode 100644 index 0000000000..4f106e378a --- /dev/null +++ b/_vector-search/vector-search-techniques/painless-functions.md @@ -0,0 +1,80 @@ +--- +layout: default +title: Painless extensions +nav_order: 25 +parent: Exact k-NN search with a scoring script +grand_parent: Vector search techniques +has_children: false +has_math: true +redirect_from: + - /search-plugins/knn/painless-functions/ +--- + +# Painless scripting extensions + +With Painless scripting extensions, you can use k-nearest neighbors (k-NN) distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless has a strict list of allowed functions and classes per context to ensure that its scripts are secure. OpenSearch adds Painless scripting extensions to a few of the distance functions used in [k-NN scoring script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/), so you can use them to customize your k-NN workload. + +## Get started with k-NN Painless scripting functions + +To use k-NN Painless scripting functions, first create an index with `knn_vector` fields, as described in [Getting started with the scoring script for vectors]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script#getting-started-with-the-scoring-script-for-vectors). Once you have created the index and ingested some data, you can use Painless extensions: + +```json +GET my-knn-index-2/_search +{ + "size": 2, + "query": { + "script_score": { + "query": { + "bool": { + "filter": { + "term": { + "color": "BLUE" + } + } + } + }, + "script": { + "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])", + "params": { + "field": "my_vector", + "query_value": [9.9, 9.9] + } + } + } + } +} +``` +{% include copy-curl.html %} + +`field` needs to map to a `knn_vector` field, and `query_value` must be a floating-point array with the same dimension as `field`. + +## Function types + +The following table describes the Painless functions OpenSearch provides. + +Function name | Function signature | Description +:--- | :--- +`l2Squared` | `float l2Squared (float[] queryVector, doc['vector field'])` | This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. A shorter distance indicates a more relevant document, so this example inverts the return value of the `l2Squared` function. 
If the document vector matches the query vector, the result is `0`, so this example also adds `1` to the distance to avoid divide-by-zero errors. +`l1Norm` | `float l1Norm (float[] queryVector, doc['vector field'])` | This function calculates the L1 norm distance (Manhattan distance) between a given query vector and document vectors. +`cosineSimilarity` | `float cosineSimilarity (float[] queryVector, doc['vector field'])` | Cosine similarity is an inner product of the query vector and document vector normalized to both have a length of `1`. If the magnitude of the query vector doesn't change throughout the query, you can pass the magnitude of the query vector to improve performance instead of repeatedly calculating the magnitude for every filtered document:
`float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)`
In general, the range of cosine similarity is [-1, 1]. However, in the case of information retrieval, the cosine similarity of two documents ranges from `0` to `1` because the `tf-idf` statistic can't be negative. Therefore, OpenSearch adds `1.0` in order to always yield a positive cosine similarity score. +`hamming` | `float hamming (float[] queryVector, doc['vector field'])` | This function calculates the Hamming distance between a given query vector and document vectors. The Hamming distance is the number of positions at which the corresponding elements are different. A shorter distance indicates a more relevant document, so this example inverts the return value of the Hamming distance. + +The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized#binary-vectors). +{: .note} + +## Constraints + +1. If a document's `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`. + +2. If a vector field doesn't have a value, the function throws an `IllegalStateException`. + + You can avoid this by first checking whether a document contains a value in its field: + + ``` + "source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))", + ``` + + Because scores can only be positive, this script ranks documents with vector fields higher than those without vector fields. + +When using cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown. +{: .note } diff --git a/images/vector-search/auto-vector-ingest.png b/images/vector-search/auto-vector-ingest.png new file mode 100644 index 0000000000..07550a3a95 Binary files /dev/null and b/images/vector-search/auto-vector-ingest.png differ diff --git a/images/vector-search/auto-vector-search.png b/images/vector-search/auto-vector-search.png new file mode 100644 index 0000000000..e80f37b308 Binary files /dev/null and b/images/vector-search/auto-vector-search.png differ diff --git a/images/vector-search/embeddings.png b/images/vector-search/embeddings.png new file mode 100644 index 0000000000..d627de1d0c Binary files /dev/null and b/images/vector-search/embeddings.png differ diff --git a/images/vector-search/raw-vector-ingest.png b/images/vector-search/raw-vector-ingest.png new file mode 100644 index 0000000000..a1c0951bcc Binary files /dev/null and b/images/vector-search/raw-vector-ingest.png differ diff --git a/images/vector-search/raw-vector-search.png b/images/vector-search/raw-vector-search.png new file mode 100644 index 0000000000..873eb2f012 Binary files /dev/null and b/images/vector-search/raw-vector-search.png differ diff --git a/images/vector-search/vector-similarity.jpg b/images/vector-search/vector-similarity.jpg new file mode 100644 index 0000000000..5dcd8a8e5b Binary files /dev/null and b/images/vector-search/vector-similarity.jpg differ diff --git a/index.md b/index.md index 6fac0021db..ed4d943d9f 100755 --- a/index.md +++ b/index.md @@ -9,4 +9,4 @@ permalink: / {% include banner.html %} -{% include cards.html %} \ No newline at end of file +{% include home_cards.html %} \ No newline at end of file