[RW Separation] Restrict Search Replica Allocation to Search-Dedicated Nodes #17422

vinaykpud · 2025-02-21T20:32:47Z

Describe the bug

When an index is created or when search replicas are added for an index, search replicas should be allocated only to nodes that have the search role assigned. If there are no such nodes available, the search replicas should remain in an unassigned state. If a node with the search role becomes available later, the search replicas should be assigned accordingly.

Related component

Search:Performance

To Reproduce

Create a cluster with 3 nodes.
Create an index with the following configuration:
- 1 Primary shard (1P)
- 1 Replica shard (1R)
- 1 Search Replica shard (1SR)
Observe how the shards are allocated across the cluster.

Currently, search replicas may get assigned to nodes without the search role, leading to incorrect shard placement.

Expected behavior

The primary and replica shards should be assigned to available nodes.
The search replica should remain in an unassigned state if no node with the search role exists.
If a node with the search role is added to the cluster, the search replica should be assigned to it.

Assigning a search node role for a node:
Ref: #15445

curl -X PUT "http://localhost:9200/_cluster/settings" \
-H "Content-Type: application/json" \
-d '{
  "persistent": {
    "cluster.routing.allocation.search.replica.dedicated.include._name": "node-1"
  }
}'

Additional Details

No response

The text was updated successfully, but these errors were encountered:

mch2 · 2025-02-26T17:27:08Z

@vinaykpud Search replicas rely on the allocation filter for hardware designation, not the "Search node role", are you seeing the bug when applying the filter?

gbbafna · 2025-03-03T07:32:09Z

@mch2 , @vinaykpud : Adding allocation filter explicitly adds too much overhead . Is there a way to simplify the experience ? Also how would allocation/balancing work out for search replicas ?

mch2 · 2025-03-03T21:16:25Z

Adding allocation filter explicitly adds too much overhead

@gbbafna Can you elaborate here? Are you referring to potential overhead on match with DiscoveryNodeFilter? If thats significant perhaps we limit to a concrete node attr and look up on the DiscoveryNode like isRemoteStoreNode and yank the filter setting.

The idea was to use SearchReplicaAllocationDecider a decider we introduced a while back that operates similarly to FilterAllocationDecider. This would work with any node attribute and not be reliant on a node role, but I'm ok with narrowing this a bit to a specific node attr...

In terms of general allocation/balancing for SRs, we would apply existing deciders, zonal awareness & replica auto expansion etc only within the nodes marked with the attribute. Non SRs would apply the same within their set of nodes.

vinaykpud · 2025-03-03T22:36:35Z

@vinaykpud Search replicas rely on the allocation filter for hardware designation, not the "Search node role", are you seeing the bug when applying the filter?

This is to prevent search replica allocation in the absence of the include filter.

gbbafna · 2025-03-04T04:57:51Z

@gbbafna Can you elaborate here? Are you referring to potential overhead on match with DiscoveryNodeFilter? If thats significant perhaps we limit to a concrete node attr and look up on the DiscoveryNode like isRemoteStoreNode and yank the filter setting.

I was referring to overhead on cluster operator to apply this settings . They will need to add either an attribute to all such nodes by defining this in their yml file and use this attribute in their cluster settings. The other option would be to specify nodes , which will be inconvenient .

vinaykpud · 2025-03-07T01:31:23Z

@mch2 @Bukhtawar @gbbafna

I’ve responded to the concern here: #17457 (comment)

However, it seems we need to revisit our approach of using the inclusion filter to define a fleet of search nodes based on custom node attributes. While this approach allows users to select search nodes dynamically using any attribute, the key question is: How much practical value does this flexibility provide?

Concerns:

Concern 1: Complexity in Setting Up Searcher Fleet

The current approach requires a two-step process for users to configure a searcher fleet:

Define a custom attribute for the nodes in the YML file.
Set cluster.routing.allocation.search.replica.dedicated.include in the YML file or update it via the Cluster Settings API.

This additional complexity may not provide significant benefits compared to a simpler, more structured approach.

Concern 2: Impact on Awareness Allocation & Auto-Expand

Using a dynamic inclusion filter makes it difficult for Awareness Allocation and Auto-Expand implementations to identify and list search nodes. Since search nodes are determined based on filters, the only way to list them is by querying searchReplicaIncludeFilters in the SearchReplicaAllocationDecider

Relying on this indirect mechanism to determine search-dedicated nodes doesn’t seem ideal.

Proposed Solution: Predefined Node Attribute

A better approach would be to introduce a predefined custom attribute, such as:

node.attr.searchonly: "true"

Users can set this attribute on all nodes they want to designate as search-dedicated. Then, SearchReplicaAllocationDecider can use this attribute to:

Determine whether a node is eligible for search replica allocation.
Restrict non-search replicas from being assigned to these nodes.

Additionally, we can introduce a method in DiscoveryNode like:

boolean isSearchDedicatedNode()

This would provide a direct and reliable way to determine if a node is search-dedicated.

However, we should be cautious about potential confusion since we already have boolean isSearchNode(), which is based on node roles.

Also while doing this we dont want to add confusion since we have a similar method which is based on the node role.

Does this predefined node attribute approach look good? If you have any suggestions or alternative ideas, please let me know.

gbbafna · 2025-03-07T12:01:42Z

@vinaykpud : Thanks for putting up this detailed proposal. The approach looks good and simplifies the experience for our users.

Bukhtawar · 2025-03-07T17:50:01Z

Any thoughts on introducing node roles?

vinaykpud · 2025-03-07T18:00:30Z

We already have a Search Node role which was added for Searchable Snapshots, adding another role for this may add some confusion. What advantage we get if its Node Role vs attribute?

mch2 · 2025-03-07T18:33:33Z

Role or or attr would both work, we also have precedent of using role here with 'search' and the TargetPoolAllocationDecider.

Though a concern I have with role as Vinay mentioned is introducing a new role along side the existing "search" which is conceptually confusing to me. I think we're better off repurposing "search" to either hold warm or hot or both shard types.

Edit - @andrross corrected me here - we can mix search/hot already today with ['data','search'] as the roles - We would need to ensure we aren't reserving heap for file cache and set the cache size to 0 in settings.

This is what led me to the filter approach initially, you could use any data node attr or a new one and set it once, but I understand the drawbacks of the additional config.

mch2 · 2025-03-07T19:34:22Z

If a node only includes the existing search role we pre-allocate cache size to 80%. If the node also includes another role 'data' it sets the cache size to 0, but then blows up on startup requiring a nonzero value on the setting:

    private void initializeFileCache(Settings settings, CircuitBreaker circuitBreaker) throws IOException {
        if (DiscoveryNode.isSearchNode(settings) == false) {
            return;
        }

        String capacityRaw = NODE_SEARCH_CACHE_SIZE_SETTING.get(settings);
        logger.info("cache size [{}]", capacityRaw);
        if (capacityRaw.equals(ZERO)) {
            throw new SettingsException(
                "Unable to initialize the "
                    + DiscoveryNodeRole.SEARCH_ROLE.roleName()
                    + "-"
                    + DiscoveryNodeRole.DATA_ROLE.roleName()
                    + " node: Missing value for configuration "
                    + NODE_SEARCH_CACHE_SIZE_SETTING.getKey()
            );
        }

So to reuse the role we would need to allow an initial file cache size of 0 and for separation we would need to explicitly set the size to 0, change the default to 0 and require it set for searchable snapshots, or require both roles (ew). The cache size is not a dynamic setting either :/

I am thinking changing the default to 0 and requiring a nonzero value in order to hold ss shards makes the most sense here but that is a breaking change - @bugmakerrrrrr @andrross curious what you two think here as I've seen you in that code recently.

mch2 · 2025-03-07T20:22:41Z

I think we need to go with a new attr here for a separate reason - we would need exclusivity among roles. We are not supporting search only next to writeable shards on a single node. This brings issues around zonal awareness and auto expansion etc so we'd like to support only the extreme separation case at this point.

andrross · 2025-03-07T23:28:18Z

I think we need to go with a new attr here for a separate reason - we would need exclusivity among roles. We are not supporting search only next to writeable shards on a single node.

@mch2 I know 3.0 is coming up pretty quickly, but would it make any sense to make a breaking change and flip things around here? As in, change searchable snapshots/file cache behavior to work with a node attribute instead of the search role, then use the search role for this use case (which at least on the surface seems like it would make more sense).

gbbafna · 2025-03-10T05:46:01Z

@andrross , @mch2 : Just adding more context , we are considering adding a warm role for warm indices, which will be kind of exclusive role to begin with (not co exist with data ) . Separate role will give us clear code for allocation and rebalancing on lines of RemoteShardsBalancer .

mch2 · 2025-03-10T17:21:39Z

@andrross Yeah I'm ok with making that swap. @gbbafna Could todays searchable snapshots simply be renamed to warm and hold both writeable warm / warm and ss based on add'l index level config?

vinaykpud · 2025-03-10T22:03:23Z

@mch2 @gbbafna @andrross @Bukhtawar Based on the discussions so far, I am going to rename existing "Search role" to "Warm role". Then introduce/repurpose "Search role" for nodes to host Search Replicas.

vinaykpud · 2025-03-11T00:37:53Z

@mch2 Here is my commit for renaming: https://github.com/opensearch-project/OpenSearch/compare/main...vinaykpud:OpenSearch:rw/rename-role?expand=1, I will create a PR based on this.

Also in this PR: #17457, I will introduce the Search Role for a node.

mch2 · 2025-03-11T18:40:50Z

@vinaykpud can you pls flag this as well for documentation - we need to make this very clear & discoverable so ppl upgrading with existing 'search' nodes know to update the role.

vinaykpud · 2025-03-11T20:02:04Z

Added issue for documentation: opensearch-project/documentation-website#9392

gbbafna · 2025-03-12T06:33:36Z

@vinaykpud : Can we verify if we upgrade the cluster from 2.x to 3.0 with this change, searchable snapshots continue to work ? There might be few nuances here like changing the cluster-manager first .

vinaykpud added bug Something isn't working untriaged labels Feb 21, 2025

github-actions bot added the Search:Performance label Feb 21, 2025

github-project-automation bot added this to Search Project Board Feb 21, 2025

github-project-automation bot moved this to 🆕 New in Search Project Board Feb 21, 2025

mch2 removed the untriaged label Feb 26, 2025

vinaykpud mentioned this issue Mar 11, 2025

Renaming the node role search to warm #17573

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RW Separation] Restrict Search Replica Allocation to Search-Dedicated Nodes #17422

[RW Separation] Restrict Search Replica Allocation to Search-Dedicated Nodes #17422

vinaykpud commented Feb 21, 2025

mch2 commented Feb 26, 2025

gbbafna commented Mar 3, 2025

mch2 commented Mar 3, 2025 •

edited

Loading

vinaykpud commented Mar 3, 2025 •

edited

Loading

gbbafna commented Mar 4, 2025

vinaykpud commented Mar 7, 2025 •

edited

Loading

gbbafna commented Mar 7, 2025

Bukhtawar commented Mar 7, 2025

vinaykpud commented Mar 7, 2025

mch2 commented Mar 7, 2025 •

edited

Loading

mch2 commented Mar 7, 2025 •

edited

Loading

mch2 commented Mar 7, 2025

andrross commented Mar 7, 2025

gbbafna commented Mar 10, 2025

mch2 commented Mar 10, 2025 •

edited

Loading

vinaykpud commented Mar 10, 2025 •

edited

Loading

vinaykpud commented Mar 11, 2025 •

edited

Loading

mch2 commented Mar 11, 2025

vinaykpud commented Mar 11, 2025

gbbafna commented Mar 12, 2025

[RW Separation] Restrict Search Replica Allocation to Search-Dedicated Nodes #17422

[RW Separation] Restrict Search Replica Allocation to Search-Dedicated Nodes #17422

Comments

vinaykpud commented Feb 21, 2025

Describe the bug

Related component

To Reproduce

Expected behavior

Additional Details

mch2 commented Feb 26, 2025

gbbafna commented Mar 3, 2025

mch2 commented Mar 3, 2025 • edited Loading

vinaykpud commented Mar 3, 2025 • edited Loading

gbbafna commented Mar 4, 2025

vinaykpud commented Mar 7, 2025 • edited Loading

Concerns:

Concern 1: Complexity in Setting Up Searcher Fleet

Concern 2: Impact on Awareness Allocation & Auto-Expand

Proposed Solution: Predefined Node Attribute

gbbafna commented Mar 7, 2025

Bukhtawar commented Mar 7, 2025

vinaykpud commented Mar 7, 2025

mch2 commented Mar 7, 2025 • edited Loading

mch2 commented Mar 7, 2025 • edited Loading

mch2 commented Mar 7, 2025

andrross commented Mar 7, 2025

gbbafna commented Mar 10, 2025

mch2 commented Mar 10, 2025 • edited Loading

vinaykpud commented Mar 10, 2025 • edited Loading

vinaykpud commented Mar 11, 2025 • edited Loading

mch2 commented Mar 11, 2025

vinaykpud commented Mar 11, 2025

gbbafna commented Mar 12, 2025

mch2 commented Mar 3, 2025 •

edited

Loading

vinaykpud commented Mar 3, 2025 •

edited

Loading

vinaykpud commented Mar 7, 2025 •

edited

Loading

mch2 commented Mar 7, 2025 •

edited

Loading

mch2 commented Mar 7, 2025 •

edited

Loading

mch2 commented Mar 10, 2025 •

edited

Loading

vinaykpud commented Mar 10, 2025 •

edited

Loading

vinaykpud commented Mar 11, 2025 •

edited

Loading