Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RW Separation] Restrict Search Replica Allocation to Search-Dedicated Nodes #17422

Open
vinaykpud opened this issue Feb 21, 2025 · 20 comments · May be fixed by #17457
Open

[RW Separation] Restrict Search Replica Allocation to Search-Dedicated Nodes #17422

vinaykpud opened this issue Feb 21, 2025 · 20 comments · May be fixed by #17457
Labels
bug Something isn't working Search:Performance

Comments

@vinaykpud
Copy link
Contributor

Describe the bug

When an index is created or when search replicas are added for an index, search replicas should be allocated only to nodes that have the search role assigned. If there are no such nodes available, the search replicas should remain in an unassigned state. If a node with the search role becomes available later, the search replicas should be assigned accordingly.

Related component

Search:Performance

To Reproduce

  1. Create a cluster with 3 nodes.
  2. Create an index with the following configuration:
    • 1 Primary shard (1P)
    • 1 Replica shard (1R)
    • 1 Search Replica shard (1SR)
  3. Observe how the shards are allocated across the cluster.
  • Currently, search replicas may get assigned to nodes without the search role, leading to incorrect shard placement.

Expected behavior

  • The primary and replica shards should be assigned to available nodes.
  • The search replica should remain in an unassigned state if no node with the search role exists.
  • If a node with the search role is added to the cluster, the search replica should be assigned to it.

Assigning a search node role for a node:
Ref: #15445

curl -X PUT "http://localhost:9200/_cluster/settings" \
-H "Content-Type: application/json" \
-d '{
  "persistent": {
    "cluster.routing.allocation.search.replica.dedicated.include._name": "node-1"
  }
}'

Additional Details

No response

@mch2
Copy link
Member

mch2 commented Feb 26, 2025

@vinaykpud Search replicas rely on the allocation filter for hardware designation, not the "Search node role", are you seeing the bug when applying the filter?

@gbbafna
Copy link
Collaborator

gbbafna commented Mar 3, 2025

@mch2 , @vinaykpud : Adding allocation filter explicitly adds too much overhead . Is there a way to simplify the experience ? Also how would allocation/balancing work out for search replicas ?

@mch2
Copy link
Member

mch2 commented Mar 3, 2025

Adding allocation filter explicitly adds too much overhead

@gbbafna Can you elaborate here? Are you referring to potential overhead on match with DiscoveryNodeFilter? If thats significant perhaps we limit to a concrete node attr and look up on the DiscoveryNode like isRemoteStoreNode and yank the filter setting.

The idea was to use SearchReplicaAllocationDecider a decider we introduced a while back that operates similarly to FilterAllocationDecider. This would work with any node attribute and not be reliant on a node role, but I'm ok with narrowing this a bit to a specific node attr...

In terms of general allocation/balancing for SRs, we would apply existing deciders, zonal awareness & replica auto expansion etc only within the nodes marked with the attribute. Non SRs would apply the same within their set of nodes.

@vinaykpud
Copy link
Contributor Author

vinaykpud commented Mar 3, 2025

@vinaykpud Search replicas rely on the allocation filter for hardware designation, not the "Search node role", are you seeing the bug when applying the filter?

This is to prevent search replica allocation in the absence of the include filter.

@gbbafna
Copy link
Collaborator

gbbafna commented Mar 4, 2025

@gbbafna Can you elaborate here? Are you referring to potential overhead on match with DiscoveryNodeFilter? If thats significant perhaps we limit to a concrete node attr and look up on the DiscoveryNode like isRemoteStoreNode and yank the filter setting.

I was referring to overhead on cluster operator to apply this settings . They will need to add either an attribute to all such nodes by defining this in their yml file and use this attribute in their cluster settings. The other option would be to specify nodes , which will be inconvenient .

@vinaykpud
Copy link
Contributor Author

vinaykpud commented Mar 7, 2025

@mch2 @Bukhtawar @gbbafna

I’ve responded to the concern here: #17457 (comment)

However, it seems we need to revisit our approach of using the inclusion filter to define a fleet of search nodes based on custom node attributes. While this approach allows users to select search nodes dynamically using any attribute, the key question is: How much practical value does this flexibility provide?

Concerns:

Concern 1: Complexity in Setting Up Searcher Fleet

The current approach requires a two-step process for users to configure a searcher fleet:

  1. Define a custom attribute for the nodes in the YML file.
  2. Set cluster.routing.allocation.search.replica.dedicated.include in the YML file or update it via the Cluster Settings API.

This additional complexity may not provide significant benefits compared to a simpler, more structured approach.

Concern 2: Impact on Awareness Allocation & Auto-Expand

Using a dynamic inclusion filter makes it difficult for Awareness Allocation and Auto-Expand implementations to identify and list search nodes. Since search nodes are determined based on filters, the only way to list them is by querying searchReplicaIncludeFilters in the SearchReplicaAllocationDecider

Relying on this indirect mechanism to determine search-dedicated nodes doesn’t seem ideal.

Proposed Solution: Predefined Node Attribute

A better approach would be to introduce a predefined custom attribute, such as:

node.attr.searchonly: "true"

Users can set this attribute on all nodes they want to designate as search-dedicated. Then, SearchReplicaAllocationDecider can use this attribute to:

  • Determine whether a node is eligible for search replica allocation.
  • Restrict non-search replicas from being assigned to these nodes.

Additionally, we can introduce a method in DiscoveryNode like:

boolean isSearchDedicatedNode()

This would provide a direct and reliable way to determine if a node is search-dedicated.

However, we should be cautious about potential confusion since we already have boolean isSearchNode(), which is based on node roles.

Also while doing this we dont want to add confusion since we have a similar method which is based on the node role.

Does this predefined node attribute approach look good? If you have any suggestions or alternative ideas, please let me know.

@gbbafna
Copy link
Collaborator

gbbafna commented Mar 7, 2025

@vinaykpud : Thanks for putting up this detailed proposal. The approach looks good and simplifies the experience for our users.

@Bukhtawar
Copy link
Collaborator

Any thoughts on introducing node roles?

@vinaykpud
Copy link
Contributor Author

We already have a Search Node role which was added for Searchable Snapshots, adding another role for this may add some confusion. What advantage we get if its Node Role vs attribute?

@mch2
Copy link
Member

mch2 commented Mar 7, 2025

Role or or attr would both work, we also have precedent of using role here with 'search' and the TargetPoolAllocationDecider.

Though a concern I have with role as Vinay mentioned is introducing a new role along side the existing "search" which is conceptually confusing to me. I think we're better off repurposing "search" to either hold warm or hot or both shard types.

  • Edit - @andrross corrected me here - we can mix search/hot already today with ['data','search'] as the roles - We would need to ensure we aren't reserving heap for file cache and set the cache size to 0 in settings.

This is what led me to the filter approach initially, you could use any data node attr or a new one and set it once, but I understand the drawbacks of the additional config.

@mch2
Copy link
Member

mch2 commented Mar 7, 2025

If a node only includes the existing search role we pre-allocate cache size to 80%. If the node also includes another role 'data' it sets the cache size to 0, but then blows up on startup requiring a nonzero value on the setting:

    private void initializeFileCache(Settings settings, CircuitBreaker circuitBreaker) throws IOException {
        if (DiscoveryNode.isSearchNode(settings) == false) {
            return;
        }

        String capacityRaw = NODE_SEARCH_CACHE_SIZE_SETTING.get(settings);
        logger.info("cache size [{}]", capacityRaw);
        if (capacityRaw.equals(ZERO)) {
            throw new SettingsException(
                "Unable to initialize the "
                    + DiscoveryNodeRole.SEARCH_ROLE.roleName()
                    + "-"
                    + DiscoveryNodeRole.DATA_ROLE.roleName()
                    + " node: Missing value for configuration "
                    + NODE_SEARCH_CACHE_SIZE_SETTING.getKey()
            );
        }

So to reuse the role we would need to allow an initial file cache size of 0 and for separation we would need to explicitly set the size to 0, change the default to 0 and require it set for searchable snapshots, or require both roles (ew). The cache size is not a dynamic setting either :/

I am thinking changing the default to 0 and requiring a nonzero value in order to hold ss shards makes the most sense here but that is a breaking change - @bugmakerrrrrr @andrross curious what you two think here as I've seen you in that code recently.

@mch2
Copy link
Member

mch2 commented Mar 7, 2025

I think we need to go with a new attr here for a separate reason - we would need exclusivity among roles. We are not supporting search only next to writeable shards on a single node. This brings issues around zonal awareness and auto expansion etc so we'd like to support only the extreme separation case at this point.

@andrross
Copy link
Member

andrross commented Mar 7, 2025

I think we need to go with a new attr here for a separate reason - we would need exclusivity among roles. We are not supporting search only next to writeable shards on a single node.

@mch2 I know 3.0 is coming up pretty quickly, but would it make any sense to make a breaking change and flip things around here? As in, change searchable snapshots/file cache behavior to work with a node attribute instead of the search role, then use the search role for this use case (which at least on the surface seems like it would make more sense).

@gbbafna
Copy link
Collaborator

gbbafna commented Mar 10, 2025

@andrross , @mch2 : Just adding more context , we are considering adding a warm role for warm indices, which will be kind of exclusive role to begin with (not co exist with data ) . Separate role will give us clear code for allocation and rebalancing on lines of RemoteShardsBalancer .

@mch2
Copy link
Member

mch2 commented Mar 10, 2025

@andrross Yeah I'm ok with making that swap. @gbbafna Could todays searchable snapshots simply be renamed to warm and hold both writeable warm / warm and ss based on add'l index level config?

@vinaykpud
Copy link
Contributor Author

vinaykpud commented Mar 10, 2025

@mch2 @gbbafna @andrross @Bukhtawar Based on the discussions so far, I am going to rename existing "Search role" to "Warm role". Then introduce/repurpose "Search role" for nodes to host Search Replicas.

@vinaykpud
Copy link
Contributor Author

vinaykpud commented Mar 11, 2025

@mch2 Here is my commit for renaming: https://github.com/opensearch-project/OpenSearch/compare/main...vinaykpud:OpenSearch:rw/rename-role?expand=1, I will create a PR based on this.

Also in this PR: #17457, I will introduce the Search Role for a node.

@mch2
Copy link
Member

mch2 commented Mar 11, 2025

@vinaykpud can you pls flag this as well for documentation - we need to make this very clear & discoverable so ppl upgrading with existing 'search' nodes know to update the role.

@vinaykpud
Copy link
Contributor Author

Added issue for documentation: opensearch-project/documentation-website#9392

@gbbafna
Copy link
Collaborator

gbbafna commented Mar 12, 2025

@vinaykpud : Can we verify if we upgrade the cluster from 2.x to 3.0 with this change, searchable snapshots continue to work ? There might be few nuances here like changing the cluster-manager first .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Search:Performance
Projects
Status: 🆕 New
Development

Successfully merging a pull request may close this issue.

5 participants