[BUG] Shard may not be allocated because of SameShardAllocationDecider and total_shards_per_node #12957

Closed
kkewwei opened this issue Mar 28, 2024 · 8 comments

Comments

@kkewwei
Contributor

kkewwei commented Mar 28, 2024

Describe the bug

A shard will not be allocated because of the combination of SameShardAllocationDecider and total_shards_per_node. The result of _cluster/allocation/explain is as follows:

{
  "index": "index1",
  "shard": 3,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2024-03-28T00:43:38.887Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "Ju7Xg9BkTGObxOqh_7rkKw",
      "node_name": "data1",
      "transport_address": "ip1:9300",
      "node_attributes": {
        "data_node": "hot",
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node [[index1][3], node[Ju7Xg9BkTGObxOqh_7rkKw], [P], s[STARTED], a[id=3nMgsBvlQRqxz6xwJdEavQ]]"
        }
      ]
    },
    {
      "node_id": "A52CflzwSwSB8GqRQQyeGQ",
      "node_name": "data2",
      "transport_address": "ip2:9300",
      "node_attributes": {
        "data_node": "hot",
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "weight_ranking": 2,
      "deciders": [
        {
          "decider": "shards_limit",
          "decision": "NO",
          "explanation": "too many shards [2] allocated to this node for index [index1], index setting [index.routing.allocation.total_shards_per_node=2]"
        }
      ]
    },
    {
      "node_id": "hGviQg2nRhWJK-XpKulk-g",
      "node_name": "data3",
      "transport_address": "ip3:9300",
      "node_attributes": {
        "data_node": "hot",
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "weight_ranking": 3,
      "deciders": [
        {
          "decider": "shards_limit",
          "decision": "NO",
          "explanation": "too many shards [2] allocated to this node for index [index1], index setting [index.routing.allocation.total_shards_per_node=2]"
        }
      ]
    },
    {
      "node_id": "zaQfLmTNSmuEXlwRXguifw",
      "node_name": "data3",
      "transport_address": "ip3:9300",
      "node_attributes": {
        "data_node": "hot",
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "weight_ranking": 4,
      "deciders": [
        {
          "decider": "shards_limit",
          "decision": "NO",
          "explanation": "too many shards [2] allocated to this node for index [index1], index setting [index.routing.allocation.total_shards_per_node=2]"
        }
      ]
    }
  ]
}
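With four nodes and one decider per node, it helps to group the explain response by decider to see which rule blocks which node. A minimal Python sketch follows. It only reads the node_allocation_decisions[].deciders[] fields visible in the output above, and the embedded JSON is an abbreviated copy of that output (the node attributes and explanations are dropped). The helper name blocking_deciders is hypothetical.

```python
import json

# Abbreviated copy of the explain response above (only the fields this sketch reads).
EXPLAIN_RESPONSE = json.loads("""
{
  "index": "index1",
  "shard": 3,
  "primary": false,
  "can_allocate": "no",
  "node_allocation_decisions": [
    {"node_id": "Ju7Xg9BkTGObxOqh_7rkKw",
     "deciders": [{"decider": "same_shard", "decision": "NO"}]},
    {"node_id": "A52CflzwSwSB8GqRQQyeGQ",
     "deciders": [{"decider": "shards_limit", "decision": "NO"}]},
    {"node_id": "hGviQg2nRhWJK-XpKulk-g",
     "deciders": [{"decider": "shards_limit", "decision": "NO"}]},
    {"node_id": "zaQfLmTNSmuEXlwRXguifw",
     "deciders": [{"decider": "shards_limit", "decision": "NO"}]}
  ]
}
""")


def blocking_deciders(explain):
    """Group node ids by the decider that returned NO for them (hypothetical helper)."""
    blocked = {}
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider["decision"] == "NO":
                blocked.setdefault(decider["decider"], []).append(node["node_id"])
    return blocked


summary = blocking_deciders(EXPLAIN_RESPONSE)
for name, nodes in summary.items():
    print(f"{name}: NO on {len(nodes)} node(s) {nodes}")
```

Every node is rejected by exactly one decider, and neither decider alone rejects all four nodes. Only their combination leaves the replica with nowhere to go.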


I've encountered this several times. Should we optimize shard allocation, or actively migrate other shards to this node to break this unhealthy balance?

Related component

Cluster Manager

Host/Environment:

  • Version: OpenSearch 2.9
@kkewwei kkewwei added the bug (Something isn't working) and untriaged labels Mar 28, 2024
@kkewwei kkewwei changed the title [BUG] Shard may not be allocated because of SameShardAllocationDecider And total_shards_per_node [BUG] Shard may not be allocated because of SameShardAllocationDecider and total_shards_per_node Mar 28, 2024
@shwetathareja
Member

@kkewwei: Thanks for filing the issue. Currently, these AllocationDeciders run independently, so issues like this occur when they start interfering with each other. I agree that whenever there are unassigned shards due to deciders that are not transient, the rebalancer needs to find the next optimal routing by moving shards around.
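The "move shards around" idea can be illustrated with a toy model in which a node accepts a copy only if it has no copy of the same shard and holds fewer than LIMIT copies of the index. When no node accepts the replica, the sketch relocates one other copy off a node that is blocked only by the limit. This is a hypothetical sketch, not OpenSearch's allocator: the helper names, the node names n1 to n4 and the shard layout are all invented, because the issue does not show where index1's other copies live.

```python
from copy import deepcopy

# Hypothetical toy model of two hard deciders; this is NOT OpenSearch's allocator.
LIMIT = 2  # stands in for index.routing.allocation.total_shards_per_node=2


def allowed(layout, node, shard):
    """YES only if both toy deciders say YES for `shard` on `node`."""
    copies = layout[node]
    same_shard_ok = shard not in copies      # mimics SameShardAllocationDecider
    shards_limit_ok = len(copies) < LIMIT    # mimics the shards_limit decider
    return same_shard_ok and shards_limit_ok


def allocate_with_relocation(layout, shard):
    """Place one copy of `shard`, relocating at most one other copy to make room.

    Returns (new_layout, moves), where moves lists (shard, from_node, to_node),
    or None when no single relocation is enough.
    """
    layout = deepcopy(layout)
    # 1. Plain allocation: any node where both deciders already say YES.
    for node in layout:
        if allowed(layout, node, shard):
            layout[node].append(shard)
            return layout, []
    # 2. A node without a copy of `shard` failed step 1 only because it is full:
    #    move one of its copies to a node that accepts it, then take its place.
    for target in layout:
        if shard in layout[target]:
            continue  # same_shard would still say NO here
        for other in layout[target]:
            for dest in layout:
                if dest != target and allowed(layout, dest, other):
                    layout[target].remove(other)
                    layout[dest].append(other)
                    layout[target].append(shard)
                    return layout, [(other, target, dest)]
    return None


# Hypothetical layout of index1 (shard ids only); the replica of shard 3 is unassigned.
cluster = {"n1": [3], "n2": [0, 1], "n3": [1, 2], "n4": [0, 2]}
new_layout, moves = allocate_with_relocation(cluster, 3)
print(moves)
print(new_layout)
```

A real implementation would also have to respect the other deciders (disk thresholds, awareness, filters) and pick which copy to move by balance weight. The sketch only shows that a single relocation can unblock a replica that both deciders together have stranded.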

@kkewwei
Contributor Author

kkewwei commented Apr 2, 2024

@shwetathareja, should we add an AllocationDecider to deal with these unassigned shards? If you agree, I'd be happy to try implementing it. If you have other ideas, I'd be happy to hear them.

@shwetathareja
Member

@kkewwei feel free to take a stab at the solution, will be happy to review the PR.

@ViggoC
Contributor

ViggoC commented Jun 7, 2024

@kkewwei Conflicts between AllocationDeciders (hard constraints) are what leave shards unassigned. I think an AllocationConstraint (soft constraint) is a better choice for index shard balance. You can simply disable the total_shards_per_node decider and rely on INDEX_SHARD_PER_NODE_BREACH_CONSTRAINT.

@kkewwei
Contributor Author

kkewwei commented Jun 23, 2024

@ViggoC, I see what you mean: just remove index.routing.allocation.total_shards_per_node and use INDEX_SHARD_PER_NODE_BREACH_CONSTRAINT_ID to make sure the shards are evenly allocated across nodes.

But it doesn't seem to solve the following case, an index with 3 shards and 1 replica:
node  | copy 1    | copy 2
------|-----------|----------
node0 | shard0(p) | shard1(r)
node1 | shard1(p) | shard0(r)
node2 | shard2(p) |

Now only shard2(r) is unassigned, and even with INDEX_SHARD_PER_NODE_BREACH_CONSTRAINT_ID it can't be allocated.
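The concern can be checked in a toy model. If the even-spread cap of 2 copies per node (6 copies over 3 nodes) is treated as a hard rule, as total_shards_per_node=2 would impose, shard2(r) has no legal node in the table's layout. A brute-force search nevertheless finds a complete layout that obeys the same rules, so reaching a valid state requires moving an already-started copy. This is a hypothetical sketch; the names and the brute-force search are for illustration only.

```python
from itertools import product

NODES = ["node0", "node1", "node2"]
SHARDS = [0, 1, 2]  # 3 primaries, each with 1 replica
LIMIT = 2           # a hard per-node cap, as total_shards_per_node=2 would impose

# The layout from the table above; only shard2's replica is still unassigned.
table = {"node0": [0, 1], "node1": [1, 0], "node2": [2]}


def legal_nodes(layout, shard):
    """Nodes passing both hard checks: no copy of `shard` yet, and below the cap."""
    return [n for n in NODES if shard not in layout[n] and len(layout[n]) < LIMIT]


def find_full_layout():
    """Brute-force search for a placement of all 6 copies that obeys both rules."""
    for placement in product(NODES, repeat=2 * len(SHARDS)):
        layout = {n: [] for n in NODES}
        for i, node in enumerate(placement):
            layout[node].append(SHARDS[i // 2])  # slots 2s and 2s+1 hold shard s
        if all(len(c) <= LIMIT and len(set(c)) == len(c) for c in layout.values()):
            return layout
    return None


print(legal_nodes(table, 2))  # → []
print(find_full_layout())
```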

@ViggoC
Contributor

ViggoC commented Jun 24, 2024

@kkewwei In this case, shard2(r) can be allocated to node0 or node1 with INDEX_SHARD_PER_NODE_BREACH_CONSTRAINT_ID; the rebalance step can then relocate another shard from that node to node2.

@kkewwei
Contributor Author

kkewwei commented Jun 24, 2024

@ViggoC My understanding is that in this case node0 and node1 each already hold 2 shards, so INDEX_SHARD_PER_NODE_BREACH_CONSTRAINT_ID will no longer work.

Can you explain "shard2(r) can be allocated to node0 or node1 with INDEX_SHARD_PER_NODE_BREACH_CONSTRAINT_ID, then rebalance step can relocate the other shard on the node to node2"? I am confused about this process. Thank you very much.

@ViggoC
Contributor

ViggoC commented Jun 24, 2024

@kkewwei SameShardAllocationDecider will reject allocating shard2(r) to node2, so node0 and node1 are the candidates. INDEX_SHARD_PER_NODE_BREACH_CONSTRAINT is a soft constraint: it does not reject the allocation but only adds a very large weight (e.g. 1000000). shard2(r) can therefore still be allocated to one of them, because the ShardAllocationDecision is YES.
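The two-step process described above can be sketched in a toy model. The hard same-shard rule only filters candidates, and the breach constraint merely adds a large weight, so the least-bad candidate still receives the replica and a rebalance step then evens the load. Only the penalty value 1000000 comes from the comment above. The weight function, the breach test (a node breaches once it holds ceil(total copies / nodes) copies of the index) and the one-move rebalance are simplifying assumptions, not OpenSearch's BalancedShardsAllocator logic.

```python
import math

PENALTY = 1_000_000  # example weight from the comment above
TOTAL_COPIES = 6     # 3 shards x (1 primary + 1 replica)

# Layout from kkewwei's table; shard2's replica is unassigned.
layout = {"node0": [0, 1], "node1": [1, 0], "node2": [2]}


def weight(layout, node):
    """Toy weight: copies on the node, plus PENALTY once it holds its even share (assumed breach test)."""
    allowed = math.ceil(TOTAL_COPIES / len(layout))
    w = len(layout[node])
    if len(layout[node]) >= allowed:
        w += PENALTY  # soft: makes the node unattractive but never forbids it
    return w


def allocate(layout, shard):
    """Only the hard same-shard rule filters candidates; the lowest weight wins."""
    candidates = [n for n in layout if shard not in layout[n]]
    best = min(candidates, key=lambda n: weight(layout, n))
    layout[best].append(shard)
    return best


def rebalance_once(layout):
    """Move one copy from the fullest to the emptiest node if same-shard allows it."""
    src = max(layout, key=lambda n: len(layout[n]))
    dst = min(layout, key=lambda n: len(layout[n]))
    if len(layout[src]) - len(layout[dst]) < 2:
        return None  # already as even as one move can make it
    for shard in layout[src]:
        if shard not in layout[dst]:
            layout[src].remove(shard)
            layout[dst].append(shard)
            return shard, src, dst
    return None


placed_on = allocate(layout, 2)   # node0 and node1 tie on weight; min() keeps node0
moved = rebalance_once(layout)    # node0 now holds 3 copies, node2 holds 1
print(placed_on, moved, layout)
```

After the two steps every node holds two copies and no node holds both copies of a shard, which is the outcome described in the comment.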

@kkewwei kkewwei closed this as completed Nov 16, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Shard Management Project Board Nov 16, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Cluster Manager Project Board Nov 16, 2024
Projects
Status: ✅ Done
Status: ✅ Done
Development

No branches or pull requests

3 participants