[BUG] [OPENSEARCH] [AWS] - Error while deleting snapshot on S3 #14523

Open
cuasso opened this issue Jun 24, 2024 · 4 comments
Labels
bug Something isn't working Storage:Snapshots

Comments

@cuasso

cuasso commented Jun 24, 2024

Describe the bug

We save custom snapshots to S3 on AWS. To do this, we set up a custom S3 repository and created a policy that takes a daily snapshot and saves it to S3.
The save process works without problems, but the delete process always throws an error. The error is strange because we can see that the previous snapshot is actually deleted, yet the process fails.
The error is:

{
    "message": "[2024-06-22T23:01:04Z]: Caught exception while deleting snapshot [daily-snapshots-2024-06-15t23:00:42-vouq72uv].",
    "cause": "[2024-06-22T23:01:04Z]: groupSize must be greater than 0 but was -15"
}

We contacted AWS support, and they said there is no problem with the S3 configuration.

Related component

Storage:Snapshots

To Reproduce

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

We expect the snapshots to be deleted and the process to be marked as successful.

Additional Details

  • OpenSearch version 2.11

S3 repository

image

OpenSearch snapshot policy

image

Creation/Deletion Status

image
@andrross
Member

I believe that error message is coming from here, which suggests there is a bug here. Even if the cause is some sort of misconfiguration, the error should be something more meaningful. @cuasso Can you get us the full stack trace from the logs to help pin this down?

@gbbafna
Collaborator

gbbafna commented Jun 27, 2024

[Storage Triage - attendees 1 2 3 4 5 6 7 8 9 10 ]

@cuasso: As @andrross also said, a stack trace would help us proceed.

@gbbafna gbbafna removed the untriaged label Jun 27, 2024
@gbbafna gbbafna moved this from 🆕 New to 🏗 In progress in Storage Project Board Jun 27, 2024
@gbbafna gbbafna moved this from 🏗 In progress to 🆕 New in Storage Project Board Jun 27, 2024
@kalinkini

Hello everyone!
I'm facing the same issue with an on-premises installation of OpenSearch version 2.17.1 and MinIO S3 storage.

It can be reproduced with the scenario described by @cuasso (using a snapshot policy) and with the steps described in this case on the OpenSearch forum (the delete fails if there is more than one snapshot in the repository).

I have two clusters (main and sandbox), both on version 2.17.1, connected to two different MinIO clusters:

  • The first, which has this issue, is connected to MinIO version 2025-02-07T23-21-09Z
  • The second works well and is connected to MinIO version 2024-04-06T05-26-02Z

To test, I connected my sandbox cluster to the problematic MinIO version, and the error was reproduced.

I noticed that even when a snapshot no longer appears in OpenSearch after this delete attempt, its data still exists in the S3 storage. The last snapshot in the repository is deleted without this error.

@andrross, @gbbafna, I would appreciate any help and can provide additional information.
Thank you.
Here is the stack trace from the Snapshot policy:

[2025-02-28T18:01:05,303][ERROR][o.o.i.s.e.SMStateMachine ] [opensearch-d1] Caught exception while deleting snapshot [test_hot_state_policy-2025-02-28t14:40:0
java.lang.IllegalArgumentException: groupSize must be greater than 0 but was -3
        at org.opensearch.action.support.GroupedActionListener.<init>(GroupedActionListener.java:66) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.repositories.blobstore.BlobStoreRepository.cleanupStaleIndices(BlobStoreRepository.java:2063) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.repositories.blobstore.BlobStoreRepository.cleanupStaleBlobs(BlobStoreRepository.java:1878) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.repositories.blobstore.BlobStoreRepository.cleanupUnlinkedRootAndIndicesBlobs(BlobStoreRepository.java:1461) ~[opensearch-2.17.1.jar
        at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$doDeleteShardSnapshots$11(BlobStoreRepository.java:1276) ~[opensearch-2.17.1.jar:2
        at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) ~[opensearch-2.17.1.jar:2.17.
        at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) ~[opensearch-2.17.1.jar:2.17.1]
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) ~[?:?]
        at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:160) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:141) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.action.StepListener.innerOnResponse(StepListener.java:79) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.core.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:58) ~[opensearch-core-2.17.1.jar:2.17.1]
        at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$doDeleteShardSnapshots$9(BlobStoreRepository.java:1256) ~[opensearch-2.17.1.jar:2.
        at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) ~[opensearch-2.17.1.jar:2.17.
        at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) ~[opensearch-2.17.1.jar:2.17.1]
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) ~[?:?]
        at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:160) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:141) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.action.StepListener.innerOnResponse(StepListener.java:79) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.core.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:58) ~[opensearch-core-2.17.1.jar:2.17.1]
        at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-2.17.1.jar:2.17.1]
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.17.1.jar:2.17.1]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005) ~[opensearch-2.17.1.jar:2.17.1
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.17.1.jar:2.17.1]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) ~[?:?]
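
For readers following the trace: the exception is raised in the GroupedActionListener constructor, reached from BlobStoreRepository.cleanupStaleIndices. The Python sketch below only illustrates that kind of guard. The class and function names are hypothetical, and the subtraction used to derive the group size is an assumption for illustration; the thread does not show how OpenSearch actually computes the value.

```python
class GroupedListener:
    """Hypothetical stand-in for a listener that waits for `group_size` results."""

    def __init__(self, group_size: int) -> None:
        # Same wording as the message in the stack trace: a non-positive
        # group size is rejected before any work starts.
        if group_size <= 0:
            raise ValueError(
                f"groupSize must be greater than 0 but was {group_size}"
            )
        self.group_size = group_size


def stale_group_size(found_indices: set, surviving_indices: set) -> int:
    # ASSUMPTION (not confirmed by the thread): the group size is derived as
    # "index folders listed in the repository" minus "indices still referenced".
    return len(found_indices) - len(surviving_indices)


def try_cleanup(found_indices: set, surviving_indices: set) -> str:
    """Return "ok" or the error message the guard would produce."""
    try:
        GroupedListener(stale_group_size(found_indices, surviving_indices))
        return "ok"
    except ValueError as exc:
        return str(exc)
```

Under that assumption, a listing that returns no index folders while three indices are still referenced would give -3, the value in the trace. Whether this is what happens inside OpenSearch is not established here.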

@kalinkini

Finally solved.
The S3 storage version was a red herring. In my case, the cause was a misconfiguration in the repository registration:
the base_path parameter must not start with a slash (/).

"base_path": "my/snapshot/directory"

PUT /_snapshot/my-opensearch-repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-open-search-bucket",
    "base_path": "my/snapshot/directory"
  }
}
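
A minimal sketch of guarding against this misconfiguration before registering the repository, assuming the request body is built in a script. The helper names `normalize_base_path` and `build_repository_body` are hypothetical; only the `type`, `bucket`, and `base_path` fields come from the request above.

```python
def normalize_base_path(base_path: str) -> str:
    """Drop any leading slashes so the path is relative to the bucket root."""
    return base_path.lstrip("/")


def build_repository_body(bucket: str, base_path: str) -> dict:
    """Build the JSON body for PUT /_snapshot/<repository-name>."""
    return {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            # A leading slash is what triggered the delete error in this thread.
            "base_path": normalize_base_path(base_path),
        },
    }
```

For example, `build_repository_body("my-open-search-bucket", "/my/snapshot/directory")` produces the same body as the request shown above, with `base_path` set to `my/snapshot/directory`.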

Projects
Status: 🆕 New
Development

No branches or pull requests

4 participants