Data Loss After Restarting All Leaders and Followers in Redis Cluster #1164
Comments
@drivebyer have you faced this? |
No. Did you use RDB or AOF? |
I’m using AOF.
|
@drivebyer have you checked this? |
@drivebyer @captainpro-eng Hi, I am facing a similar issue. I have set storageSpec.keepAfterDelete: true to persist the PVCs. After a helm uninstall and reinstall, the data is lost. I can tell that the data is lost when the sync happens between master and replica (before the state of the cluster changes to OK). I am using the latest version of both the operator and the RedisCluster. |
Can you share the Redis cluster YAML?
|
Sure thing! Here is the values file I'm using:
---
redisCluster:
  name: "redis-cluster"
  clusterSize: 3
  clusterVersion: v7
  persistenceEnabled: true
  image: op-test
  tag: latest
  imagePullPolicy: IfNotPresent
  imagePullSecrets: {}
  # - name: Secret with Registry credentials
  redisSecret:
    secretName: "redis-password"
    secretKey: "password"
  resources:
    requests:
      cpu: 400m
      memory: 1Gi
    limits:
      cpu: 400m
      memory: 2Gi
  minReadySeconds: 0
  # -- Some fields of statefulset are immutable, such as volumeClaimTemplates.
  # When set to true, the operator will delete the statefulset and recreate it. Default is false.
  recreateStatefulSetOnUpdateInvalid: false
  leader:
    replicas: 3
    serviceType: ClusterIP
    affinity: {}
    # nodeAffinity:
    #   requiredDuringSchedulingIgnoredDuringExecution:
    #     nodeSelectorTerms:
    #     - matchExpressions:
    #       - key: disktype
    #         operator: In
    #         values:
    #         - ssd
    tolerations: []
    # - key: "key"
    #   operator: "Equal"
    #   value: "value"
    #   effect: "NoSchedule"
    nodeSelector: null
    #   memory: medium
    securityContext: {}
    pdb:
      enabled: false
      maxUnavailable: 1
      minAvailable: 1
  follower:
    replicas: 3
    serviceType: ClusterIP
    affinity: null
    # nodeAffinity:
    #   requiredDuringSchedulingIgnoredDuringExecution:
    #     nodeSelectorTerms:
    #     - matchExpressions:
    #       - key: disktype
    #         operator: In
    #         values:
    #         - ssd
    tolerations: []
    # - key: "key"
    #   operator: "Equal"
    #   value: "value"
    #   effect: "NoSchedule"
    nodeSelector: null
    #   memory: medium
    securityContext: {}
    pdb:
      enabled: false
      maxUnavailable: 1
      minAvailable: 1
labels: {}
#   foo: bar
#   test: echo
externalConfig:
  enabled: true
  data: |
    loadmodule /FalkorDB/bin/src/falkordb.so
externalService:
  enabled: false
  # annotations:
  #   foo: bar
  serviceType: LoadBalancer
  port: 6379
serviceMonitor:
  enabled: false
  interval: 30s
  scrapeTimeout: 10s
  namespace: monitoring
  # -- extraLabels are added to the servicemonitor when enabled set to true
  extraLabels: {}
  #   foo: bar
  #   team: devops
redisExporter:
  enabled: false
  image: quay.io/opstree/redis-exporter
  tag: "v1.44.0"
  imagePullPolicy: IfNotPresent
  resources: {}
  # requests:
  #   cpu: 100m
  #   memory: 128Mi
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  env: []
  # - name: VAR_NAME
  #   value: "value1"
sidecars:
  name: ""
  image: ""
  imagePullPolicy: "IfNotPresent"
  resources:
    limits:
      cpu: "100m"
      memory: "128Mi"
    requests:
      cpu: "50m"
      memory: "64Mi"
  env: {}
  # - name: MY_ENV_VAR
  #   value: "my-env-var-value"
initContainer:
  enabled: false
  image: ""
  imagePullPolicy: "IfNotPresent"
  resources: {}
  # requests:
  #   memory: "64Mi"
  #   cpu: "250m"
  # limits:
  #   memory: "128Mi"
  #   cpu: "500m"
  env: []
  command: []
  args: []
priorityClassName: ""
storageSpec:
  keepAfterDelete: true
  volumeClaimTemplate:
    spec:
      # storageClassName: standard
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
  nodeConfVolume: true
  nodeConfVolumeClaimTemplate:
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
      # selector: {}
podSecurityContext:
  runAsUser: 1000
  fsGroup: 1000
# serviceAccountName: redis-sa
TLS:
  ca: ca.crt
  cert: tls.crt
  key: tls.key
  secret:
    secretName: ""
acl:
  secret:
    secretName: ""
env: []
# - name: VAR_NAME
#   value: "value1"
serviceAccountName: ""
|
Hi, please try this:
clusterVersion: v6
You face this issue because you are not using the cluster-announce features. |
@captainpro-eng I am not sure I understand what the cluster-announce option is. |
@drivebyer I have an update on this: it seems the Redis operator issues a FLUSHALL in k8sutils/redis.go when the CLUSTER RESET command fails (i.e., when the masters still hold keys and are not empty), and that FLUSHALL is also appended to the end of the appendonly.aof.1.incr.aof file.
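A minimal sketch of that failure path, assuming a go-redis v9 client; the function name and flow here are illustrative, not the operator's actual code:

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// resetNode mirrors the pattern described above: CLUSTER RESET fails on a
// master that still holds keys, so the node gets flushed first -- and that
// FLUSHALL is itself written into the AOF, destroying the data.
func resetNode(ctx context.Context, addr, password string) error {
	client := redis.NewClient(&redis.Options{Addr: addr, Password: password})
	defer client.Close()

	// First attempt: hard-reset the cluster state on this node.
	if err := client.ClusterResetHard(ctx).Err(); err == nil {
		return nil
	}

	// The reset failed, typically with "CLUSTER RESET can't be called with
	// master nodes containing keys". Flushing makes the reset succeed, but
	// it also appends a FLUSHALL record to appendonly.aof.1.incr.aof, so
	// the data cannot be replayed from the AOF on the next start.
	if err := client.FlushAll(ctx).Err(); err != nil {
		return fmt.Errorf("flushall failed: %w", err)
	}
	return client.ClusterResetHard(ctx).Err()
}
|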
Hi @MuhammadQadora, in my case, since v7 does not automatically include these options in the entrypoint, the pod fails to join the cluster after a restart because the Redis node doesn't announce its new IP to the cluster. To fix this, I have to use v6 and enable cluster announce so the pod can join through the host. I'm using the redis:v7.0.12 image, where the cluster-announce functionality is not included by default; the entrypoint.sh on the master branch does include it.
For Redis version 7.0.12, the entrypoint doesn't automatically add cluster-announce-ip and cluster-announce-hostname, which is why the Redis node fails to announce its IP to the cluster after a restart.
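As a rough illustration of what that missing announce step amounts to, here is a sketch assuming a go-redis v9 client and a POD_IP environment variable injected via the downward API (both are assumptions; this is not the operator's actual entrypoint logic):

package main

import (
	"context"
	"os"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	client := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379"})
	defer client.Close()

	// Equivalent of "CONFIG SET cluster-announce-ip <pod-ip>": after a
	// restart the pod gets a new IP, and unless the node announces it,
	// the rest of the cluster keeps contacting the stale address and the
	// node never rejoins.
	if err := client.ConfigSet(ctx, "cluster-announce-ip", os.Getenv("POD_IP")).Err(); err != nil {
		panic(err)
	}
}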
The service definition I'm using for the Redis cluster comes from Redis Operator version 0.18.5.
|
We are currently running a Redis cluster with the following versions:
Redis Operator Helm Chart: 0.18.5
Redis Operator Image: 0.18.1
Redis Image: 7.0.12
We have tested multiple Redis failover scenarios, and in most cases the cluster state is marked "OK" and data is preserved after restarts. However, we encountered one scenario where restarting all the masters and slaves leaves the cluster state "OK" while all data is lost. Below is a summary of the test cases; a minimal probe for checking data survival follows the list.
Tested scenarios:
Restarted 3 leaders only: after all leaders came back up, the cluster state became "OK" and all data was preserved.
Restarted all the leaders: after all leaders came back up, the cluster state became "OK" and all data was preserved.
Restarted all the followers: after all followers came back up, the cluster state became "OK" and all data was preserved.
Restarted 3 leaders and 3 followers: after all leaders and followers came back up, the cluster state became "OK" and all data was preserved.
Restarted all the leaders and all the followers: after all masters and slaves came back up, the cluster state was "OK", but all data was lost.
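For anyone reproducing the failing scenario, here is a minimal data-survival probe, assuming go-redis v9; the cluster address is illustrative, not taken from our setup:

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	c := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{"redis-cluster-leader:6379"}, // illustrative address
	})
	defer c.Close()

	// Seed a canary key if it is absent, then read it back. Run once before
	// restarting all leaders and followers, and again once the cluster
	// reports "OK": a redis.Nil error on the second run means data loss.
	c.SetNX(ctx, "canary", "before-restart", 0)
	val, err := c.Get(ctx, "canary").Result()
	fmt.Println(val, err)
}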