
Volumes are suddenly empty #151

Open
woehrl01 opened this issue Feb 22, 2024 · 12 comments
Labels
work-in-progress Someone is working on this already

Comments

@woehrl01

woehrl01 commented Feb 22, 2024

Hi,

In our test environment we noticed that the contents of volumes suddenly become empty. Recreating the pod makes the files reappear. This often happens after the pod has been running for multiple hours. We are using containerd with its default configuration. I assume this has something to do with the garbage collection on the underlying node. The volumes are mounted as ephemeral CSI volumes in readOnly mode, so accidental deletion of the files can be ruled out as a root cause.

Has anyone had a similar experience? Any ideas on how to troubleshoot this further?

@woehrl01
Author

I'm not yet deep into the internals of the individual technologies, but could it be that we are missing lease handling in the containerd case? Without a lease, wouldn't the resources be automatically garbage collected after 24 hours?

See: https://github.com/containerd/containerd/blob/main/docs/garbage-collection.md#L19-L93
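
For context, here is a minimal sketch of what lease handling with the containerd Go client could look like (illustrative only, not this driver's actual code; the socket path, namespace, and lease ID are assumptions). Resources created under a lease are protected from garbage collection until the lease is explicitly deleted:

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/leases"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	// Connect to containerd and work in the kubelet's namespace (assumed).
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

	// Create a lease with no expiration; anything it references is kept
	// alive by the garbage collector until the lease is deleted.
	lm := client.LeasesService()
	lease, err := lm.Create(ctx, leases.WithID("csi-volume-example"))
	if err != nil {
		log.Fatal(err)
	}

	// Content and snapshots created with this context are attached to the lease,
	// e.g. client.Pull(leasedCtx, "docker.io/library/busybox:latest").
	leasedCtx := leases.WithLease(ctx, lease.ID)
	_ = leasedCtx

	// When the volume is unpublished, deleting the lease lets GC reclaim the resources:
	// _ = lm.Delete(ctx, lease)
}
```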

@mugdha-adhav
Collaborator

We heavily use this driver in our production environment and haven't noticed this issue before. Our environment also uses containerd version v1.7.2 on EKS 1.26 nodes.

@woehrl01
Author

@mugdha-adhav I see, maybe our usage pattern is different. We spin up and tear down pods very frequently; often the churn is less than 5 seconds.

My assumption is that there is a problem with the metadata update in that scenario, combined with sudden kills caused by a failing livenessProbe. That results in the snapshot being deleted even though it is still in use by the last pod, which leaves an empty directory.

On top of that, the kubelet garbage collection kicks in regularly on our nodes.

That's why I created the linked PR to try using leases instead of metadata. I'm still trying to find a reliably reproducible example.

@woehrl01
Author

woehrl01 commented Feb 28, 2024

@mugdha-adhav I have now reproduced this error multiple times with the latest version 1.1.0 under very high load (starting/stopping hundreds of pods with a mix of shared/non-shared volumes). I'm currently preparing a PR to make the driver handle high-load scenarios more gracefully. The PR is meant as a basis for discussing what you want to include in this project.

mugdha-adhav added the work-in-progress label on Feb 29, 2024
@woehrl01
Author

@mugdha-adhav When testing our service at large scale with bigger and more varied images, I can see containerd becoming exhausted, bringing all the pods on a node to a halt. While my changes fix the exhaustion of containerd when all images are already available, a modification is also needed on the pulling side (to cover the case of scheduling many pods on a fresh node). PR #137 looks quite promising for fixing the exhaustion of containerd, so I'm waiting for that to get merged first.

@mugdha-adhav
Collaborator

@woehrl01 we mostly use read-write volumes on our clusters, so this issue might be limited to read-only volumes.

I tried reproducing the issue by deleting the image and snapshot from containerd, but I could still see the files mounted in the pod. Hence I doubt that the issue is related to garbage collection in containerd.

@woehrl01
Author

@mugdha-adhav Yes, I think this only affects read-only volumes. During my migration to leases as a GC mechanism, I could verify that DeleteSnapshot was being called even though there were still leases attached to the snapshot.
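
To make that concrete, a guard like the following could be placed in front of snapshot removal. This is a sketch assuming the containerd Go client; `snapshotStillLeased` is a hypothetical helper, not part of this driver:

```go
package main

import (
	"context"
	"strings"

	"github.com/containerd/containerd"
)

// snapshotStillLeased reports whether any containerd lease still references
// the given snapshot key, so a DeleteSnapshot call could be skipped instead
// of emptying a volume that is still mounted by a pod.
func snapshotStillLeased(ctx context.Context, client *containerd.Client, snapshotKey string) (bool, error) {
	lm := client.LeasesService()
	all, err := lm.List(ctx)
	if err != nil {
		return false, err
	}
	for _, l := range all {
		resources, err := lm.ListResources(ctx, l)
		if err != nil {
			return false, err
		}
		for _, r := range resources {
			// Snapshot resources carry a type such as "snapshots/overlayfs".
			if strings.HasPrefix(r.Type, "snapshots/") && r.ID == snapshotKey {
				return true, nil
			}
		}
	}
	return false, nil
}
```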

@imuni4fun
Contributor

Another thought occurred to me: we should verify that mounting the pulled image as writable does indeed create a new writable layer (union-filesystem semantics, the way Dockerfile steps represent changes as add/modify/delete on top of the previous filesystem layer), so that two pods mounting in read/write mode cannot change each other's version of that image.

#137 (which should merge tomorrow, by the way) reduces parallel pulls to a single request, so the resulting on-disk representation comes from a single pull-and-unpack. I'm not sure whether the mounting activity (this CSI driver calling into containerd/CRI-O) starts from this unpacked content and creates that new writable layer to maintain immutability.
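
For reference, this is roughly how a per-volume writable layer is obtained through the containerd Go client: `Prepare` creates a new active snapshot on top of the image's unpacked layer chain, so writes land in that layer and never modify the shared read-only image layers. The image reference, snapshotter name, and snapshot key below are placeholders, and this is not necessarily how this driver implements it:

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
	"github.com/opencontainers/image-spec/identity"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

	image, err := client.GetImage(ctx, "docker.io/library/busybox:latest")
	if err != nil {
		log.Fatal(err)
	}

	// The parent of the writable snapshot is the chain ID of the image's
	// unpacked rootfs (the top of the committed, read-only layer stack).
	diffIDs, err := image.RootFS(ctx)
	if err != nil {
		log.Fatal(err)
	}
	parent := identity.ChainID(diffIDs).String()

	// Prepare creates a NEW active (writable) snapshot keyed per volume/pod,
	// so two pods mounting the same image cannot see each other's writes.
	snapshotter := client.SnapshotService("overlayfs")
	mounts, err := snapshotter.Prepare(ctx, "volume-abc-writable", parent)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("mounts for the writable layer: %+v", mounts)
}
```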

@mugdha-adhav
Collaborator

we should verify that mounting the pulled image as writable does indeed create a new writable layer

I have already verified this manually by checking the snapshots created by containerd for every new writable volume.
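
For anyone who wants to repeat that check, here is a small sketch that walks the snapshotter via the containerd Go client and prints each snapshot (the socket path and snapshotter name are assumptions): every writable volume should show up as an active snapshot whose parent is the image's committed layer chain.

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/snapshots"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

	// Walk every snapshot known to the overlayfs snapshotter and print its
	// name, kind (active vs. committed) and parent.
	snapshotter := client.SnapshotService("overlayfs")
	err = snapshotter.Walk(ctx, func(ctx context.Context, info snapshots.Info) error {
		log.Printf("name=%s kind=%v parent=%s", info.Name, info.Kind, info.Parent)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```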

@woehrl01
Author

@mugdha-adhav I could re-verify that. The problem happens when timeouts cause some operations to fail; during the rollback of those changes, the snapshots are deleted.

@mugdha-adhav
Collaborator

@woehrl01 are we good to close this issue based on the reasoning mentioned in this comment?

@woehrl01
Author

@mugdha-adhav No, this issue can still happen. I haven't had time yet to push my changes upstream and create a PR to eventually fix it. It's still on my to-do list.
