Memory leak while obtaining data from DataLoader #5
@kzhang0718 your intuition is correct. Unfortunately, openslide was never optimized for such large-scale AI applications. See the related threads. One possible solution is to change the source code of openslide so that the cache is disabled. For example, GeertLitjens's (https://github.com/GeertLitjens) fix works well. You will then have to compile the code yourself. Hope this was helpful.
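A minimal sketch of a cache workaround, assuming a recent openslide-python (1.2.0 or later, which exposes `OpenSlideCache` and `set_cache`); the fix referenced above instead patches and recompiles the OpenSlide C library. The file path is a placeholder.

```python
import openslide

slide = openslide.OpenSlide("example.svs")  # placeholder WSI path

# If your openslide-python build supports it, replace the default tile cache
# with a tiny one, approximating a "cache disabled" build without recompiling.
tiny_cache = openslide.OpenSlideCache(1024 * 1024)  # ~1 MB capacity
slide.set_cache(tiny_cache)

tile = slide.read_region((0, 0), 0, (256, 256))
slide.close()
```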
@gabricampanella Thanks, that's very informative. I figured it must have to do with the cache, but I had not thought about disabling it in the openslide source code. I modified the DataSet class so that WSI file handles are opened and closed on the fly (see the sketch below); memory looked fine, but performance dropped significantly. I'll definitely try disabling the cache. On a different note, the code doesn't look very scalable. I'm only working with about 200 WSIs, so it's not a big issue at the moment, but when I get to thousands or even tens of thousands of WSIs, training would take forever. I wonder how you managed to train on over 30k (something like that) WSIs for your Nature Medicine work. Would appreciate it if you could share some experience.
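A minimal sketch of that open-and-close-per-item pattern, assuming a hypothetical PyTorch-style dataset with placeholder slide paths and tile coordinates (not the repository's actual class):

```python
import openslide
import torch.utils.data as data


class SlideTileDataset(data.Dataset):
    """Hypothetical dataset that reopens each WSI per item so the cache cannot grow."""

    def __init__(self, slide_paths, coords, level=0, size=224):
        # coords: list of (slide_index, x, y) tile locations
        self.slide_paths = slide_paths
        self.coords = coords
        self.level = level
        self.size = size

    def __getitem__(self, index):
        slide_idx, x, y = self.coords[index]
        # Open the handle only for this tile and close it right away,
        # trading read performance for bounded memory use.
        slide = openslide.OpenSlide(self.slide_paths[slide_idx])
        tile = slide.read_region((x, y), self.level, (self.size, self.size)).convert("RGB")
        slide.close()
        return tile

    def __len__(self):
        return len(self.coords)
```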
@kzhang0718 You are right that, given this set-up, scalability will become an issue as you hit the tens of thousands. Solving the caching issue will alleviate the problem a bit and allow you to reach that mark. To go beyond it, many things can be done both on the algorithm side (for example, being smarter about the inference stage) and on the framework side (custom-made data loading). To give you some context, 10k prostate core biopsies were trained in about a week. The breast dataset, which is composed of much larger tissue samples, took one month.
@gabricampanella Great, thanks for the info! I'll see what I can do when it hits that mark. Closing this thread.
Many thanks for sharing this exciting methodology. I am having the same problem. I was wondering if you could advise how to access GeertLitjens's fix. Is that what you used for the Nature Medicine paper? |
I used tiffslide instead, which does not have this problem. Unfortunately, it currently cannot be used with multiprocessing (see Bayer-Group/tiffslide#18).
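A minimal sketch of swapping in tiffslide, assuming it is used as the near drop-in replacement for the openslide `read_region` API that it advertises; the file name is a placeholder:

```python
import tiffslide

# tiffslide mirrors the openslide API closely, so existing read_region
# calls usually work unchanged after swapping the import.
slide = tiffslide.TiffSlide("example.svs")  # placeholder WSI path
tile = slide.read_region((0, 0), 0, (256, 256)).convert("RGB")
slide.close()
```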
Hello, thanks for the great work.
There seems to be a memory issue when acquiring tile images from the DataSet. Memory increases linearly with every iteration and eventually eats up the entire RAM. From the code, it doesn't look like anything should be taking up that much memory. Any advice on how to get around this? It seems to be related to openslide.
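A minimal sketch of how this kind of growth can be observed per iteration, assuming a hypothetical DataLoader named `loader` and using psutil to report resident memory:

```python
import os

import psutil

process = psutil.Process(os.getpid())

for i, batch in enumerate(loader):  # `loader` is a placeholder DataLoader
    if i % 100 == 0:
        # Resident set size in MB; with the openslide cache enabled this
        # number climbs steadily instead of plateauing.
        print(f"iter {i}: rss = {process.memory_info().rss / 1e6:.1f} MB")
```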