
Memory leak while obtaining data from DataLoader #5

Closed
kzhang0718 opened this issue Sep 2, 2019 · 6 comments

@kzhang0718

Hello, thanks for the great work.

There seems to be a memory issue when acquiring tile images from the DataSet. Memory grows linearly with every iteration until it eventually eats up the entire RAM. From the code, it doesn't look like anything should be taking up that much memory. Any advice on how to get around this? It seems to have to do with openslide.
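
For anyone hitting the same wall, a minimal sketch of the access pattern being described, with hypothetical paths and tile-coordinate format rather than this repo's actual dataset code:

```python
# A sketch of the access pattern in question (hypothetical paths and
# tile-coordinate format). One long-lived OpenSlide handle is kept per
# WSI, and every read_region() call also populates OpenSlide's internal
# C-level tile cache, which is never released while the handle stays
# open; RSS therefore grows with the number of distinct regions read.
import openslide
import torch.utils.data


class TileDataset(torch.utils.data.Dataset):
    def __init__(self, slide_paths, coords, level=0, size=224):
        # coords: list of (slide_index, x, y) tuples (hypothetical format)
        self.slides = [openslide.OpenSlide(p) for p in slide_paths]
        self.coords = coords
        self.level = level
        self.size = size

    def __len__(self):
        return len(self.coords)

    def __getitem__(self, idx):
        i, x, y = self.coords[idx]
        img = self.slides[i].read_region((x, y), self.level, (self.size, self.size))
        return img.convert('RGB')
```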

@gabricampanella
Collaborator

gabricampanella commented Sep 2, 2019

@kzhang0718 your intuition is correct. Unfortunately, openslide was never optimized for such large-scale AI applications. See related threads:
openslide/openslide-python#24
openslide/openslide#38

One possible solution is to modify the openslide source code so that the cache is disabled. For example, GeertLitjens's (https://github.com/GeertLitjens) fix works well. You will then have to compile the code yourself.
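
On newer releases (OpenSlide 4.0.0 with openslide-python 1.2.0 and later, which postdate this thread), the tile cache can also be capped from Python without recompiling; a minimal sketch, assuming those versions:

```python
# Sketch assuming OpenSlide >= 4.0.0 with openslide-python >= 1.2.0,
# which expose the tile cache from Python; earlier versions need the
# source-level patch described above. The capacity is in bytes, and a
# capacity of 0 effectively disables caching.
import openslide

slide = openslide.OpenSlide('slide.svs')       # hypothetical path
slide.set_cache(openslide.OpenSlideCache(0))   # cap this handle's tile cache
tile = slide.read_region((0, 0), 0, (224, 224))
```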

Hope this was helpful.

@kzhang0718
Author

@gabricampanella Thanks, that's very informative. I figured it must have to do with the cache, but I had not thought about disabling it in the openslide source code. I modified the DataSet class so that WSI file handles are opened and closed on the fly; memory looked fine, but performance dropped significantly. I'll definitely try disabling the cache.
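
For reference, a sketch of that open-on-the-fly pattern (the coordinate bookkeeping is hypothetical):

```python
# Sketch of the open-on-the-fly workaround (hypothetical coordinate
# bookkeeping). Memory stays flat because each handle, and with it
# OpenSlide's per-handle cache, is freed after every tile; re-opening
# the WSI and re-parsing its headers on each access is what costs speed.
def __getitem__(self, idx):
    path, x, y = self.coords[idx]
    slide = openslide.OpenSlide(path)   # open per tile
    try:
        img = slide.read_region((x, y), self.level, (self.size, self.size))
    finally:
        slide.close()                   # cache is freed with the handle
    return img.convert('RGB')
```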

On a different note, the code doesn't look very scalable. I'm only working with about 200 WSIs, so it's not a big issue at the moment. But when I get to thousands or even tens of thousands of WSIs, training would take forever. I wonder how you managed to train on over 30k WSIs (or something like that) for your Nature Medicine work. Would appreciate it if you could share some experience.

@gabricampanella
Collaborator

@kzhang0718 You are right that, given this setup, scalability will become an issue as you hit the tens of thousands. Solving that caching issue will alleviate the problem a bit and allow you to hit that mark. To go beyond it, there are many things that can be done, both on the algorithm side (for example, being smarter about the inference stage) and on the framework side (custom-made data loading). To give you some context, 10k prostate core biopsies were trained on in about a week. The breast dataset, which is composed of much larger tissue samples, took one month.
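
As one illustration of the data-loading side, not the pipeline used in the paper: grouping tile reads by slide keeps only one handle open at a time, as sketched below.

```python
# Illustrative only: grouping tile reads by slide so each WSI is opened
# once per pass amortizes the open/close cost that the on-the-fly
# workaround above pays per tile. Coordinate format and tile size are
# hypothetical.
from collections import defaultdict

import openslide


def tiles_grouped_by_slide(coords, level=0, size=224):
    # coords: iterable of (slide_path, x, y) tuples
    by_slide = defaultdict(list)
    for path, x, y in coords:
        by_slide[path].append((x, y))
    for path, xys in by_slide.items():
        slide = openslide.OpenSlide(path)
        try:
            for x, y in xys:
                yield slide.read_region((x, y), level, (size, size))
        finally:
            slide.close()   # one cache's worth of memory freed per slide
```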

@kzhang0718
Author

@gabricampanella Great, thanks for the info! I'll see what I can do when it hits that mark.

Closing this thread.

@samkleeman1

Many thanks for sharing this exciting methodology. I am having the same problem. I was wondering if you could advise how to access GeertLitjens's fix. Is that what you used for the Nature Medicine paper?

@aamster

aamster commented Oct 4, 2022

I used tiffslide instead, which does not have this problem. Unfortunately, it currently cannot be used with multiprocessing: Bayer-Group/tiffslide#18
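
For anyone considering the same swap, a minimal sketch (hypothetical path); tiffslide mirrors the openslide-python API, so existing read_region() calls carry over:

```python
# Sketch of the tiffslide swap (hypothetical path). The TiffSlide class
# mirrors openslide.OpenSlide, including read_region() semantics.
import tiffslide

slide = tiffslide.TiffSlide('slide.svs')
tile = slide.read_region((0, 0), 0, (224, 224))  # PIL image, as with openslide
```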
