 |
|
Page 1 of 1
|
[ 14 posts ] |
|
BUG: massive memory leak with AUFS/SquashFS loopback
| Author |
Message |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 BUG: massive memory leak with AUFS/SquashFS loopback
Verified using the latest 2009.04.04 version.
Steps to reproduce:
0) It's possible to synthesize the testcase, but an easy way to reproduce is to get Presto at http://prestomypc.com/. Install, boot, etc.
1) Grab a big tar file, something like a chroot.
Normal case:
2) Unpack it into a regular ntfs-3g Windows mount like /mnt/windows.
3) Watch the memory of ntfs-3g in top or with pmap.
4) Observe nothing unexpected.
Bad case:
2) Unpack it somewhere else, like '/'.
3) Again watch memory.
4) Observe it eventually eat up all the memory on the system.
This is kind of an interesting setup, which is probably why no one has seen this before. At boot, a Windows partition is mounted with ntfs-3g. Then a squashfs image on the Windows drive is mounted as a loopback, and another ext3 image is also mounted as a loopback. These two mountpoints become the read-only and read-write portions of an aufs mount on the root.
It looks to me like ntfs-3g is doing some sort of caching that isn't getting released in the case of writing to the aufs layers. Note that just mounting a loopback image from a Windows mount is insufficient to tickle this bug. It looks like there needs to be the aufs layering. It could be that there is some subtle bug in aufs whereby something isn't being released when it should be, but I thought I would start here since you guys might have a better idea about when the caching happens.
I'm willing to debug it on my end, but it would be great to have some starting points.
Thanks,
Kris
|
| Mon Apr 27, 2009 22:14 |
|
 |
|
szaka
Tuxera CTO
Joined: Tue Nov 21, 2006 23:15 Posts: 1645
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
File systems do use caching to improve performance. NTFS-3G is no exception. The memory needs to be filled, otherwise it's wasted. More info: http://www.linuxhowtos.org/System/Linux%20Memory%20Management.htm
The difference is that memory and CPU use are directly visible for NTFS-3G in the process list, but only in the system statistics for a kernel file system. You would have a problem if the memory weren't released when it's needed. Note that, due to memory fragmentation, the apparent virtual memory use can still be high.
Several other distributions have been doing what you do since 2007 (Knoppix, WUBI, ...). They never reported an OOM problem.
|
| Mon Apr 27, 2009 23:36 |
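The distinction between apparent virtual memory use and memory that is actually resident can be checked directly. A minimal sketch, assuming a Linux system with /proc mounted; mem_kb is a hypothetical helper name, not part of ntfs-3g or procps. It prints the virtual size (roughly what the pmap total reports) next to the resident set size for one process:

```shell
#!/bin/sh
# mem_kb PID: print virtual size and resident set size (both in kB)
# for one process, read from /proc/<pid>/status on Linux.
# mem_kb is a hypothetical helper name used only for this sketch.
mem_kb() {
    awk '/^VmSize:/ { vsz = $2 }
         /^VmRSS:/  { rss = $2 }
         END        { printf "vsz_kb=%s rss_kb=%s\n", vsz, rss }' "/proc/$1/status"
}
```

For example, mem_kb "$(pgrep -o ntfs-3g)" samples the oldest ntfs-3g process. If both figures climb together during the unpack, the growth is not just address-space fragmentation.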
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
That's exactly the problem. The memory is never released. It keeps consuming memory until it gets hammered by the oom-killer and everything comes to a screeching halt.
Like I said, I'm not convinced the problem isn't in the version of aufs that we're using, or perhaps in fuse, but I wanted to know if there is any particular place in the ntfs-3g driver where I could put some instrumentation.
cheers,
Kris
|
| Tue Apr 28, 2009 14:42 |
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
Just verified the same problem exists if you use a unionfs mount instead of an aufs.
cheers,
Kris
|
| Tue Apr 28, 2009 16:42 |
|
 |
|
szaka
Tuxera CTO
Joined: Tue Nov 21, 2006 23:15 Posts: 1645
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
That's indeed some progress.
But a lot is still unknown: the exact distribution brand/version, the kernel and fuse versions, how the ntfs-3g driver was compiled (external or internal fuse?), which kernel, fuse and driver patches you use, what the file system work pattern is, how the driver is used, what the memory growth means in numbers, where the related source code is, etc.
We regularly test for and monitor potential memory leaks and we are not aware of any. Nor has anybody else reported one to us.
|
| Tue Apr 28, 2009 17:01 |
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
Tried building and linking ntfs-3g with the external fuse lib (2.7.4). Problem still exists with that configuration. That doesn't truly rule out that section of code since there could be sufficient similarities for a bug to be duplicated in both.
cheers,
Kris
|
| Tue Apr 28, 2009 17:06 |
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
szaka wrote:
That's indeed some progress.
But a lot is still unknown: the exact distribution brand/version, the kernel and fuse versions, how the ntfs-3g driver was compiled (external or internal fuse?), which kernel, fuse and driver patches you use, what the file system work pattern is, how the driver is used, what the memory growth means in numbers, where the related source code is, etc.
We regularly test for and monitor potential memory leaks and we are not aware of any. Nor has anybody else reported one to us.

This is in the Presto product (http://prestomypc.com/), which is a custom, quick-boot Linux distro that installs in the Windows partition. It's basically a Debian Lenny with a newer kernel, xorg, xfce, etc. The kernel is fairly normal, with some assorted fastboot patches as well as things like aufs. I might try a stock 2.6.29 kernel to see if any of these patches cause the problem.
These latest tests have been conducted with the latest stable ntfs-3g (2009.4.4) with internal libfuse as well as external libfuse (2.7.4).
The root filesystem is two loopback-mounted images on the ntfs partition, with a read/write ext3 image aufs/unionfs mounted on top of a read-only squashfs image. The bug only presents itself when writing to the root filesystem and is independent of unioning type. Writing to the Windows partition or a non-unioned loopback image does not show this problem.
Sources are all compiled in a debootstrapped Lenny environment for clean builds.
The memory usage seems quite large. The tarfile I'm using uncompresses to about 450MB and will not complete unpacking before using the entire 1GB of memory on the netbook I'm using for testing.
Please let me know if I can give you any more information. I'm now going to start looking at the ntfs-3g sources to try to find where this is happening. Any tips on where to start looking?
cheers,
Kris
|
| Tue Apr 28, 2009 17:26 |
|
 |
|
szaka
Tuxera CTO
Joined: Tue Nov 21, 2006 23:15 Posts: 1645
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
It's constant 3.5 MB memory use during unpacking large tar files. pmap:
Code:
24804: ntfs-3g /dev/sda3 /mnt/t
08048000 32K r-x-- /bin/ntfs-3g
08050000 4K rw--- /bin/ntfs-3g
08051000 1676K rw--- [ anon ]
b7d4e000 140K rw--- [ anon ]
b7d71000 1188K r-x-- /lib/i686/libc-2.4.so
b7e9a000 4K r---- /lib/i686/libc-2.4.so
b7e9b000 8K rw--- /lib/i686/libc-2.4.so
b7e9d000 12K rw--- [ anon ]
b7ea0000 64K r-x-- /lib/i686/libpthread-2.4.so
b7eb0000 8K rw--- /lib/i686/libpthread-2.4.so
b7eb2000 8K rw--- [ anon ]
b7eb4000 252K r-x-- /lib/libntfs-3g.so.54.0.0
b7ef3000 4K rw--- /lib/libntfs-3g.so.54.0.0
b7f0f000 4K rw--- [ anon ]
b7f10000 96K r-x-- /lib/ld-2.4.so
b7f28000 4K r---- /lib/ld-2.4.so
b7f29000 4K rw--- /lib/ld-2.4.so
bff1e000 84K rw--- [ stack ]
bfffe000 4K r-x-- [ anon ]
total 3596K
The only case I can imagine is listing a directory which has 10-50 million files on the first level. That may cause OOM on a 1 GB RAM box. But people usually don't have more than 1-2 million files in a single directory. Typically not even entire volumes have that many files.
|
| Tue Apr 28, 2009 18:42 |
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
Under most circumstances, that's exactly what I see as well. Are you testing the same setup as I am?
i.e.
ntfs-3g /dev/sda1 /mnt-boot
mount -t squashfs -o ro,loop /mnt-boot/some_squash.img /mnt-system
mount -t ext3 -o rw,loop /mnt-boot/some_ext3.img /mnt-user
mount -t aufs -o br:/mnt-user:/mnt-system user /mnt
cheers,
Kris
|
| Tue Apr 28, 2009 18:56 |
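The mount sequence in the post above can be turned into a synthetic test case without installing Presto. A dry-run sketch follows: the image names and mount points are taken from the post, while build_stack, run, the /usr source directory for the squashfs image and the 1024 MB ext3 image size are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_stack DEVICE: DEVICE is the NTFS partition (e.g. /dev/sda1).
# The squashfs source directory (/usr) and the image size are assumptions.
build_stack() {
    dev=$1
    run mkdir -p /mnt-boot /mnt-system /mnt-user /mnt
    run ntfs-3g "$dev" /mnt-boot
    run mksquashfs /usr /mnt-boot/some_squash.img
    run dd if=/dev/zero of=/mnt-boot/some_ext3.img bs=1M count=1024
    run mkfs.ext3 -F /mnt-boot/some_ext3.img
    run mount -t squashfs -o ro,loop /mnt-boot/some_squash.img /mnt-system
    run mount -t ext3 -o rw,loop /mnt-boot/some_ext3.img /mnt-user
    run mount -t aufs -o br:/mnt-user:/mnt-system user /mnt
}
```

Calling build_stack /dev/sda1 prints the eight commands in order. Unpacking the tar file into /mnt afterwards exercises the write path through the union and the ext3 loop image down to ntfs-3g.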
|
 |
|
szaka
Tuxera CTO
Joined: Tue Nov 21, 2006 23:15 Posts: 1645
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
What's the output of your pmap?
|
| Tue Apr 28, 2009 18:58 |
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
Here's the output of 'while true ; do date ; pmap `pgrep ntfs-3g` ; echo ; sleep 1 ; done' when unpacking a tar file:
Tue Apr 28 13:14:58 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 79016K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 80188K

Tue Apr 28 13:14:59 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 93272K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 94444K

Tue Apr 28 13:15:01 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 106604K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 107776K

Tue Apr 28 13:15:02 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 113012K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 114184K

Tue Apr 28 13:15:03 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 115240K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 116412K
This will keep growing and never release. The reason it shows up as deleted is that it's a static binary in the initramfs so by this point it's gone.
cheers,
Kris
|
| Tue Apr 28, 2009 19:18 |
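In the five samples above the total goes from 80188K to 116412K, that is 36224K in about five seconds, almost all of it in the single anonymous mapping at 08114000. To put numbers on a longer run, the log from the sampling loop can be reduced to its totals. A small sketch, assuming the log keeps one pmap line per row as pmap prints it; pmap_growth is a hypothetical helper name:

```shell
#!/bin/sh
# pmap_growth: read a log produced by the sampling loop on stdin and
# print each "total" figure plus the change since the previous sample.
# pmap_growth is a hypothetical helper name used only for this sketch.
pmap_growth() {
    awk '$1 == "total" {
             kb = $2
             sub(/K$/, "", kb)      # strip the unit: "80188K" -> "80188"
             if (n++ > 0) {
                 printf "%d kB (+%d kB)\n", kb, kb - prev
             } else {
                 printf "%d kB\n", kb
             }
             prev = kb
         }'
}
```

Saving the loop output to a file and running pmap_growth < pmap.log gives one line per sample, which makes it easy to see whether the growth ever levels off or reverses.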
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
Interesting. I just went to reproduce this on a different system which has a slightly older kernel, among other things. It's the 'System Rescue CD' and it has older versions of just about everything: 2.6.27 kernel, etc. Did the same thing (ntfs partition, squash image, aufs, etc.) and ntfs-3g performs perfectly. Now I'm going to try the same experiment on the latest Ubuntu with newer tools. We may still be looking at some sort of regression, but I can't rule out that maybe we're doing something dumb on our end.
cheers,
Kris
|
| Tue Apr 28, 2009 19:35 |
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
This just keeps getting more mysterious all the time.
The test case given above, making the image files in a Windows directory and doing all the mounts by hand, has failed to reproduce the problem.
I tried it on the System Rescue CD (2.6.27), on Ubuntu 9.04 (2.6.28) AND with Presto (2.6.29) booted from a USB key, and ntfs-3g did not have the leak in any of those cases.
I'm starting to wonder if this is peculiar to the situation where we've booted up and then caused the partitions to 'disappear' via the 'switch_root' in our init script. I'm going to boot to the one installed on the hard drive and try to reproduce it with my test case.
cheers,
Kris
|
| Tue Apr 28, 2009 20:19 |
|
 |
|
kewarken
Joined: Mon Apr 27, 2009 21:47 Posts: 10
|
 Re: BUG: massive memory leak with AUFS/SquashFS loopback
Here's an interesting bit of info.
I was about to do another test and went to copy my squash image in the windows partition and got some sort of error about creating a file. So I decided to boot into windows and see what was going on. Hmm...corrupted filesystem. The directory I had been working with was unreadable as far as windows was concerned.
Rebooted and did a chkdsk and it recovered all the files in that directory just fine. Went back into Presto and lo and behold, now the problem doesn't happen anymore.
So what's really interesting is:
a) with all the booting into different kernels, different ntfs-3g version, various tests, etc, everything seemed to keep working mostly fine from the Linux point of view.
b) the massive memory growth seems to be a result of ntfs-3g dealing with whatever corruption existed in the ntfs partition
c) the real problem now becomes: how did the filesystem get into this state in the first place and how can I prevent it from happening again? We've got customer reports of similar problems in the field so I have to assume it wasn't unique to my machine.
I'm going to have to look at our shutdown/boot sequence. Perhaps there is something being done which is not clean.
cheers,
Kris
|
| Tue Apr 28, 2009 20:36 |
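The last post points at the shutdown/boot sequence. One thing worth checking, offered as an assumption rather than a finding from this thread, is whether everything stacked on the NTFS volume is unmounted before the ntfs-3g process goes away. A dry-run sketch using the test-case mount points from earlier in the thread; teardown_stack and run are hypothetical helper names, and on the real Presto layout the union is the root filesystem, so the actual sequence would have to live in the shutdown scripts or the initramfs:

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# teardown_stack: unmount in the reverse order of mounting, so the NTFS
# volume is released last, after the loop images that live on it.
teardown_stack() {
    run sync
    run umount /mnt          # aufs union
    run umount /mnt-user     # ext3 loop image (read/write branch)
    run umount /mnt-system   # squashfs loop image (read-only branch)
    run umount /mnt-boot     # NTFS volume mounted with ntfs-3g
}
```

The ordering is the point of the sketch: the ext3 image is a file on the NTFS volume, so writes to it are only safely on disk once the loop mount is gone and the NTFS volume itself has been unmounted.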
|