BUG: massive memory leak with AUFS/SquashFS loopback 

Joined: Mon Apr 27, 2009 21:47
Posts: 10
Verified using the latest 2009.04.04 version.

Steps to reproduce:

0) It's possible to synthesize the testcase but an easy way to reproduce is to get Presto at http://prestomypc.com/. Install, boot, etc.
1) grab a big tar file, something like a chroot.

Normal case:
2) unpack it into a regular ntfs-3g windows mount like /mnt/windows.
3) watch the memory of ntfs-3g in top or with pmap.
4) observe nothing unexpected.

Bad case:
2) unpack it somewhere else like '/'
3) again watch memory.
4) observe it eventually eat up all the memory on the system.

This is kind of an interesting setup which is probably why no one has seen this before.

At boot, a windows partition is mounted with ntfs-3g.

Then, a squashfs image on the windows drive is mounted as a loopback and another ext3 image is also mounted as a loopback.

These two mountpoints become the read only and read write portions of an aufs mount on the root.

It looks to me like ntfs-3g is doing some sort of caching that isn't getting released in the case of writing to the aufs layers. Note that just mounting a loopback image from a windows mount is insufficient to tickle this bug. It looks like there needs to be the aufs layering.

It could be that there is some subtle bug in aufs whereby something isn't being released when it should be but I thought I would start here since you guys might have a better idea about when the caching happens. I'm willing to debug it on my end but it would be great to have some starting points.

Thanks,

Kris


Mon Apr 27, 2009 22:14
Tuxera CTO

Joined: Tue Nov 21, 2006 23:15
Posts: 1645
File systems use caching to improve performance, and NTFS-3G is no exception. Memory needs to be filled, otherwise it's wasted. More info: http://www.linuxhowtos.org/System/Linux%20Memory%20Management.htm

The difference is that NTFS-3G's memory and CPU use are directly visible in the process list, whereas for a kernel file system they show up only in the system statistics.

You would have a problem only if the memory weren't released when it's needed. Note that, due to memory fragmentation, the apparent virtual memory use can still be high.
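To check which of the two cases applies on a given box, a minimal sketch like the following puts the kernel page cache next to the resident size of one process. It assumes a Linux /proc layout; the helper names field_kb and report_memory are made up for illustration and are not part of ntfs-3g.

```shell
#!/bin/sh
# Print one value (in kB) from a "Key:   value kB" style file such as
# /proc/meminfo or /proc/<pid>/status.
field_kb() {
    # $1 = key without the colon (e.g. Cached or VmRSS), $2 = file to read
    awk -v key="$1:" '$1 == key { print $2; exit }' "$2"
}

# Show the page cache next to the resident size of one process.
report_memory() {
    # $1 = PID of the process to inspect (e.g. the ntfs-3g daemon)
    pid=$1
    echo "page cache (mostly reclaimable): $(field_kb Cached /proc/meminfo) kB"
    echo "process resident (VmRSS):        $(field_kb VmRSS "/proc/$pid/status") kB"
}

# Example (assumes a single running ntfs-3g process):
# report_memory "$(pgrep -o ntfs-3g)"
```

If only the first number is large, the kernel is caching and should give the memory back under pressure. If the second number keeps climbing, the growth is inside the process itself.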

Several other distributions have been doing what you do since 2007 (Knoppix, WUBI, ...). They have never reported an OOM problem.


Mon Apr 27, 2009 23:36

Joined: Mon Apr 27, 2009 21:47
Posts: 10
That's exactly the problem. The memory is never released. It keeps consuming memory until it gets hammered by the oom-killer and everything comes to a screeching halt.

Like I said, I can't rule out that the problem is in the version of aufs we're using, or perhaps in fuse, but I wanted to know if there is any particular place in the ntfs-3g driver where I could put some instrumentation.

cheers,

Kris


Tue Apr 28, 2009 14:42

Joined: Mon Apr 27, 2009 21:47
Posts: 10
Just verified the same problem exists if you use a unionfs mount instead of an aufs.

cheers,

Kris


Tue Apr 28, 2009 16:42
Tuxera CTO

Joined: Tue Nov 21, 2006 23:15
Posts: 1645
That's indeed some progress.

But we still don't know the exact distribution and version, the kernel and fuse versions, how the ntfs-3g driver was compiled (external or internal fuse?), which kernel, fuse and driver patches you use, what the file system workload pattern is, how the driver is used, what the memory growth is in numbers, where the related sources are, etc.

We regularly test for and monitor potential memory leaks and we are not aware of any. Nor has anybody else reported one to us.


Tue Apr 28, 2009 17:01

Joined: Mon Apr 27, 2009 21:47
Posts: 10
Tried building and linking ntfs-3g with the external fuse lib (2.7.4). Problem still exists with that configuration. That doesn't truly rule out that section of code since there could be sufficient similarities for a bug to be duplicated in both.

cheers,

Kris


Tue Apr 28, 2009 17:06

Joined: Mon Apr 27, 2009 21:47
Posts: 10
szaka wrote:
That's indeed some progress.

But we still don't know the exact distribution and version, the kernel and fuse versions, how the ntfs-3g driver was compiled (external or internal fuse?), which kernel, fuse and driver patches you use, what the file system workload pattern is, how the driver is used, what the memory growth is in numbers, where the related sources are, etc.

We regularly test for and monitor potential memory leaks and we are not aware of any. Nor has anybody else reported one to us.


This is in the Presto product (http://prestomypc.com/) which is a custom, quick-boot linux distro which installs in the Windows partition. It's basically a Debian Lenny with a newer kernel, xorg, xfce, etc.

The kernel is fairly normal with some assorted fastboot patches as well as things like aufs. I might try a stock 2.6.29 kernel to see if any of these patches cause the problem.

These latest tests have been conducted with the latest stable ntfs-3g (2009.4.4) with internal libfuse as well as external libfuse (2.7.4).

The root filesystem is two loopback mounted images on the ntfs partition with a read/write ext3 image aufs/unionfs mounted on top of a read only squashfs image. The bug only presents itself when writing to the root filesystem and is independent of unioning type. Writing to the windows partition or a non-unioned loopback image does not show this problem.

Sources are all compiled in a debootstrapped Lenny environment for clean builds.

The memory usage seems quite large. The tarfile I'm using uncompresses to about 450MB and will not complete unpacking before using the entire 1GB of memory on the netbook I'm using for testing.

Please let me know if I can give you any more information. I'm now going to start looking at the ntfs-3g sources to try to find where this is happening. Any tips on where to start looking?

cheers,

Kris


Tue Apr 28, 2009 17:26
Tuxera CTO

Joined: Tue Nov 21, 2006 23:15
Posts: 1645
Memory use stays constant at about 3.5 MB while unpacking large tar files. pmap:
Code:
24804:   ntfs-3g /dev/sda3 /mnt/t
08048000     32K r-x--  /bin/ntfs-3g
08050000      4K rw---  /bin/ntfs-3g
08051000   1676K rw---    [ anon ]
b7d4e000    140K rw---    [ anon ]
b7d71000   1188K r-x--  /lib/i686/libc-2.4.so
b7e9a000      4K r----  /lib/i686/libc-2.4.so
b7e9b000      8K rw---  /lib/i686/libc-2.4.so
b7e9d000     12K rw---    [ anon ]
b7ea0000     64K r-x--  /lib/i686/libpthread-2.4.so
b7eb0000      8K rw---  /lib/i686/libpthread-2.4.so
b7eb2000      8K rw---    [ anon ]
b7eb4000    252K r-x--  /lib/libntfs-3g.so.54.0.0
b7ef3000      4K rw---  /lib/libntfs-3g.so.54.0.0
b7f0f000      4K rw---    [ anon ]
b7f10000     96K r-x--  /lib/ld-2.4.so
b7f28000      4K r----  /lib/ld-2.4.so
b7f29000      4K rw---  /lib/ld-2.4.so
bff1e000     84K rw---    [ stack ]
bfffe000      4K r-x--    [ anon ]
total     3596K


The only case I can imagine is listing a directory which has 10-50 million files on the first level. That may cause OOM on a 1 GB RAM box. But people usually don't have more than 1-2 million files in a single directory. Typically not even entire volumes have that many files.


Tue Apr 28, 2009 18:42

Joined: Mon Apr 27, 2009 21:47
Posts: 10
Under most circumstances, that's exactly what I see as well. Are you testing the same setup as I am?

i.e.

ntfs-3g /dev/sda1 /mnt-boot
mount -t squashfs -o ro,loop /mnt-boot/some_squash.img /mnt-system
mount -t ext3 -o rw,loop /mnt-boot/some_ext3.img /mnt-user
mount -t aufs -o br:/mnt-user:/mnt-system user /mnt

cheers,

Kris


Tue Apr 28, 2009 18:56
Tuxera CTO

Joined: Tue Nov 21, 2006 23:15
Posts: 1645
What's the output of your pmap?


Tue Apr 28, 2009 18:58

Joined: Mon Apr 27, 2009 21:47
Posts: 10
Here's the output of 'while true ; do date ; pmap `pgrep ntfs-3g` ; echo ; sleep 1 ; done' when unpacking a tar file:

Tue Apr 28 13:14:58 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 79016K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 80188K

Tue Apr 28 13:14:59 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 93272K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 94444K

Tue Apr 28 13:15:01 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 106604K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 107776K

Tue Apr 28 13:15:02 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 113012K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 114184K

Tue Apr 28 13:15:03 UTC 2009
955: ntfs-3g -o syncio /dev/sda1 /mnt-boot
08048000 804K r-x-- /bin/ntfs-3g (deleted)
08111000 12K rw--- /bin/ntfs-3g (deleted)
08114000 115240K rw--- [ anon ]
b7ed4000 268K rw--- [ anon ]
b7f17000 4K r-x-- [ anon ]
bfc03000 84K rw--- [ stack ]
total 116412K

This will keep growing and never release. The reason it shows up as deleted is that it's a static binary in the initramfs so by this point it's gone.
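To put numbers on the growth, a small filter along these lines could be fed the same pmap output. It is only a sketch: the function name pmap_growth is hypothetical, and it assumes pmap prints one `total NNNK` line per snapshot, as in the log above. In that log the total goes from 80188K to 116412K, i.e. 36224K in about five seconds.

```shell
#!/bin/sh
# Read pmap output on stdin and print each snapshot's total, plus the
# change since the previous snapshot, both in kB.
pmap_growth() {
    awk '$1 == "total" {
        kb = $2
        sub(/K$/, "", kb)       # strip the trailing "K" unit
        kb = kb + 0             # force a numeric value
        if (seen) printf "%d kB (%+d kB)\n", kb, kb - prev
        else      printf "%d kB\n", kb
        prev = kb
        seen = 1
    }'
}

# Example, reusing the loop from this post:
# while true; do pmap `pgrep ntfs-3g`; sleep 1; done | pmap_growth
```

A total that settles after the first few snapshots points to ordinary caching, while deltas that stay positive for the whole unpack match what I'm seeing here.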

cheers,

Kris


Tue Apr 28, 2009 19:18

Joined: Mon Apr 27, 2009 21:47
Posts: 10
Interesting. I just went to reproduce this on a different system which has a somewhat older kernel, among other things. It's the 'System Rescue CD' and it has older versions of just about everything (2.6.27 kernel, etc.). I did the same thing (ntfs partition, squash image, aufs, etc.) and ntfs-3g performs perfectly. Now I'm going to try the same experiment on the latest ubuntu with newer tools. We may still be looking at some sort of regression but I can't rule out that maybe we're doing something dumb on our end.

cheers,

Kris


Tue Apr 28, 2009 19:35

Joined: Mon Apr 27, 2009 21:47
Posts: 10
This just keeps getting more mysterious all the time.

The test case given above of making the image files on a windows dir and doing all the mounts has failed to reproduce the problem.

I tried it on the system rescue cd (2.6.27), ubuntu 9.04 (2.6.28) AND tried it with presto (2.6.29) booted from a usb key and ntfs-3g did not have the leak in any of those cases.

I'm starting to wonder if this is peculiar to the situation where we've booted up and then caused the partitions to 'disappear' via the 'switch_root' in our init script. I'm going to boot to the one installed on the hard drive and try to reproduce it with my test case.

cheers,

Kris


Tue Apr 28, 2009 20:19

Joined: Mon Apr 27, 2009 21:47
Posts: 10
Here's an interesting bit of info.

I was about to do another test and went to copy my squash image in the windows partition and got some sort of error about creating a file. So I decided to boot into windows and see what was going on. Hmm...corrupted filesystem. The directory I had been working with was unreadable as far as windows was concerned.

Rebooted and did a chkdsk and it recovered all the files in that directory just fine. Went back into Presto and lo and behold, now the problem doesn't happen anymore.

So what's really interesting is:

a) with all the booting into different kernels, different ntfs-3g versions, various tests, etc., everything seemed to keep working mostly fine from the Linux point of view.
b) the massive memory growth seems to be a result of ntfs-3g dealing with whatever corruption existed in the ntfs partition
c) the real problem now becomes: how did the filesystem get into this state in the first place and how can I prevent it from happening again? We've got customer reports of similar problems in the field so I have to assume it wasn't unique to my machine.

I'm going to have to look at our shutdown/boot sequence. Perhaps there is something being done which is not clean.
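As a starting point for that review, here is a rough sketch of what a clean teardown order might look like: each layer is unmounted in the reverse of the order it was mounted, so the NTFS volume goes last. This is an assumption about what "clean" should mean here, not our actual shutdown script. The function names reverse_order and teardown are hypothetical, the mount points are the ones from my earlier test case, and for a live root the same steps would have to run from an initramfs or similar, which is not shown.

```shell
#!/bin/sh
# Print the given mount points in reverse order, one per line, so the
# most recently mounted layer comes first.
reverse_order() {
    rev=""
    for mp in "$@"; do
        rev="$mp
$rev"
    done
    printf '%s' "$rev"
}

# Unmount every layer top-down. Stop at the first failure so a busy
# upper layer never has the NTFS volume torn down underneath it.
teardown() {
    reverse_order "$@" | while IFS= read -r mp; do
        sync
        umount "$mp" || { echo "umount $mp failed" >&2; exit 1; }
    done
}

# Mount points listed in the order they were mounted:
# teardown /mnt-boot /mnt-system /mnt-user /mnt
```

If any step fails, the loopback images may still have dirty data on the ntfs-3g mount, which is the kind of thing I want to rule out.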

cheers,

Kris


Tue Apr 28, 2009 20:36