Saturday, October 23, 2010

Ant DirectoryScanner memory problems with large directory trees

I encountered a very interesting build problem this week: Hudson was running out of memory when trying to parse test reports. It wasn't even getting to the parsing part, though. The thread dump looked innocent (Ant... hmm...), so I took a heap dump and examined that a bit. Top memory use? Ant's DirectoryScanner. It was running out of memory just trying to round up the test report XMLs. Even a 1.3GB max heap was no longer sufficient to scan our workspace. Peculiar.

From looking at the heap dump, DirectoryScanner holds on to unincluded directories. This doesn't seem right to me... I may dig further when I have time. Simple searches turned up some archived mailing list discussions, but I don't think they're related. Ant 1.6.5, 1.7.0, 1.7.1, and 1.8.1 all showed this behavior, so I do not think it is a regression. A simple Ant target run on this workspace (pure Ant, no Hudson) couldn't scan the dir until it had a 1400m heap, and even then it took 96 minutes because it was still memory constrained.

The pattern being used was fairly innocent looking:

Except it isn't that innocent when the root workspace dir has 300k+ files and 1,000,000+ directories in it, with some very long path names on top of that. As it turns out, ** is expensive and should be used with great care. Even though it is only traversing our code projects, that is still a very large directory tree, and the scan runs out of memory.

The workspace-wide dir count matters less than it sounds, since most of it should be skipped by the start of the pattern. More important is that /projects/ alone still contains almost 300k dirs.

Maybe this is a newbie mistake with Ant... but I'm not so sure. There are times when ** may be necessary, and I have a newfound awareness of how expensive it can be (in time taken, obviously, but more importantly in memory - it's no good if the build fails because it runs out of heap...)

For a smaller directory tree on my dev machine, I wrote a short Ant target that would touch 1 file, found using an include pattern similar to the above. The ** version took 24 seconds to execute on an Intel X25-M 160GB (G2) SSD. The * version took 0 seconds. The time penalty of ** would be much larger on slower, traditional hard drives...
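
To get a rough feel for why the two patterns differ so much, here is a minimal Java sketch that compares visiting every directory under a root (roughly what ** forces) with listing only the root's immediate children (roughly what a single * needs). This is not Ant's DirectoryScanner and not the Ant target I used; the class name ScanCost and its two methods are made up for illustration, and it only uses plain java.nio.file calls.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; it does not use Ant at all.
class ScanCost {

    // Approximates the work behind "**": every directory below root is visited.
    // Files.walk also yields root itself, so root is included in the count.
    static long countDirsRecursive(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isDirectory).count();
        }
    }

    // Approximates the work behind a single "*": only root's direct children.
    static long countDirsOneLevel(Path root) throws IOException {
        long count = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(root)) {
            for (Path child : children) {
                if (Files.isDirectory(child)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");

        long start = System.nanoTime();
        long deep = countDirsRecursive(root);
        long deepMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        long shallow = countDirsOneLevel(root);
        long shallowMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("recursive: " + deep + " dirs in " + deepMillis + " ms");
        System.out.println("one level: " + shallow + " dirs in " + shallowMillis + " ms");
    }
}
```

Pointing it at a large tree (java ScanCost /some/big/dir) should show the recursive count and time growing with the whole tree while the one-level numbers stay small. It only measures traversal cost, so it says nothing about the memory DirectoryScanner holds on to.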

Hudson users, bear in mind: FindBugs, Warnings, and other plugins typically use Ant include patterns (and, I am sure, Ant's DirectoryScanner), and in those cases ** is sometimes unavoidable.

I need to fire up a Hudson workspace and ensure that it supports multiple includes, so users (me!) can avoid ** in more cases...