I have an instance of nsd running with 50k domains (each with only 6 records) and I get a more or less continuous stream of output to the log file:
[1324495953] nsd[14256]: error: malloc failed: Cannot allocate memory
[1324495954] nsd[22167]: warning: xfrd process 14256 failed with status 256, restarting
This is even though there is ~3G free RAM on that box.
This nsd instance is never going to receive an xfr request or send a notify to another instance. It is strictly the single authoritative server for these domains (they are for e-mail testing, if it matters). Is there any way to disable the xfrd functionality completely? Or do I have to go through and patch the source code to not spawn that child process?
TIA
John
Hello John,
our NSD servers run about 100k zones and we had a number of problems with xfrd too -- terribly high CPU usage during reloads, wasted memory, etc. I sent a few reports and patches about these problems in 2010, see e.g. http://open.nlnetlabs.nl/pipermail/nsd-users/2010-February/001071.html, http://open.nlnetlabs.nl/pipermail/nsd-users/2010-March/001078.html. But in the end I realized that the xfrd code cannot be easily adapted for such a large number of zones and decided to disable it entirely. Because most of our DNS operations involve adding or removing a zone, we created our own synchronization that simply distributes the compiled nsd.db among the nameservers, so xfrd is useless for us.
Attached is our patch to disable xfrd in nsd-3.2.9. It's a dirty single-purpose hack, but our nameservers have been running with it for nearly 2 years.
I'd like to see an official option to disable xfrd functionality too, but I'm too busy to create a patch that would be suitable for upstream.
You can also try the --enable-mmap configure option, which replaces the malloc-based allocator with an mmap-based allocator for zone data. Its primary goal is to address the excessive memory usage of xfrd; see my post in the list archive. It's marked as experimental, but we have had it enabled since March 2010 without issues.
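For reference, a build with the attached patch and the mmap option might look roughly like the sketch below. The tarball name, the directory layout, and the -p1 strip level are assumptions, not something stated above, so check the paths at the top of the patch file before applying it.

```shell
# Assumption: the nsd-3.2.9 source tarball and the attached patch
# are both in the current directory.
tar xzf nsd-3.2.9.tar.gz
cd nsd-3.2.9

# Assumption: the patch paths carry one leading directory (-p1).
# The dry run reports whether it applies cleanly without changing any files.
patch -p1 --dry-run < ../50_disable-xfrd-nsd-3.2.9.patch
patch -p1 < ../50_disable-xfrd-nsd-3.2.9.patch

# Build with the experimental mmap-based allocator for zone data.
./configure --enable-mmap
make
```

If the dry run reports rejected hunks, the strip level or the source version probably does not match the patch.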
Martin
On 24.1.2012 17:46, John Peacock wrote:
(attachments)
50_disable-xfrd-nsd-3.2.9.patch (3.27 KB)
Thank you very much! I had made a much simpler patch (basically commenting out the restart of the xfrd child process after the first failure), and with it resperf reports 42k queries/sec (as opposed to 2k with all of the xfrd thrashing). I have now tried your patch, with the mmap flag as well, and I am getting around 75k queries/sec!
Now I just need to figure out why unbound is crashing with malloc errors (which may be down to overly aggressive tuning on my part), but that is a question for a different list...
John