Larger memory footprint with direct-rdata-storage in NSD 4.14.0

Hello,

We recently updated from NSD 4.13.0 to 4.14.0.

Refactor RDATA storage to reduce memory footprint

https://github.com/NLnetLabs/nsd/pull/444

The amount of reduction depends on the zone contents.

We observed a reduction across our smaller zones and nameservers. However, we also observed the opposite: the nameservers serving our largest zones now consume more memory on NSD 4.14.0. Is this to be expected in some cases?

The absolute figures here are trivial and are of no practical concern to us. We are not a registry, and so even our largest zones are tiny in comparison to some of your other users. :slight_smile: I nevertheless wondered whether this behaviour was expected. If unexpected, and if the adverse outcome is liable to scale on some very large zones, maybe this report would be of interest to you.

Unfortunately, I am unable to share our zone data for it includes confidential information that is not served on public networks, though I am happy to share what information I can. I will focus on one zone that was observed to consume more memory on NSD 4.14.0.

nsd_size_db_in_mem_bytes (size.db.mem) went from ~6.49 Mi on NSD 4.13.0 to ~8.80 Mi on NSD 4.14.0. There are about 9K RRs in the zone. Our zones are specifically designed to serve RFC 6763 DNS-SD data (and nothing else). We never load more than one zone into each NSD server. We build with --enable-packed and --disable-radix-tree; our complete set of configure flags can be found at the bottom of this message. The positive delta is reproducible on Linux and macOS.

Here is the aggregate composition of that zone by RR type:

2626 AAAA
1871 PTR
   1 SOA
2736 SRV
1871 TXT
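
As a rough cross-check of the figures above, the per-type counts can be summed and divided into the reported size.db.mem values. This is only a back-of-the-envelope sketch: it assumes "Mi" means MiB (2^20 bytes) and takes the approximate ~6.49 and ~8.80 figures at face value; the variable and function names are illustrative.

```python
# RR counts by type, copied from the list above.
rr_counts = {"AAAA": 2626, "PTR": 1871, "SOA": 1, "SRV": 2736, "TXT": 1871}
total_rrs = sum(rr_counts.values())

MIB = 2 ** 20  # assumption: "Mi" in the reported figures means MiB


def bytes_per_rr(size_mib, rrs=total_rrs):
    """Average in-memory bytes per RR for a given size.db.mem in MiB."""
    return size_mib * MIB / rrs


per_rr_4_13 = bytes_per_rr(6.49)  # approximate figure reported for NSD 4.13.0
per_rr_4_14 = bytes_per_rr(8.80)  # approximate figure reported for NSD 4.14.0

print(total_rrs)
print(round(per_rr_4_13), round(per_rr_4_14))
```

Under those assumptions the zone holds 9105 RRs, at roughly 747 bytes per RR on 4.13.0 and roughly 1013 bytes per RR on 4.14.0.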

Hi Saj,

Thanks for looking into your findings and reporting!

I have been testing the new RDATA storage approach with these zones: .lol, .nl, .se, .net, .org and .com. With your configure options (--enable-packed and --disable-radix-tree), and compared to NSD 4.13.0 compiled with the same options, they showed the following reductions:

  • .lol: 29.3%
  • .nl: 26.6%
  • .se: 23.7%
  • .net: 6.6%
  • .org: 6.5%
  • .com: 6.1%

But none of these zones have PTR records, so I am afraid I missed this! I will look into this now and report back.

Thanks again for reporting. I was about to publish a blog post about my testing results, but will postpone that until we’ve figured out what is going on and possibly how to remedy it. You saved me from posting an “all good news” story, which clearly needs to be nuanced.

Regards,

– Willem

Hi Saj,

Just a quick update on the issue. I can reproduce.

The issue is in our region-allocator. Memory recycling (which happens when RRsets are resized as new RRs are added to them) does not perform well when there are relatively few, but uniquely large, RRsets, that is, RRsets with many RRs. NSD before 4.14.0 already didn’t do too well under these circumstances, but our new release can do worse. So the trigger is RRsets with a uniquely large number of RRs; that is, uniquely large among all the zones served in an NSD instance.
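
To illustrate the effect described above, here is a toy model. It is not NSD's region-allocator code: it assumes a bump allocator that can only reuse a freed block for a later request of exactly the same size, and it assumes an RRset's array is reallocated on every added RR. The names ToyRegion, load and rr_bytes are hypothetical.

```python
from collections import defaultdict


class ToyRegion:
    """Bump allocator with exact-size recycle bins (toy model only)."""

    def __init__(self):
        self.total = 0                # bytes ever taken from the system
        self.bins = defaultdict(int)  # block size -> number of free blocks

    def alloc(self, size):
        if self.bins[size] > 0:
            self.bins[size] -= 1      # reuse a recycled block of this size
        else:
            self.total += size        # otherwise grow the region

    def recycle(self, size):
        self.bins[size] += 1


def load(rrset_sizes, rr_bytes=16):
    """Grow each RRset one RR at a time; return (region bytes, live bytes)."""
    region = ToyRegion()
    for n in rrset_sizes:
        for k in range(1, n + 1):
            region.alloc(k * rr_bytes)              # array for k RRs
            if k > 1:
                region.recycle((k - 1) * rr_bytes)  # old array is freed
    live = sum(n * rr_bytes for n in rrset_sizes)
    return region.total, live


many_small = load([2] * 500)  # 1000 RRs spread over 500 small RRsets
one_large = load([1000])      # 1000 RRs in a single RRset
print(many_small, one_large)
```

In this model both zones hold 16000 live bytes, but the small RRsets keep reusing each other's freed blocks while the single large RRset leaves behind one unused block for every intermediate size, because no other RRset ever asks for those sizes.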

I do think I can remedy this, but there is no quick and easy fix.

I’ll keep you posted.

– Willem

Hi Saj,

Another status update ;-).

I worked on a remedy minimizing memory fragmentation for zones with RRsets with many RRs (like DNS-SD zones). You can find it on this branch: https://github.com/NLnetLabs/nsd/tree/devel/reduce-fragmentation-with-many-RR-RRsets , for which I made this PR: https://github.com/NLnetLabs/nsd/pull/472

Do you think you can find the time to try it out? For me, with my worst-case test zone, the result was rather spectacular.

Cheers,

– Willem

I worked on a remedy minimizing memory fragmentation for zones with RRsets with many RRs (like DNS-SD zones).

Thank you, Willem, for picking this up so quickly.

For me, with my worst-case test zone, the result was rather spectacular.

Here are some numbers from the zone that first drew our attention.

NSD_4_13_0_REL size.db.mem=6717192
NSD_4_14_0_REL size.db.mem=9116758
NSD_4_14_0_RC1-20-g39775308 size.db.mem=1155874
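
For anyone who prefers percentages, the three measurements above can be compared like this. It is a small sketch that uses only the numbers just quoted; the helper name pct_change is illustrative.

```python
# size.db.mem values in bytes, copied from the measurements above.
size_db_mem = {
    "NSD_4_13_0_REL": 6717192,
    "NSD_4_14_0_REL": 9116758,
    "NSD_4_14_0_RC1-20-g39775308": 1155874,
}


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


baseline = size_db_mem["NSD_4_13_0_REL"]
for build, size in size_db_mem.items():
    print(f"{build}: {pct_change(size, baseline):+.1f}% vs 4.13.0")
```

That works out to about +35.7% for 4.14.0 and about -82.8% for the branch build, both relative to 4.13.0.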

This is indeed quite spectacular. :slight_smile:

Your test suites are surely much better than ours, so I am yet to test for nameserver function. Let me know if you need me to put a Linux build into a traffic path. We are in change freeze for the upcoming holiday period, but I might be able to arrange a test in a pre-production environment.

Thanks again!