NSD 4.3.9 stops updating zones

Hi NSD developers and users,

I've observed a situation with NSD that I think deserves some attention, and perhaps some kind of fix.

We have a server with 32GB of RAM. When we start NSD, it loads all the zones and happily serves them, using close to 15GB of RAM. After a while, it gets a NOTIFY for a zone and AXFRs the zone, saving the XFR in /var/lib/nsd/nsd-xfr-5231. It then tries to apply the update, and this is where it all goes wrong. NSD's method of updating is to fork itself, have the child reload the changed zone(s), and take over from the parent... except that the fork fails for lack of memory: while forking, NSD temporarily needs close to double the amount of RAM.
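For what it's worth, the kernel-side numbers behind that fork failure can be inspected in /proc/meminfo (generic Linux, nothing NSD-specific):

```shell
# With the default overcommit heuristic, forking a large process can fail
# with "Cannot allocate memory" when the total committed memory would
# approach the commit limit. These two fields exist on any Linux system:
awk '/^(CommitLimit|Committed_AS)/ {print}' /proc/meminfo
```

Watching Committed_AS climb toward CommitLimit during a reload would confirm that this is what kills the fork.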

The log shows this:

[2022-03-30 15:16:27.986] nsd[5299]: error: fork failed: Cannot allocate memory
[2022-03-30 15:16:28.355] nsd[45999]: error: handle_reload_cmd: reload closed cmd channel
[2022-03-30 15:16:28.355] nsd[45999]: warning: Reload process 5299 failed, continuing with old database
[2022-03-30 15:16:28.355] nsd[5231]: error: process 5299 exited with status 256
[2022-03-30 15:16:29.776] nsd[45999]: error: fork failed: Cannot allocate memory
[2022-03-30 15:16:30.149] nsd[46012]: error: handle_reload_cmd: reload closed cmd channel
[2022-03-30 15:16:30.149] nsd[46012]: warning: Reload process 45999 failed, continuing with old database
[2022-03-30 15:16:31.748] nsd[46013]: error: handle_reload_cmd: reload closed cmd channel
[2022-03-30 15:16:31.748] nsd[46013]: warning: Reload process 46012 failed, continuing with old database

After this, there are no more log entries about trying to reload the database.

PID 5231 is the xfrd process, and 5299 was the master that coordinates things. Now, the situation looks like this:

# systemctl status nsd
● nsd.service - NSD DNS Server
    Loaded: loaded (/usr/lib/systemd/system/nsd.service; enabled; vendor preset: disabled)
    Active: active (running) since Tue 2022-01-04 12:07:30 UTC; 2 months 28 days ago
  Main PID: 5231 (nsd: xfrd)
    CGroup: /system.slice/nsd.service
            ├─ 5231 /usr/sbin/nsd -d
            ├─46013 /usr/sbin/nsd -d
            ├─46016 /usr/sbin/nsd -d
            └─46024 /usr/sbin/nsd -d

So we have the state where the xfrd process is running, and keeps doing zone transfers, which slowly accumulate in /var/lib/nsd/nsd-xfr-5231. Eventually, this will fill up the disk. Additionally, we have child processes running and serving queries, but the zones are now outdated. But there is no master process to apply the transfers. Log file rotation is also broken, because when I run "nsd-control log_reopen", no new log file is created. This will also cause the log file to grow unbounded, until it fills up the disk. Essentially, NSD is crippled, and only a restart will get it out of this broken state.
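Until there is a proper fix, we are considering external monitoring as a stopgap. A minimal sketch (the path is from our setup above; the threshold of 100 is an arbitrary choice):

```shell
# Alert if NSD transfer files accumulate in /var/lib/nsd -- a sign that
# the reload process has died and xfrd is spooling transfers that
# nothing will ever apply.
count=$(ls /var/lib/nsd/nsd-xfr-* 2>/dev/null | wc -l)
if [ "$count" -gt 100 ]; then
    echo "WARNING: $count pending NSD transfer files -- restart nsd?"
fi
```

Something like this in cron would at least turn the silent failure into an alert.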

The easiest way to prevent this is to add RAM to the server. But my opinion is that this is a waste of resources. It may also not be trivial to do so. It might be easier on a virtual server, but with a physical server, one needs to buy RAM, shut down the server and add the memory modules. In this area, I find NSD to be deficient. Other name servers handle their memory differently, and make incremental use of memory as zones are added.

A question for the developers is: is there any way to make NSD handle zone reloads more efficiently rather than doing this fork/reload?

Regards,
Anand

Hello Anand,

do you have the chance to try version 4.4.0?
The announcement [1] promises "Lower memory usage of the XFRD process by default."

Andreas

[1] https://nlnetlabs.nl/projects/nsd/download/#nsd-4-4-0

Hi Andreas,

> do you have the chance to try version 4.4.0?
> The announcement [1] promises "Lower memory usage of the XFRD process by default."

Yes, I have also tried 4.4.0, but it has made no difference. I am not surprised, because the xfrd process isn't the one using most of the memory. Most of the memory is used by the master nsd process that keeps the zone database in memory.

Anand

Hi Anand, Andreas,

Anand's observation is correct: at some point NSD may use
double the memory on reloads. In this case, one of its strengths is
also one of its weaknesses. The copy-on-write memory characteristics
of the fork call are used so there's no need for locking on read access
to the in-memory database. Unfortunately, it also means that, in the
worst case, twice the amount of memory is required if there are lots of
changes. Basically, there really are two versions of the data in memory
at some point.

There are two things that can be done to reduce the amount of memory
required: 1) disable the on-disk database, i.e. 'database: ""' in the
configuration file; 2) (re)compile NSD with --disable-radix-tree, as
the radix tree requires more memory to operate.
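Concretely, that would look something like this (a sketch; double-check the option spelling against the nsd.conf(5) man page for your version):

```
# nsd.conf -- run fully from memory, no on-disk database:
server:
    database: ""

# and rebuild without the radix tree:
#   ./configure --disable-radix-tree && make && make install
```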

A tip from a colleague was to change the kernel overcommit setting. I
have no experience with this, so you/we'd have to test it (happy to
look into this further). Allowing the kernel to always overcommit
lets the fork succeed even though double the memory is not available.
The rationale is that when a zone is updated, the entire database is
not rewritten; only some pages get modified (and thus copied), so
overcommit could do the trick.
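For anyone who wants to experiment, the knob in question is vm.overcommit_memory (untested on our side, as said; note that with overcommit always on, the failure mode moves from a failed fork() to the OOM killer):

```
# Allow the kernel to always overcommit (mode 1), so the reload fork can
# succeed even when free memory is less than NSD's full footprint; with
# copy-on-write, most pages are never actually duplicated.
#   sysctl -w vm.overcommit_memory=1
# To persist across reboots, e.g. in /etc/sysctl.d/90-nsd.conf:
#   vm.overcommit_memory = 1
```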

All of that being said, the aforementioned semantics are fundamental to
the design. That is not to say these will never be changed, but
there's no quick fix in terms of a configuration option or one-line
code change.

I'm thinking the other, perhaps bigger, problem here is the stray
transfers that are left behind. There's some work being done on
slightly altering the behavior for zone transfers while porting zone
verification (CreDNS) to NSD. That should already improve the situation
quite a bit, as only a certain window is retried instead of all
current transfers before proceeding. Maybe, after that's in, we could
look into adding an option to throttle updates (not discussed with the
team).

I hope this provides some insight. Let me know if the suggestions work.
Also, it would be good to know how often this problem occurs.

Best regards,
Jeroen

Hi Jeroen,

> There are two things that can be done to reduce the amount of memory
> required: 1) disable the on-disk database, i.e. 'database: ""' in the
> configuration file; 2) (re)compile NSD with --disable-radix-tree, as
> the radix tree requires more memory to operate.

We are already running NSD without an on-disk database.

Wouter suggested in an old message that disabling the radix tree might give us about a 10% gain. So this might help us briefly, until the steady growth of zone sizes triggers this issue again.

> A tip from a colleague was to change the kernel overcommit setting. I
> have no experience with this, so you/we'd have to test it (happy to
> look into this further). Allowing the kernel to always overcommit
> lets the fork succeed even though double the memory is not available.
> The rationale is that when a zone is updated, the entire database is
> not rewritten; only some pages get modified (and thus copied), so
> overcommit could do the trick.

This is an interesting approach, and I could even try adding swap to the server. We currently don't create any swap partitions on our servers.

> I'm thinking the other, perhaps bigger, problem here is the stray
> transfers that are left behind. There's some work being done on
> slightly altering the behavior for zone transfers while porting zone
> verification (CreDNS) to NSD. That should already improve the situation
> quite a bit, as only a certain window is retried instead of all
> current transfers before proceeding. Maybe, after that's in, we could
> look into adding an option to throttle updates (not discussed with the
> team).

Yes, this issue is still a problem, and it's not just stray transfers. In our case, once the master nsd process died and wasn't replaced, xfrd kept performing transfers and saving them to disk. The log file also kept growing, and rotation didn't work. The disappearance of the master process went unnoticed.

Regards,
Anand