NSD process restarting

Franky_Van_Liedekerk · September 14, 2022, 1:02pm

Hi,

I’m using nsd 4.6.0 (compiled from source) on Ubuntu 20 with the latest patches installed. With the option ‘server-count=2’, the 2 “server” processes restart regularly (about every second); the VM has 2 CPUs, by the way. When I decrease it to server-count=1, the “server” process (not the main, not the xfr) still restarts, but less frequently.
The server is being used as an authoritative-only server and nothing is found in /var/log/nsd/nsd.log.
NSD is running under systemd, with the option “--enable-systemd” set at compile time (and “Type=notify” in the systemd service file).

Is it normal to see this process restarting that often? Can I configure a setting to find something in the logs concerning this?

With friendly regards,
Franky

anandb · September 15, 2022, 12:08pm

Hi Franky,

Are any of the zones served by this instance configured as secondaries, and receiving frequent updates via XFR? If so, then it's normal to see the server processes restart.

Regards,
Anand

Franky_Van_Liedekerk · September 15, 2022, 1:28pm

So the process restarts to serve freshly updated zones? Is there any reason to that logic? Because it is indeed a server that serves more than 7000 zones with the real masters indeed updating their zones regularly, and restarting a dns process because a zone was updated is a very costly step …

Franky

anandb · September 15, 2022, 1:48pm

Hi Franky,

> So the process restarts to serve freshly updated zones? Is there any

Yes, kind of. The master creates new child processes, which have the old zone data in them. These child processes apply the updates to their in-memory copy of the zones, and then take over serving zones from the previous processes.

> reason to that logic? Because it is indeed a server that serves more
> than 7000 zones with the real masters indeed updating their zones
> regularly, and restarting a dns process because a zone was updated is a
> very costly step ...

Well, there are different ways to apply updates to existing zones, and the NSD developers chose this one (fork new child processes to apply the update). Other name servers like BIND and Knot DNS do things differently.

I agree with you that it's not the most efficient way to do things. For starters, NSD temporarily causes memory usage to double when applying zone updates. So you either have to provision a server with double the amount of RAM, or play tricks with swap and/or tune the kernel's memory variables (e.g. overcommit) to ensure that fork() doesn't fail.

However, this is how NSD works, so if you're using it, then it's good to understand it, and tune your server accordingly.

If you feel that the reloads are too frequent, you can slow them down by adjusting "xfrd-reload-timeout" from the default of 1s to 10s or even higher. This causes the server processes to be restarted less frequently. The consequence is that more updates are batched together, and zone updates will not be visible immediately.
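
To make the batching effect concrete, here is a toy Python model of reload throttling. It is only a sketch under simplifying assumptions, not NSD's actual code: the function name `simulate_reloads` and the rule "reload as soon as an update is pending and the timeout since the previous reload has passed" are simplifications made for illustration.

```python
def simulate_reloads(update_times, reload_timeout):
    """Toy model: return a list of (reload_time, updates_in_batch).

    update_times   -- arrival times of zone updates, in seconds
    reload_timeout -- minimum number of seconds between two reloads
    """
    times = sorted(update_times)
    reloads = []
    next_allowed = 0.0  # earliest time at which the next reload may start
    i = 0
    while i < len(times):
        # The first pending update triggers a reload, but not before
        # the timeout since the previous reload has expired.
        reload_at = max(times[i], next_allowed)
        # Every update that has arrived by then joins the same batch.
        j = i
        while j < len(times) and times[j] <= reload_at:
            j += 1
        reloads.append((reload_at, j - i))
        next_allowed = reload_at + reload_timeout
        i = j
    return reloads


if __name__ == "__main__":
    # One update every half second for 30 seconds (60 updates in total).
    updates = [k * 0.5 for k in range(60)]
    for timeout in (1, 15):
        batches = simulate_reloads(updates, timeout)
        print(timeout, len(batches), [size for _, size in batches])
```

With one update every half second for 30 seconds, this model gives 31 reloads at a timeout of 1s and 3 reloads at 15s. The same 60 updates are applied either way, only in bigger batches and with more delay.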

Regards,
Anand

Jeroen_Koekkoek1 · September 16, 2022, 11:32am

Hi Anand, Franky,

Just a quick remark on Anand's excellent explanation.

It's not the child processes that apply updates. Rather, it's the main
server that gets forked, applies the updates and then starts new
children. If the reload succeeds, the old server and its children are
killed.

The reason for this is that it gives us completely lock free database
access across multiple processes and that greatly improves performance.
This builds on the fork behavior found on UNIX platforms, which shares
the memory between every forked process in a copy-on-write manner. On
unices, forking is relatively cheap.
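
The isolation that fork provides can be seen in a small Python sketch. It only
illustrates the fork semantics described above and is not how NSD is
implemented (NSD is written in C); the names `reload_in_fork`, `zones` and
`update` are hypothetical. It needs a UNIX-like system, since `os.fork` is not
available on Windows.

```python
import json
import os


def reload_in_fork(zones, update):
    """Fork, apply `update` in the child only, and return both views.

    Returns (parent_view, child_view): the parent's data is untouched,
    while the child sees the updated data, with no locking involved.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: starts with a copy-on-write view of the parent's memory.
        status = 1
        try:
            os.close(read_fd)
            zones.update(update)  # only the child's copy changes
            with os.fdopen(write_fd, "w") as writer:
                json.dump(zones, writer)
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    # Parent: keeps serving its own, unmodified data.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_view = json.load(reader)
    os.waitpid(pid, 0)
    return dict(zones), child_view


if __name__ == "__main__":
    old, new = reload_in_fork({"example.com": 1}, {"example.com": 2})
    print("parent still serves:", old)
    print("forked copy serves:", new)
```

The parent never sees the child's modification, which is why no locks are
needed: each process works on its own view of the data.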

Obviously, if you have lots of zone updates and there are lots of
restarts, that becomes increasingly expensive. Allowing for a little
more time between reloads solves that, like Anand stated.

Best,
Jeroen

Franky_Van_Liedekerk · September 16, 2022, 1:43pm

Hi,

We’ve configured xfrd-reload-timeout to 15 seconds. I did see a load improvement (of course, the server was never that loaded to begin with, but it is calming down, and all CPU cycles count).

Franky