Unfortunately, NSD 4.1.26 still does not work on Debian 10 Buster due to permission errors.
I have tested it on two fresh Debian 10 Buster installations and I still get these error messages:
error: Cannot open /var/log/nsd.log for appending (Read-only file system), logging to stderr
warning: failed to unlink pidfile /run/nsd/nsd.pid: Permission denied
error: could not open zone list /var/lib/nsd/zone.list: Permission denied
error: could not read zonelist file /var/lib/nsd/zone.list
Please find attached the configuration file I use (in this case for the master; the slave's is almost the same).
I subscribed specifically to reply to this thread.
I am also trying to run nsd on Debian Buster, and it's not working so nicely.
error: Cannot open /var/log/nsd.log for appending (Read-only file system), logging to stderr
warning: failed to unlink pidfile /run/nsd/nsd.pid: Permission denied
I added "/var/log" and "/run/nsd" ReadWritePaths to the nsd.service file, but the error remains:
[Unit]
Description=Name Server Daemon
Documentation=man:nsd(8)
After=network.target
I read in Paul Wouters' reply that I should add the nsd User/Group to the service file, but then nsd no longer starts, as the nsd user has no permission to bind to port 53:
error: can't bind udp socket: Permission denied
I wanted to migrate from BIND to nsd, but it seems the Debian package could use some love.
Does anyone have a suggestion on how to proceed? (A working systemd unit file, perhaps?)
Unfortunately, I couldn't fix it. I tried a billion things, but nothing worked. So I had to go the hard way and comment this out in /etc/systemd/system/multi-user.target.wants/nsd.service:
Try adding CAP_DAC_OVERRIDE to CapabilityBoundingSet so that it ends up being:
CapabilityBoundingSet=CAP_CHOWN CAP_DAC_OVERRIDE CAP_IPC_LOCK CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_CHROOT
As you saw, you need to add "ReadWritePaths=/var/log/" to the systemd
unit so that nsd can create the file.
When you do so, on first startup, nsd changes UID from root -> nsd and
then creates /var/log/nsd.log:
root@d10-nsd:~# ls -l /var/log/nsd.log
-rw-r--r-- 1 nsd nsd 151 May 27 14:15 /var/log/nsd.log
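The ReadWritePaths= addition described above can live in a drop-in instead of an edited copy of the packaged unit. A minimal sketch follows; the helper name readwritepaths_dropin and the file name logdir.conf mentioned below are only illustrative, and the function merely prints the fragment so it can be reviewed first:

```shell
#!/bin/sh
# Print a drop-in that lets the service write below /var/log/.
readwritepaths_dropin() {
    cat <<'EOF'
[Service]
# Added to whatever ReadWritePaths= the packaged unit already sets.
ReadWritePaths=/var/log/
EOF
}

# Preview only; nothing is written to the system.
readwritepaths_dropin
```

As root, the output could be pasted into `systemctl edit nsd`, or redirected to /etc/systemd/system/nsd.service.d/logdir.conf and followed by `systemctl daemon-reload`.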
On subsequent starts, nsd checks if it can append to the log while still
running as root. I believe this is a bug as this check should happen
after the switch from root->nsd. You can work around it by using the big
hammer that is CAP_DAC_OVERRIDE [*] or add this with `systemctl edit nsd`:
This way, systemd will make the file root-owned to please nsd, which will
chown it right after starting.
As for the failed unlinking of the pidfile, this is harmless and should
not be logged as a warning. It may already be fixed in newer releases as
it was done with Unbound already.
HTH,
Simon
*: If you use the CAP_DAC_OVERRIDE way, you don't need to list all the
caps as they are additive. This alone would do:
All of this seems to be band-aid upon band-aid of unnecessary hacks.
> As for the failed unlinking of the pidfile, this is harmless and should
> not be logged as a warning. It may already be fixed in newer releases as
> it was done with Unbound already.
PID files are so passé! They are irrelevant on systems where daemons are run under supervisors. I would highly recommend setting "pidfile" to "" in nsd.conf. This prevents creation of a PID file. Systemd already knows the PID of the NSD process, and can signal it directly.
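For reference, the nsd.conf setting recommended here would look roughly like the fragment below. This is only a sketch: the helper name pidfile_snippet is made up, and it assumes the unit file does not itself rely on a PIDFile= setting.

```shell
#!/bin/sh
# Print the nsd.conf fragment that turns off PID file creation.
pidfile_snippet() {
    cat <<'EOF'
server:
    # An empty string means no PID file is written.
    pidfile: ""
EOF
}

# Preview only; merge the option into the existing server: section by hand.
pidfile_snippet
```

After editing, `nsd-checkconf /etc/nsd/nsd.conf` can be used to check the syntax before restarting (the path is the Debian default and may differ elsewhere).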
> As you saw, you need to add "ReadWritePaths=/var/log/" to the systemd
> unit so that nsd can create the file.
> When you do so, on first startup, nsd changes UID from root -> nsd and
> then creates /var/log/nsd.log:
> root@d10-nsd:~# ls -l /var/log/nsd.log
> -rw-r--r-- 1 nsd nsd 151 May 27 14:15 /var/log/nsd.log
> On subsequent starts, nsd checks if it can append to the log while still
> running as root. I believe this is a bug as this check should happen
Are you certain of this? I have never seen any errors on my NSD systems.
I tried to fix the contrib nsd.service by adding Simon's suggestion to
it; if that is wrong, let me know:
Also, the unlink error message is fixed in the same manner as Unbound's
printout: by silencing it to avoid chatter due to permission errors. It
seems like NSD did manage to empty the file for MJ, but not unlink it.
> All of this seems to be band-aid upon band-aid of unnecessary hacks.
That's a band-aid indeed. IMHO the proper fix is to be consistent in
handling the file. So either open it as root and not chown it or always
touch it after setuid().
>> As for the failed unlinking of the pidfile, this is harmless and should
>> not be logged as a warning. It may already be fixed in newer releases as
>> it was done with Unbound already.
> PID files are so passé! They are irrelevant on systems where daemons are
> run under supervisors. I would highly recommend setting "pidfile" to ""
> in nsd.conf. This prevents creation of a PID file. Systemd already knows
> the PID of the NSD process, and can signal it directly.
Would it make sense to simply ignore the pidfile directive when running
through systemd?
I think this should be fixed rather than worked around like that. See my
other email, please.
That said, I must admit that I never used that contrib/nsd.service file,
only Debian's. The contrib one seems to be a mangled copy of Debian's
[*] because it has the same typo I fixed in the SystemCallFilter mount
rule (s/mount/@mount/).
Ideally, this contrib file should become the canonical reference used by
downstream distro providers. I would certainly welcome a switch to using
User=nsd as suggested by Paul Wouters but that requires other distros to
buy in.
> Are you certain of this? I have never seen any errors on my NSD systems.
I reproduced it all in a Debian Buster VM before posting. Are you using
the same systemd unit as Debian Buster's [*] ?
No, I'm running NSD on CentOS 7. I'm not using the unit file from contrib. I find it a mess. It's trying to enable every possible option in systemd, without taking care of all the related permission problems they cause. I build my own packages of NSD, and ship a very simple unit file with it:
[Unit]
Description=NSD DNS Server
After=network-online.target
>> All of this seems to be band-aid upon band-aid of unnecessary hacks.
> That's a band-aid indeed. IMHO the proper fix is to be consistent in
> handling the file. So either open it as root and not chown it or always
> touch it after setuid().
I agree. In order to avoid problems, on my systems, I log to /var/log/nsd, where that directory is owned by nsd:nsd.
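A rough sketch of that directory setup is shown below. The helper name make_logdir is invented for illustration, and the demonstration at the end runs against a scratch directory owned by the current user so that it needs no root privileges:

```shell
#!/bin/sh
# Create a log directory owned by the daemon's account.
# $1 = directory, $2 = owner in user:group form
make_logdir() {
    mkdir -p "$1" && chown "$2" "$1" && chmod 0755 "$1"
}

# Real use (as root) would be: make_logdir /var/log/nsd nsd:nsd
# Harmless demonstration in a scratch directory:
demo_dir="$(mktemp -d)/nsd"
make_logdir "$demo_dir" "$(id -u):$(id -g)"
ls -ld "$demo_dir"
```

The "logfile" option in nsd.conf then has to point at a file inside that directory, for example /var/log/nsd/nsd.log (the file name is an assumption). On a unit that restricts writes, the directory also needs to be covered by ReadWritePaths=.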
>>> As for the failed unlinking of the pidfile, this is harmless and should
>>> not be logged as a warning. It may already be fixed in newer releases as
>>> it was done with Unbound already.
>> PID files are so passé! They are irrelevant on systems where daemons are
>> run under supervisors. I would highly recommend setting "pidfile" to ""
>> in nsd.conf. This prevents creation of a PID file. Systemd already knows
>> the PID of the NSD process, and can signal it directly.
> Would it make sense to simply ignore the pidfile directive when running
> through systemd?
No. I don't like it when software silently does things. Instead, when package maintainers build NSD for systems with systemd, they should pass the --with-pidfile="" option to the configure script, so that by default, NSD doesn't create PID files. If a user still wants a PID file for some bizarre reason, he can set the "pidfile" option in nsd.conf. And then deal with the permissions issues himself.
I like the idea. Since Debian wants to preserve compatibility with both
systemd and init, I proposed a slightly different fix to Debian for nsd
[1] and unbound [2]. Thanks!
I have a suggestion. Maybe just delete this nsd.service file. To be honest, it's not very useful because it has a random mix of directives that don't help, or actually interfere with running NSD properly. As an example, it has this directive:
RestrictAddressFamilies=AF_INET AF_UNIX
But what about AF_INET6 then? The above will prevent NSD from being able to bind to an IPv6 socket.
I don't know where this file came from, but it's not good. If it's in there, people will use it. If you really want to provide a systemd unit file, then provide a minimal one that will work on most systems. A packager for a particular distro can add things to it if he likes. Additionally, if a user wants to tighten things up, they can always create an overlay for this unit file on their systems. Adding to a systemd unit is easier than removing existing directives in the base unit file.
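For anyone who keeps the RestrictAddressFamilies= directive rather than dropping it, an override that also permits IPv6 might look like the sketch below. The helper name address_families_dropin is illustrative; the empty assignment is there so the result does not depend on how the base unit's list gets merged.

```shell
#!/bin/sh
# Print a drop-in that replaces the address family allow-list.
address_families_dropin() {
    cat <<'EOF'
[Service]
# An empty assignment resets the list inherited from the main unit.
RestrictAddressFamilies=
# Then allow IPv4, IPv6 and UNIX sockets.
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
EOF
}

# Preview only; nothing is written to the system.
address_families_dropin
```

The output can be pasted into `systemctl edit nsd`, which fits the point above that adding to a unit through an overlay is easier than removing directives from the base file.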
> I have a suggestion. Maybe just delete this nsd.service file. To be
> honest, it's not very useful because it has a random mix of directives
> that don't help, or actually interfere with running NSD properly. As an
Thank you for the suggestion. Removed it. Complicated and not useful
is not what I want for a contrib file; instead, I want files in
contrib to be helpful and to make using NSD in different
environments easier.
Yes the removal of IPv6 also seems counterproductive to me.
>> I like the idea. Since Debian wants to preserve compatibility with
>> both systemd and init, I proposed a slightly different fix to
>> Debian for nsd [1] and unbound [2]. Thanks!
> I also noticed one other deficiency in the Debian unit file. It's
> missing "KillMode=process".
Indeed, the default is KillMode=control-group, which sends SIGTERM to every
process in the cgroup, waits 90s by default and then SIGKILLs what remains.
> NSD starts with a main process, and that then spawns child processes
> to handle queries. When you want to kill NSD cleanly, you send a
> TERM signal to the main process, which takes care of killing its
> children.
> However, systemd by default will send a TERM signal to all the
> processes. This causes a haphazard termination of NSD. With the
> KillMode setting as above, systemd sends a TERM signal only to the
> main process, and NSD handles its shutdown cleanly.
I only manage a small fleet of nsd servers so that's probably why I
never noticed any problem with cgroup-based killing. However, I did try
to simulate this:
kill -SIGTERM 13011 -> does absolutely nothing, the child ignores it
kill -SIGTERM 12972 or kill -SIGTERM 12990 triggers a clean shutdown:
nsd[12990]: warning: signal received, shutting down...
Also, sending SIGTERM to all 3 triggers an orderly shutdown.
The above seems to match what the code intends to do but take that with
a grain of salt as I can barely read C.
Anand, could you please provide some instructions on how to reproduce
the issue you are/were having with the cgroup-based killing as my test
scenario was likely too simplistic. Thanks
>> I also noticed one other deficiency in the Debian unit file. It's
>> missing "KillMode=process".
> Indeed, the default is KillMode=control-group, which sends SIGTERM to every
> process in the cgroup, waits 90s by default and then SIGKILLs what remains.
Correct.
[snip]
> Anand, could you please provide some instructions on how to reproduce
> the issue you are/were having with the cgroup-based killing as my test
> scenario was likely too simplistic. Thanks
I don't have a reproducible scenario on hand, but on servers I manage, there are often up to 32 child processes, and the servers are busy answering thousands of queries per second, and also often doing zone transfers.
I noticed that sometimes when I wanted to shut down NSD on such servers, there would be temporary files left over from incomplete zone transfers. There may also have been something else, but I can't remember it now. Anyway, I realised this was caused by systemd sending TERM to all processes at the same time. That's why I fixed it with "KillMode=process". Maybe you can try by increasing the server count to a higher value, then forcing some zone transfers and then terminating NSD.
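The KillMode fix mentioned above can be applied as a drop-in without touching the packaged unit. A minimal sketch, with killmode_dropin as a purely illustrative helper name:

```shell
#!/bin/sh
# Print a drop-in that limits the stop signal to NSD's main process.
killmode_dropin() {
    cat <<'EOF'
[Service]
# Signal only the main process; NSD then stops its own children.
KillMode=process
EOF
}

# Preview only; nothing is written to the system.
killmode_dropin
```

Pasting the output into `systemctl edit nsd` and restarting should be enough to compare shutdown behaviour with and without the setting on a busy server.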
>> I have a suggestion. Maybe just delete this nsd.service file.
> Thank you for the suggestion. Removed it. Complicated and not useful
> is not what I want for a contrib file; instead, I want files in
> contrib to be helpful and to make using NSD in different
> environments easier.
It is also completely different from the one used in Fedora or EPEL/CentOS.
> Yes the removal of IPv6 also seems counterproductive to me.
This all came in via 70346a384 by you as part of the --enable-systemd
patch. It also includes the "socket activation" stuff, e.g.
contrib/nsd.socket, which also makes absolutely no sense for a DNS server
daemon that is expected to always run anyway. And their default is
to activate it via queries received on 127.0.0.1.
Note for Fedora/CentOS, I do not compile with --enable-systemd because
of these reasons, even though I would like to enable the systemd
watchdog part that is part of that feature. Perhaps the socket
activation and software watchdog parts can be split into two different
configure options? (--enable-sd-notify and --enable-sd-socket ?)
You can apt install it and tune with `systemctl edit` if you want. If
some modifications would benefit the general public, please send bug
reports or merge requests to Debian.