While running nsd as a secondary nameserver with 1000+
domains, we discovered that the default nsdc(8) was
not able to reliably restart nsd.
The reason, I think, is that by using the PID file it sends
its signal to only 1 of the default 3 processes.
Afterwards it only checks against this 1 process, while
the other 2 may still be running, causing trouble on
startup.
The patch below fixes it for us (it was tested in a lab
environment with 10,000 domains).
Alf
--- usr.sbin/nsd/nsdc.sh.in.orig Fri Aug 10 09:37:33 2012
+++ usr.sbin/nsd/nsdc.sh.in Fri Aug 10 09:34:56 2012
@@ -188,18 +188,18 @@
     try=1
     while [ $try -ne 0 ]; do
-        if [ ${try} -gt 50 ]; then
+        if [ ${try} -gt 60 ]; then
             echo "nsdc stop failed"
             return 1
         else
             if [ $try -eq 1 ]; then
                 kill -TERM ${pid}
             else
-                kill -TERM ${pid} >/dev/null 2>&1
+                pkill -TERM nsd >/dev/null 2>&1
             fi
The "pkill" command is not available on all systems. Linux distros ship
with it these days, and MacOS X introduced it with Mountain Lion (10.8),
but it may not be available on other systems. Therefore your patch is
not portable.
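For what it's worth, a rough sketch of how the portability concern might be handled: use pkill where it exists and otherwise fall back to POSIX ps plus awk. The function names pids_by_name and signal_all are made up for illustration and are not part of nsdc; this has not been tested against nsd itself.

```sh
#!/bin/sh
# Hypothetical helpers, not part of nsdc.

# pids_by_name NAME: read "PID COMMAND" lines on stdin and print the PIDs
# whose command basename is exactly NAME (so "nsdc" does not match "nsd").
pids_by_name() {
    awk -v name="$1" '{
        n = split($2, parts, "/")        # strip any leading path
        if (parts[n] == name) print $1
    }'
}

# signal_all SIGNAL NAME: send SIGNAL to every process named NAME.
signal_all() {
    if command -v pkill >/dev/null 2>&1; then
        # -x asks pkill for an exact match on the process name
        pkill "-$1" -x "$2"
    else
        # Fallback: "ps -A -o pid= -o comm=" is specified by POSIX
        ps -A -o pid= -o comm= | pids_by_name "$2" | while read -r p; do
            kill "-$1" "$p" 2>/dev/null
        done
    fi
}
```

In the patched stop loop this would be called as "signal_all TERM nsd" in place of the bare pkill.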
Aha! I have run into this as well, especially in combination
with opendnssec. I had filed a bug report, but there were issues
reproducing it. I'm glad I'm not crazy!
I wondered whether there's a particular reason that only the
master is signalled, or is this purely due to lack of a portable
pkill-type program?
Some OSes have a "killall" that does the same as pkill, but other
OSes have a different "killall" that behaves slightly differently.
[root@nohats ~]# pidof nsd
4697 4696 4677
[root@nohats ~]# ls /var/run/nsd
[root@nohats ~]# nsdc stop
nsd is not running
Somehow nsd gets signalled and deletes its pidfile, but won't write a new
one. There are two ways my nsd gets signalled. One is via
an hourly cron job running (if necessary) an nsdc patch and nsdc reload. When
doing this manually, it works fine: the reload signals nsd and a
new pidfile is created:
So it all looks fine, but after a while something happens and the
pidfile is either wrong or gone, and then all of these fail. But
even with the pkill patch applied to /usr/sbin/nsdc, this still
happens.
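To catch the state described above (nsd processes running while the pidfile is gone or wrong), a small check along these lines might help. It is only a sketch: pidfile_state is a hypothetical helper, and the pidfile path in the usage note is an assumption about the local setup.

```sh
#!/bin/sh
# Hypothetical helper, not part of nsdc.

# pidfile_state PIDFILE: read the PIDs of running nsd processes on stdin
# (whitespace-separated) and print one of:
#   ok               pidfile names one of the running processes
#   stale-pidfile    pidfile exists but names none of them
#   missing-pidfile  processes are running but there is no pidfile
#   stopped          no processes and no pidfile
pidfile_state() {
    pidfile=$1
    running=$(cat)                      # PIDs supplied by the caller
    if [ ! -s "$pidfile" ]; then
        if [ -n "$running" ]; then
            echo "missing-pidfile"
        else
            echo "stopped"
        fi
        return 0
    fi
    recorded=$(cat "$pidfile")
    for p in $running; do
        if [ "$p" = "$recorded" ]; then
            echo "ok"
            return 0
        fi
    done
    echo "stale-pidfile"
}
```

On a system with pidof this could be run from cron as "pidof nsd | pidfile_state /var/run/nsd/nsd.pid", where the path is a guess and should be replaced with whatever pidfile your nsd is configured to use.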
The patch is far from ideal (what would happen if you have more than 1
nsd running?). However, we have used this in production for roughly a year
and it has survived 300+ restarts.
Since the secondary domains only exist in memory, I don't see any harm
in killing all instances of nsd at once; pkilling them without even
looking at the pid-file should be fine too.
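As a sketch of that idea, a stop routine could ignore the pid-file entirely and keep going until no process named nsd is left, with the same 60-try limit as the patch. The names pids_named and stop_all are invented here, and the one-second sleep between tries is an assumption, not something taken from nsdc.

```sh
#!/bin/sh
# Hypothetical helpers, not part of nsdc.

# pids_named NAME: print the PIDs of all processes whose command
# basename is exactly NAME, using POSIX ps output.
pids_named() {
    ps -A -o pid= -o comm= | awk -v name="$1" '{
        n = split($2, parts, "/")
        if (parts[n] == name) print $1
    }'
}

# stop_all NAME [MAX_TRIES]: send TERM to every NAME process and retry
# until none remain; give up after MAX_TRIES attempts (default 60).
stop_all() {
    name=$1
    max=${2:-60}
    try=1
    while :; do
        pids=$(pids_named "$name")
        if [ -z "$pids" ]; then
            return 0                    # nothing left, safe to start again
        fi
        if [ "$try" -gt "$max" ]; then
            echo "stop failed: $name still running" >&2
            return 1
        fi
        # A process may exit between listing and signalling; ignore that.
        kill -TERM $pids 2>/dev/null || true
        sleep 1
        try=$((try + 1))
    done
}
```

Unlike the pid-file check, this waits on all three processes, so a following start should not collide with a leftover child. It shares the limitation already mentioned: it cannot tell apart several independent nsd instances on one host.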
We run OpenBSD everywhere, so I sent this to sthen@openbsd, who then suggested
posting it here to get more feedback; that's why it's not portable :P
We run the default nsdc on our primary servers where the problem
does not exist.