I've noticed that when NSD shuts down, it can be a bit messy. Here's an
example:
[1390913907] nsd[22966]: warning: signal received, shutting down...
[1390913907] nsd[22966]: warning: rmdir /var/lib/nsd/nsd-xfr-1093 failed: Directory not empty
[1390913908] nsd[1093]: warning: rmdir /var/lib/nsd/nsd-xfr-1093 failed: Directory not empty
[1390913932] nsd[23003]: info: zone 96.147.in-addr.arpa. received update to serial 2014012801 at 2014-01-28T12:58:23 from 193.0.0.198 TSIG verified with key ripencc-20110222 of 1026966 bytes in 0.202617 seconds
Notice that processes 22966 and 1093 (1093 is the main process) both
tried to delete a directory which contained an incoming zone transfer,
and failed. This incoming zone transfer was being processed by process
23003.
When I killed 1093, it exited almost immediately and did not wait for
any of its children. In particular, 23003 continued running for 24 more
seconds and exited only after applying the zone transfer to the database.
This is problematic: once 1093 had exited, I assumed NSD was completely
shut down and that I could restart it or do other housekeeping. But one
NSD process was still running.
Additionally, that last process did not remove the directory
/var/lib/nsd/nsd-xfr-1093 after applying its zone transfer, so that
directory has been left behind.
Is there any reason the master process does not wait for its children to
finish before exiting?
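
To make the question concrete, here is a minimal sketch of a parent that
forwards the stop request to its children and returns only once all of them
have exited. It is not NSD's code: the names stop_children_and_wait and
spawn_worker, and the 30-second grace period, are assumptions for
illustration, and the worker is just a sleeping stand-in for a long-running
child.

```python
import signal
import subprocess
import sys
import time


def spawn_worker(seconds):
    """Hypothetical stand-in for a long-running child process."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )


def stop_children_and_wait(children, grace_seconds=30.0):
    """Ask every child to stop, then block until each one has exited.

    Returns a dict mapping each child's PID to its exit status.
    """
    for child in children:
        if child.poll() is None:              # still running
            child.send_signal(signal.SIGTERM)

    # One shared deadline, so the total wait is bounded by grace_seconds.
    deadline = time.monotonic() + grace_seconds
    statuses = {}
    for child in children:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            statuses[child.pid] = child.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            # Last resort: the child did not stop within the grace period.
            child.kill()
            statuses[child.pid] = child.wait()
    return statuses
```

With this shape, the parent's own exit doubles as the signal that nothing is
left running, which is the guarantee asked about above.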
We did change this in NSD4 to address precisely this issue, but only for
the 'server' processes. The process that has the issue now is the reload
process. I'll have to take a good look at how I can make that one
stop earlier (without trashing the nsd.db file). In your case, you'll
likely have to wait 10-20 seconds before it manages to kill off
that long-running reload process.
This reload process was handling a zone transfer in that directory,
which is why it was not empty. Thank you for that report; it explains
earlier observations of the "Directory not empty" warning.
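
Until the reload process cleans up after itself, leftover transfer
directories can be spotted from the outside. The sketch below is a
housekeeping aid, not part of NSD: it assumes the nsd-xfr-<pid> naming seen
in the log above, and find_stale_xfr_dirs and pid_is_running are hypothetical
helper names. The PID check relies on POSIX signal 0.

```python
import os
import re

# Directory naming taken from the log excerpt: nsd-xfr-<pid>
XFR_DIR_PATTERN = re.compile(r"^nsd-xfr-(\d+)$")


def pid_is_running(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)      # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True          # exists, but belongs to another user
    return True


def find_stale_xfr_dirs(base_dir, is_running=pid_is_running):
    """List nsd-xfr-<pid> directories whose PID is no longer running."""
    stale = []
    for name in sorted(os.listdir(base_dir)):
        match = XFR_DIR_PATTERN.match(name)
        path = os.path.join(base_dir, name)
        if match and os.path.isdir(path) and not is_running(int(match.group(1))):
            stale.append(path)
    return stale
```

Note the caveat from this very report: the directory carries the main
process's PID (1093), yet the reload process (23003) kept using it after
1093 had exited. The result should therefore only be acted on once no NSD
process is running at all.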
Waiting a while for NSD to finish shutting down is not a problem. For
example, when I shut down BIND, it usually takes 15-20 seconds for the
process to exit while it flushes journals and zones to disk. Similarly,
when I shut down Knot, it takes several seconds to do its
housekeeping and free up allocated memory. In both cases, when my
shutdown call returns, I can be certain that the software has been fully
shut down, and there are no surprises.
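
As an interim workaround on the operator side, a shutdown script can poll
until every known NSD PID has disappeared before it carries on. This is a
rough sketch under assumptions: wait_until_gone and pid_exists are
hypothetical names, the caller has to collect the PIDs itself (for example
from a process listing), and the existence check is POSIX only.

```python
import os
import time


def pid_exists(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)      # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True          # exists, but belongs to another user
    return True


def wait_until_gone(pids, timeout=60.0, poll_interval=0.5):
    """Poll until none of the given PIDs exists, or the timeout expires.

    Returns the set of PIDs still present when the function gives up;
    an empty set means every process has exited.
    """
    deadline = time.monotonic() + timeout
    remaining = set(pids)
    while True:
        remaining = {pid for pid in remaining if pid_exists(pid)}
        if not remaining or time.monotonic() >= deadline:
            return remaining
        time.sleep(poll_interval)
```

For the case reported here, wait_until_gone([1093, 22966, 23003]) would have
kept the script blocked until the reload process had finished, at the cost
of having to know the child PIDs in advance.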