Hello,
We have this in our NSD logs occasionally:
[1214740996] nsd[93921]: warning: nsd is already running as 93888, continuing
[1214740996] nsd[93922]: error: can't bind the socket: Address already in use
[1214741027] nsd[94418]: error: can't bind the socket: Address already in use
[1214741057] nsd[94932]: error: can't bind the socket: Address already in use
I think this is because we have a monitoring script that checks that NSD is running at all times and tries to start it, even though NSD is already running.
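One way to keep a monitor from starting a second copy is to test the recorded PID with signal 0 before attempting a start. This is a minimal sketch, not what our monitor or nsdc.sh actually does: nsd_alive is a hypothetical helper name, and the PID file path is passed as an argument because its location depends on the local configuration.

```shell
# Hypothetical helper (not part of nsdc.sh): succeed only if the PID file
# names a process that currently exists.
nsd_alive() {
    pidfile=$1
    # no PID file, or an empty one, means there is nothing to check
    [ -s "$pidfile" ] || return 1
    pid=`cat "$pidfile" 2>/dev/null`
    [ -n "$pid" ] || return 1
    # signal 0 delivers nothing; it only tests that the PID exists
    # and that we are allowed to signal it
    kill -0 "$pid" 2>/dev/null
}
```

The monitor would then run its start command only when nsd_alive fails. The check is still subject to the race described below, and the PID could belong to an unrelated process.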
In the nsdc.sh script we see the following:
signal() {
    if [ -s ${pidfile} ]
    then
        kill -"$1" `cat ${pidfile}` && return 0
    else
        echo "nsd is not running"
    fi
    return 1
}
But it seems that NSD restarts itself regularly and gets a new process ID when it does. In that case, the following sequence is possible:
- nsdc.sh reads the contents of pidfile
- NSD restarts, getting a new PID
- nsdc.sh sends a signal to test NSD using the old PID, which fails, so nsdc claims NSD is not running
Is this possible?
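The sequence above can be walked through by hand. This is only a sketch under the assumption that NSD really does come back with a new PID: a throwaway sleep process stands in for nsd, the PID file is a temporary file, and the variable names are made up for the illustration.

```shell
# Sketch only: "sleep" stands in for nsd, and the PID file is a temp file.
pidfile=`mktemp`

sleep 30 &                      # first "server" instance
echo $! > "$pidfile"

old_pid=`cat "$pidfile"`        # step 1: the script reads the PID file

kill "$old_pid"                 # step 2: the "server" restarts ...
wait "$old_pid" 2>/dev/null || true
sleep 30 &                      # ... and comes back with a new PID
new_pid=$!
echo "$new_pid" > "$pidfile"

# step 3: the signal goes to the stale PID and fails,
# although a "server" is running under new_pid
if kill -0 "$old_pid" 2>/dev/null
then
    stale_result="signal delivered"
else
    stale_result="nsd is not running"
fi
echo "$stale_result"

kill "$new_pid"                 # clean up the stand-in process
wait "$new_pid" 2>/dev/null || true
rm -f "$pidfile"
```

The window between step 1 and step 3 is very short in the real script, so this would only show up occasionally.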
It is possible to work around this with a little more sophistication, I think:
signal() {
    while true
    do
        # if there is no PID file, NSD is not running
        if [ ! -s "${pidfile}" ]
        then
            return 1
        fi
        # if we can send the signal to the PID, then NSD is running
        # (or some other process with that PID, but we hope not...)
        PID=`cat "${pidfile}"`
        if kill -"$1" "$PID"
        then
            return 0
        fi
        # double-check NSD did not restart between the time we read the PID
        # and the time we sent the signal; compare as strings so that an
        # empty or missing PID file does not break the test
        CHECK_PID=`cat "${pidfile}" 2>/dev/null`
        if [ "$PID" = "$CHECK_PID" ]
        then
            echo "nsd is not running"
            return 1
        fi
    done
}