An NSD 3.2.16 instance, secondary for a large zone, suddenly stopped updating
the zone:
xfrd: zone foobar.example: soa serial 2222549638 update failed (acquired: 1406646354), restarting transfer (notified zone)
Other (smaller) zones on the same machine are properly updated.
Checking the source code (xfrd.c) does not give me many
hints. "acquired" is the time when the SOA was acquired, but it is not
clear what exactly is wrong, or how to solve it. Has anyone seen such
behavior?
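For anyone collecting these messages from their logs, here is a minimal Python sketch that pulls the zone, serial and "acquired" value out of the line quoted above and prints the timestamp as a UTC date. The regular expression is written against that one quoted line only, and the function name parse_update_failed is made up for this example; it is not part of NSD.

```python
import re
from datetime import datetime, timezone

# Pattern derived only from the single log line quoted in this thread;
# other NSD versions may word the message differently.
LOG_RE = re.compile(
    r"xfrd: zone (?P<zone>\S+): soa serial (?P<serial>\d+) update failed "
    r"\(acquired: (?P<acquired>\d+)\)"
)


def parse_update_failed(line):
    """Return zone, serial and acquired time (UTC), or None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "zone": m.group("zone"),
        "serial": int(m.group("serial")),
        # "acquired" is a Unix timestamp (seconds since the epoch).
        "acquired": datetime.fromtimestamp(int(m.group("acquired")), tz=timezone.utc),
    }


line = (
    "xfrd: zone foobar.example: soa serial 2222549638 update failed "
    "(acquired: 1406646354), restarting transfer (notified zone)"
)
info = parse_update_failed(line)
print(info["zone"], info["serial"], info["acquired"].isoformat())
```

For the line above, the acquired value corresponds to 2014-07-29T15:05:54 UTC.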
We see the same behavior (also with NSD 3.2.16). It ran stably until we started to make a lot of updates to the zone. About 1 IXFR out of 4 fails, and the automatic restart of the transfer does not work either. It needs an extra notify to trigger the transfer again (and then it works).
Did you discover anything? As you mentioned, the source code does not give many hints.
I did some debugging and found some strange behavior. The bug is triggered by a race condition involving a big zone transfer, a slow connection (which results in a long-running transfer) and the nsd-patch job.
I can trigger this error if I run the nsd-patch job during an active transfer, but not every time. There is a small time window in which the nsd-patch job kills the reloading of the zone.
Is there a recommendation for how often the nsd-patch job should run? What happens if the job runs during an active IXFR? I noticed that the ixfr.db gets merged into the nsd.db, but the transfer is still running and starts a new ixfr.db. Is the nsd.db now in an inconsistent state (with an incomplete zone transfer applied)? Wouldn't it be better to trigger the patch job after a successful transfer rather than on a timer?
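To make the last question concrete, here is a rough Python sketch of the idea: a patch request that arrives while a transfer is active is deferred until the transfer finishes. This is purely a hypothetical model. It is not how NSD 3 or nsd-patch works, and the class PatchScheduler and its method names are invented for this example.

```python
class PatchScheduler:
    """Hypothetical model: run the patch step only when no transfer is active."""

    def __init__(self, run_patch):
        self._run_patch = run_patch    # callable that performs the merge
        self._active_transfers = 0     # transfers currently in progress
        self._patch_pending = False    # a patch was requested but not yet run

    def transfer_started(self):
        self._active_transfers += 1

    def transfer_finished(self, ok):
        self._active_transfers -= 1
        if ok:
            # A successful transfer leaves new data to merge.
            self._patch_pending = True
        self._maybe_patch()

    def request_patch(self):
        """Called by a timer; deferred while a transfer is running."""
        self._patch_pending = True
        self._maybe_patch()

    def _maybe_patch(self):
        if self._patch_pending and self._active_transfers == 0:
            self._patch_pending = False
            self._run_patch()


calls = []
sched = PatchScheduler(lambda: calls.append("patch"))
sched.transfer_started()
sched.request_patch()              # deferred: a transfer is still active
deferred = list(calls)             # snapshot taken before the transfer ends
sched.transfer_finished(ok=True)   # the patch runs once, after the transfer
print(deferred, calls)
```

The point of the model is only that the merge never starts in the middle of a transfer; how a real daemon would learn that a transfer is in progress is left open here.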
It is indeed quite possible for nsd-patch to interfere with an incoming
IXFR or AXFR. I have never been a fan of this model.
In NSD 4, things work quite differently. The daemon itself manages the
nsd.db file, so there's no nsd-patch needed. This allows the daemon to
keep the file consistent, as there is no external interference.
Furthermore, there is even a mode to tell NSD not to write a single
nsd.db, but to save zones in plain text. This has the added benefit of
using less RAM.
I have put NSD 4 through a lot of testing myself, and the NLnet Labs guys
(especially Wouter) were fantastic in working with me to iron out
various bugs and issues. We're using NSD 4 in production on some RIPE
NCC servers, and I am very happy with it. If you are able to, I suggest
switching to NSD 4.
Have you also tested NSD 4.1 without the database?
Not only tested, but running in production. This "nodb" mode was added
by Wouter at my request, mainly for lower memory usage. On my servers,
NSD's RAM consumption went from 25GB down to 17GB with this change.
There is a very small bug in 4.1 when using this "nodb" mode, but it
doesn't affect DNS service. The bug is that if a slave zone
expires on NSD, and you then restart NSD, the zone gets loaded from disk
and is marked "fresh" despite its expired status in the xfrd.state file.
The fix is already in trunk, and will be in 4.1.1. Be sure to merge in
the patch from trunk for now, if you decide to use 4.1 with the "nodb" mode.