Nsd-notify retries?

Michael_Tokarev · November 28, 2011, 2:55pm

Hello.

We had a network outage today, which caused quite some grief for us,
and one of the reasons was NSD, especially `nsdc notify' and nsd-notify.

We have about 30 zones defined locally, served by 2 local and 2
remote servers. We had to modify some data in these zones (actually
just refreshing DNSSEC keys because I forgot to do that earlier, but
it does not matter). After the modification, I ran `nsdc notify', which
made multiple attempts to notify the two unreachable nameservers with
increasing timeouts (even though the network layer immediately returned
"no route to host"). As a result, I failed to fix our DNS promptly:
I waited for quite some time for it to get to the zones which
are really important, then interrupted it, copied the config,
removed all references to the unreachable nameservers from it,
re-ran the notify part, and restored the config.

Now, the questions.

Should nsd-notify perhaps implement the functionality of the
nsdc script in this case, by scanning the conffile and sending
notifies for all zones found to all their nameservers, the same
way `nsdc notify' does, but doing it all in parallel rather than
one after another?

And should nsd-notify wait for so long and make so many
attempts for each? Maybe do just two attempts (second within
a 1-second interval) and be done with it? Or maybe there should
be some option for that?
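
To see why the retry policy matters when notifies are sent one after
another, the worst-case wait can be worked out. The numbers below are
purely hypothetical (the thread does not state nsd-notify's actual
attempt count or timeouts), and the second policy only roughly models
the "two attempts, 1 second" suggestion. Only the 30 zones and the 2
unreachable servers come from the post; the function names are made up
for this sketch.

```python
def worst_case_wait(attempts, first_timeout, backoff=2.0):
    """Seconds spent on one unreachable target if every attempt times out."""
    return sum(first_timeout * backoff ** i for i in range(attempts))


def serial_total(zones, dead_servers, per_target):
    """Total seconds when every (zone, dead server) pair is tried in turn."""
    return zones * dead_servers * per_target


# Hypothetical growing policy: 5 attempts, first timeout 3 s, doubling each time.
growing = worst_case_wait(attempts=5, first_timeout=3.0)

# Roughly the policy suggested in the post: 2 attempts, 1 s each, no growth.
short = worst_case_wait(attempts=2, first_timeout=1.0, backoff=1.0)

print(serial_total(30, 2, growing))  # 5580.0 seconds, about 93 minutes
print(serial_total(30, 2, short))    # 120.0 seconds
```

Under these assumed numbers the serial run is dominated entirely by the
per-target timeout, which is why either a shorter policy or parallel
sending would help.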

Or maybe it is better for nsd itself to send the notifies, e.g.
as triggered by nsd-notify, so that nsd-notify does not send
notifies itself but sends a trigger to a running daemon which
maintains a list of "pending" notifications? (Probably too
complicated for the daemon.)

Why does nsd-notify not detect the ICMP errors which are being
returned by the operating system, and instead wait until the
timeout expires?
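
As background for this question, here is a minimal sketch of how a
program can see such errors at all. It does not describe how nsd-notify
is implemented. It relies on the fact that a connected UDP socket gets
ICMP errors for its flow reported by the kernel as socket errors, so
the caller can give up early instead of waiting for the timeout. The
function name, payload and timeout are made up for illustration, and
the sketch is IPv4-only.

```python
import socket


def probe_udp(host, port, payload=b"\x00", timeout=2.0):
    """Send one datagram and classify what happens: reply, timeout, unreachable."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect() on UDP sends nothing; it only fixes the peer, which lets
            # the kernel hand ICMP errors for this flow back to this socket.
            sock.connect((host, port))
            sock.send(payload)
            sock.recv(512)
            return "reply"
        except socket.timeout:
            # Nothing came back at all within the timeout.
            return "timeout"
        except OSError:
            # For example ECONNREFUSED (port unreachable) or
            # EHOSTUNREACH / ENETUNREACH (no route to host or network).
            return "unreachable"
```

A caller could treat "unreachable" as a reason to skip further retries
for that server, while still retrying on "timeout".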

Right now I "fixed" this issue by adding an ampersand (&)
to the end of the nsd-notify command line in nsdc, and added one
`wait' call at the very end - this is not really portable,
but at least this way it works, unlike originally where it
would take ages to complete. Obviously this is also wrong
if the number of zones is large - too many processes
may be spawned. But the "right" behaviour can't be coded
in shell easily, standard /bin/sh does not have controls
for that - hence I asked if maybe nsd-notify itself should
parse the conffile and do it all in parallel...
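
The bounded parallelism that plain /bin/sh lacks can be sketched in
Python. This is only an illustration under assumptions: notify_all,
nsd_notify_cmd, and the worker and timeout values are made-up names and
numbers, and the only nsd-notify invocation assumed is the
`-z zone server' form.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def nsd_notify_cmd(zone, server):
    # Assumption: nsd-notify is on PATH and accepts "-z zone server".
    return ["nsd-notify", "-z", zone, server]


def notify_all(jobs, build_cmd=nsd_notify_cmd, max_workers=8, timeout=5.0):
    """Run one command per (zone, server) pair, at most max_workers at a time.

    Returns a dict mapping each (zone, server) pair to True on exit status 0
    and False on failure, timeout, or a command that could not be started.
    """
    def run(job):
        zone, server = job
        try:
            proc = subprocess.run(
                build_cmd(zone, server),
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            return job, proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            return job, False

    # The pool caps the number of child processes alive at once, which is
    # the control that "&" plus a single "wait" does not give.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, jobs))
```

A caller would pass the list of (zone, server) pairs taken from its own
configuration; the hard per-command timeout keeps one dead server from
holding up the rest.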

BTW, this is nsd version 3.2.8, I haven't seen 3.2.9 yet.

Thanks!

/mjt

PaulWouters · November 28, 2011, 3:15pm

I agree, and have brought this up in the past. I think it has
not been considered a high priority item because the focus of
nsd has been more on small sets of zones like TLDs. When you run
100 zones with nsd and you have a name server outage, all the
notify delays cause significant problems. Or in our case, we always
have some half broken test zones and test servers that are not
working causing massive delays in the init scripts.

I think the nsd team also feels the separate nsd-notify is an
obsolete feature, but I'm not sure if just restarting the daemon
itself causes the built-in notify code to trigger.

I would be happy if nsd-notify provided a "fire and forget" option,
and I am even willing to write the patch.

Paul

Wouter · November 28, 2011, 3:41pm

Hi Paul, Michael,

In NSD3, the daemon can perform notifies (with retries) for you, all in
parallel. This only happens when you have notify: configured for the
zone(s) and the serial number is updated (i.e. you nsdc rebuild && nsdc
reload, or it is a slave zone and the master is updated).

In NSD4, the same thing, but nsdc is obsolete, you have nsd-control
notify, nsd-control contacts the server over SSL and the daemon sends
notifies for one or all zones.

The daemon uses 50 sockets (or so) to do the updates, so 50 zones are
active at once, like 'make -j50 notify'. These are constants in xfrd.h
at this time, perhaps would need to be increased if you have 500000 zones.

Best regards,
Wouter

Michael_Tokarev · November 28, 2011, 3:58pm

28.11.2011 19:41, W.C.A. Wijngaards wrote:

Hi Paul, Michael,

In NSD3, the daemon can perform notifies (with retries) for you, all in
parallel. This only happens when you have notify: configured for the
zone(s) and the serial number is updated (i.e. you nsdc rebuild && nsdc
reload, or it is a slave zone and the master is updated).

Aha! So my old (I think from nsd2 days) script -- that did rebuild,
reload and notify -- is now obsolete too, it can be reduced to just
rebuild & reload. That's excellent to know, thank you!

(On a related note, I think I asked this question myself -- is there
a way to send a notify to the _unbound_ daemon too?)

In NSD4, the same thing, but nsdc is obsolete, you have nsd-control
notify, nsd-control contacts the server over SSL and the daemon sends
notifies for one or all zones.

The daemon uses 50 sockets (or so) to do the updates, so 50 zones are
active at once, like 'make -j50 notify'. These are constants in xfrd.h
at this time, perhaps would need to be increased if you have 500000 zones.

Yes, 50 sockets should be plenty even for largeish sites. Thank
you very much!

/mjt

Wouter · November 28, 2011, 4:08pm

Hi Michael,

28.11.2011 19:41, W.C.A. Wijngaards wrote:

Hi Paul, Michael,

In NSD3, the daemon can perform notifies (with retries) for you, all in
parallel. This only happens when you have notify: configured for the
zone(s) and the serial number is updated (i.e. you nsdc rebuild && nsdc
reload, or it is a slave zone and the master is updated).

Aha! So my old (I think from nsd2 days) script -- that did rebuild,
reload and notify -- is now obsolete too, it can be reduced to just
rebuild & reload. That's excellent to know, thank you!

(On a related note, I think I asked this question myself -- is there
a way to send a notify to the _unbound_ daemon too?)

That would be a DoS vector waiting to happen.

But you can get 'almost the same' with:
$ unbound-control flush_zone <nameofzone>
Since unbound-control works over SSL you could copy the keys over to a
directory on the zone-master server, and use unbound-control -c
config-of-unbound.conf flush_zone blabla. That would wipe the contents
of the zone from unbound's cache.
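
The flush step described above could be scripted on the zone-master
side roughly as follows. This is a sketch, not a tested deployment: the
helper names and the config path are hypothetical, and it assumes the
unbound-control keys and a matching config file have already been
copied over as described. The only unbound-control usage assumed is
`-c <config> flush_zone <zone>', as in the text.

```python
import subprocess


def flush_zone_cmd(zone, config_path):
    """Build the unbound-control command line for flushing one zone."""
    return ["unbound-control", "-c", config_path, "flush_zone", zone]


def flush_zones(zones, config_path, runner=subprocess.run):
    """Flush each zone in turn and return the list of zones that failed."""
    failed = []
    for zone in zones:
        try:
            result = runner(flush_zone_cmd(zone, config_path))
            ok = result.returncode == 0
        except OSError:
            # unbound-control is not installed or could not be executed.
            ok = False
        if not ok:
            failed.append(zone)
    return failed
```

It could be called right after the zones are reloaded, for example
flush_zones(["example.org"], "config-of-unbound.conf"), with any
returned names logged for a later retry.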

Best regards,
Wouter

anandb · November 28, 2011, 4:12pm

Hi Michael,

(On a related note, I think I asked this question myself -- is there
a way to send a notify to the _unbound_ daemon too?)

NOTIFY is generally meant to be used from a master to a slave, to let
the slave know it should refresh a zone.

Unbound is just a resolver, so NOTIFY doesn't quite do what you want.

My guess is that you want Unbound to flush a particular zone upon
receiving a NOTIFY message, so that it is forced to look up new records
in that zone. I think someone had a similar wish some time ago, and
hacked it into Unbound. It may be in the archives somewhere. This
feature is unlikely to make it into the stock Unbound though. It really
is a hack.

Regards,

Anand Buddhdev
RIPE NCC

PaulWouters · November 28, 2011, 4:17pm

In NSD3, the daemon can perform notifies (with retries) for you, all in
parallel. This only happens when you have notify: configured for the
zone(s) and the serial number is updated (i.e. you nsdc rebuild && nsdc
reload, or it is a slave zone and the master is updated).

But when adding a zone, you need a restart, not just rebuild & reload.
What happens then?

In NSD4, the same thing, but nsdc is obsolete, you have nsd-control
notify, nsd-control contacts the server over SSL and the daemon sends
notifies for one or all zones.

Good, so on startup will it send notifies to all secondaries per default?
E.g. could this then be removed from the init scripts?

the 50 at a time is fine when it is the daemon doing it, meaning the server
is up and running. The issue with nsd3 is that you have to run nsd-notify
before the daemon launches, meaning you are down while waiting.

Paul

Wouter · November 29, 2011, 9:23am

Hi Paul,

In NSD3, the daemon can perform notifies (with retries) for you, all in
parallel. This only happens when you have notify: configured for the
zone(s) and the serial number is updated (i.e. you nsdc rebuild && nsdc
reload, or it is a slave zone and the master is updated).

But when adding a zone, you need a restart, not just rebuild & reload
What happens then?

It should send notifies for the added zone(s).

In NSD4, the same thing, but nsdc is obsolete, you have nsd-control
notify, nsd-control contacts the server over SSL and the daemon sends
notifies for one or all zones.

Good, so on startup will it send notifies to all secondaries per default?
eg this could then be removed from the init scripts?

It sends notifies for changed zone(s) per default.

If you want to send notifies for all zones (where it is not necessary
because they have not changed), you have to use the nsd-control command.

the 50 at a time is fine when it is the daemon doing it, meaning the server
is up and running. The issue with nsd3 is that you have to run nsd-notify
before the daemon launches, meaning you are down while waiting.

That would not be optimal. If you run nsd-notify while the daemon has
not launched yet, the slaves will immediately try to contact the master
to download the zone, but it has not started and is not available.
Instead, first start the daemon, then send notifies, so that the slaves
can download the zone immediately.
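
The ordering recommended here (start the daemon, make sure it answers,
only then notify) can be sketched as below. The function names, the
polling interval and the deadline are made up for illustration, and the
sketch assumes the server accepts TCP connections on the given address
and port once it is ready to serve transfers; how the daemon is started
and how notifies are sent is left to the caller.

```python
import socket
import time


def wait_for_tcp(host, port, deadline=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or the deadline passes."""
    end = time.monotonic() + deadline
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= end:
                return False
            time.sleep(interval)


def start_then_notify(start_daemon, send_notifies, host="127.0.0.1", port=53):
    """Start the server, wait until it is reachable, then send the notifies."""
    start_daemon()
    if not wait_for_tcp(host, port):
        raise RuntimeError("name server did not come up; not sending notifies")
    send_notifies()
```

With this order a slave that reacts to the notify at once finds a
master that is already able to serve the zone.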

NSD4 also has nsd-control force_transfer <zone> that you can run on the
slave server and it forces a full AXFR, even if the SOA serial has not
changed.

NSD4 is under development; these features are implemented in svn trunk.
If you decide to try it, note that nsdc and zonec are gone, the config
and database file formats have changed, and nsd-control is useful. It is
backwards compatible with your old config file.

Best regards,
Wouter

PaulWouters · November 29, 2011, 6:46pm

In NSD3, the daemon can perform notifies (with retries) for you, all in
parallel. This only happens when you have notify: configured for the
zone(s) and the serial number is updated (i.e. you nsdc rebuild && nsdc
reload, or it is a slave zone and the master is updated).

But when adding a zone, you need a restart, not just rebuild & reload
What happens then?

It should send notifies for the added zone(s).

How does it know what zones were "added"? I don't think that information
persists across a daemon restart? And since adding a zone and rereading
the conf file requires a full restart for nsd3, I don't think it can "know"?

the 50 at a time is fine when it is the daemon doing it, meaning the server
is up and running. The issue with nsd3 is that you have to run nsd-notify
before the daemon launches, meaning you are down while waiting.

That would not be optimal. If you run nsd-notify while the daemon has
not launched yet, the slaves will immediately try to contact the master
to download the zone, but it has not started and is not available.
Instead, first start the daemon, then send notifies, so that the slaves
can download the zone immediately.

When the daemon runs, nsd-notify cannot grab port 53 to send the notifies.

NSD4 also has nsd-control force_transfer <zone> that you can run on the
slave server and it forces a full AXFR, even if the SOA serial has not
changed.

NSD4 is under development; these features are implemented in svn trunk.
If you decide to try it, note that nsdc and zonec are gone, the config
and database file formats have changed, and nsd-control is useful. It is
backwards compatible with your old config file.

I'm not ready yet for nsd4, though if you really want me to try it, the
feature that would turn me into a beta tester is ixfr_from_diffs.

Paul