Multi-master mode NSD

Hello,

I am setting up NSD on systems that I intend to replicate
to several locations. Due to replication, the distinction between
a master and slave, and whether it is needed, becomes
interesting. It got me thinking that multi-master mode DNS
could be possible. And as far as I can see, NSD3 can, or
can almost, support this mode of operation. Has anyone
tried this before?

--> But why?

You might wonder why multi-master mode is a good idea. I
think it can be useful because it limits the dependency on
a single master (which effectively is a single point of
failure). Of course there should be a replicated version
of the signing infra as well.

Another place where this might be interesting is when
a DNSSEC signer is added to the master mix; when it starts
exporting signed zones it could be picked up by name servers,
simply because the SOA serial number of the signed zone is
higher. IOW -- it aids simplicity of DNS infrastructure
(but it also looks at it in a different manner, which may
take some getting used to).

--> But how?

First, NSD does not distinguish master from slave with a
flag; it merely adds allow-notify and request-xfr as a way
to retrieve zone data, which is functionally all that makes
a slave.

I was thinking that setting up the master's IP on all locations
would not hurt, as a notify to oneself (and a possible xfer for
any reason) would not actually update the zone, as the data would
be the same as it was. That's what you get when you are talking
to yourself -- you won't hear anything you didn't know yet.
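
To make "the same configuration on every location" concrete, here is a small sketch that prints one NSD3-style zone: stanza in which every peer, this node included, appears under allow-notify:, request-xfr:, notify: and provide-xfr:. It is illustrative only: the helper name emit_zone, the zone name, the zonefile naming and the 192.0.2.x addresses are made up, and NOKEY stands in for what would normally be a TSIG key name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an NSD3-style zone: stanza in which every
# peer, including this node itself, is both a transfer source and target.
# Running the same command on every node yields identical configuration.
# Zone name, zonefile name and addresses are illustrative only; NOKEY is
# a placeholder where a real deployment would name a TSIG key.
emit_zone() {
    zone=$1
    shift
    printf 'zone:\n'
    printf '    name: "%s"\n' "$zone"
    printf '    zonefile: "%s.zone"\n' "$zone"
    for peer in "$@"; do
        printf '    allow-notify: %s NOKEY\n' "$peer"
        printf '    request-xfr: %s NOKEY\n' "$peer"
        printf '    notify: %s NOKEY\n' "$peer"
        printf '    provide-xfr: %s NOKEY\n' "$peer"
    done
}

emit_zone example.com 192.0.2.1 192.0.2.2 192.0.2.3
```

The node's own address in that list is the "talking to yourself" case described above.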

If multiple master IPs are configured, the highest SOA serial will
determine the winning zone data. This also applies when making
changes locally. So if another site has been updated, this pulls
that data into NSD. Sending NOTIFY, or simply waiting for the next
refresh, ensures that data is passed from _any_ host that was last
edited to all the other authoritative servers.
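
"Highest SOA serial wins" hides one subtlety: serials are compared with RFC 1982 serial number arithmetic, so after a wrap 1 counts as newer than 4294967295. A minimal bash sketch of that comparison and of picking the winning source follows; the function names and the ADDR=SERIAL input format are hypothetical, and the serials stand in for what SOA queries to each host would return. An equal serial (the notify-to-oneself case) is never "newer", so nothing gets transferred.

```shell
#!/usr/bin/env bash
# Sketch only: RFC 1982 comparison of 32-bit SOA serials, plus a
# hypothetical helper that picks the source with the newest serial.

# True (exit 0) if serial $1 is newer than serial $2.
# 10# forces decimal, so a leading zero is not read as octal.
serial_newer() {
    local d=$(( (10#$1 - 10#$2) & 0xFFFFFFFF ))
    (( d != 0 && d < 0x80000000 ))
}

# usage: pick_newest LOCAL_SERIAL ADDR=SERIAL...
# Prints "<source> <serial>"; the source is "local" if nobody is newer.
pick_newest() {
    local best_src=local best_serial=$1 entry
    shift
    for entry in "$@"; do
        if serial_newer "${entry#*=}" "$best_serial"; then
            best_src=${entry%%=*}
            best_serial=${entry#*=}
        fi
    done
    printf '%s %s\n' "$best_src" "$best_serial"
}

pick_newest 2013010101 192.0.2.1=2013010101 192.0.2.2=2013010203 192.0.2.3=2013010102
# prints: 192.0.2.2 2013010203
```

Because RFC 1982 comparison is only defined for serials less than 2^31 apart, "highest" is only meaningful while all sites stay within that window.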

Of course you should only edit the latest & greatest zone file.
This should be ensured by running nsd update and nsd-patch prior
to doing the edits. And sanity checking your input would help
too, of course.

What I am left wondering is if SOA expiry would ruin this
scenario and retract the zone from publication. Slaves store
their expiration timers in xfrd.state and there is no problem
if that file is removed while NSD is down. To avoid having to
do that all the time, the setting "xfrdfile: /dev/null" might
be useful. But actually, retrieving a copy from oneself may
do the trick just as well (it looks like an update and will
therefore restart the timers), and that is better because it
becomes a per-zone setting.

If this works or could be made to work, it could simplify the
setup and crash recovery for authoritative name servers, by
making them all the same. Even key management would mix in
beautifully, because it uses a symmetric scheme. And NSD
appears so capable of this scenario that I wonder if it was
designed with this approach in mind?

To summarise my questions:

Q: Should we expect trouble (such as deadlocks or files being
   overwritten while they are read) if NSD does a request-xfr
   to oneself?

Q: Upon NOTIFY, will NSD retrieve the latest zone from _all_
   configured request-xfr hosts? If not, I would argue that
   it is more consistent to try all sources and let the latest
   SOA serial number win.

Q: Is it correct that editing a zone is possible with a sequence
   nsd-update ; nsd-patch ; vi ; nsd-update?

Q: Upon NOTIFY or nsd-update, will NSD read the zone file, even
   if it has request-xfr setup for that zone? Clearly, I would
   argue that an nsd-update should always be processed.

Q: Would the redirection of xfrd.state to /dev/null work?

Q: Would an xfer from oneself keep xfrd.state happy and thus
   avoid zone expiry on secondaries (which are also masters)?

It's a bit wild, I know. But I don't think this is plain stupid.
Any opinions?

Thanks!
-Rick

Hi Rick,

Incomplete answers, below.

> If this works or could be made to work, it could simplify the setup
> and crash recovery for authoritative name servers, by making them
> all the same. Even key management would mix in beautifully,
> because it uses a symmetric scheme. And NSD appears so capable of
> this scenario that I wonder if it was designed with this approach
> in mind?

Yes, more complex meshes of zone transfers are part of the design. But
that does not necessarily make such complexity a good idea.

> To summarise my questions:
>
> Q: Should we expect trouble (such as deadlocks or files being
> overwritten while they are read) if NSD does a request-xfr to
> oneself?

No, it should work.

> Q: Upon NOTIFY, will NSD retrieve the latest zone from _all_
> configured request-xfr hosts? If not, I would argue that it is
> more consistent to try all sources and let the latest SOA serial
> number win.

If the NOTIFY did not contain an SOA serial: first try the master that
sent the notify, then the subsequent masters, until the first one that
has a 'newer' serial number. Transfer from it; end of algorithm.

If the NOTIFY did contain an SOA serial: first try the master that sent
the notify, then the subsequent masters; whenever one has a newer SOA
serial, transfer the zone from it, and continue until you have fetched
the serial number from the notify (or a newer one). This could result
in multiple zone transfers, and probing many masters.
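
The two cases can be sketched in bash as follows. This is a reading of the description above, not NSD's source: probing is simulated by passing each master as ADDR=SERIAL (the serial an SOA query would hypothetically return), a "transfer" only prints a line and adopts the serial, and on_notify is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch of the NOTIFY handling described above; not NSD's code.
# Masters are given as ADDR=SERIAL, where SERIAL is what an SOA probe of
# that master would (hypothetically) return. A "transfer" only prints.

# True (exit 0) if serial $1 is newer than $2 (RFC 1982, 32-bit).
serial_newer() {
    local d=$(( (10#$1 - 10#$2) & 0xFFFFFFFF ))
    (( d != 0 && d < 0x80000000 ))
}

# usage: on_notify LOCAL_SERIAL NOTIFY_SERIAL_OR_EMPTY NOTIFIER ADDR=SERIAL...
on_notify() {
    local local_serial=$1 notify_serial=$2 notifier=$3
    shift 3
    # Probe order: the master that sent the NOTIFY, then the others.
    local first=() rest=() entry
    for entry in "$@"; do
        if [[ ${entry%%=*} == "$notifier" ]]; then
            first+=("$entry")
        else
            rest+=("$entry")
        fi
    done
    for entry in "${first[@]}" "${rest[@]}"; do
        # Serial in the NOTIFY: stop once we hold that serial or better.
        if [[ -n $notify_serial ]] && ! serial_newer "$notify_serial" "$local_serial"; then
            break
        fi
        if serial_newer "${entry#*=}" "$local_serial"; then
            echo "transfer from ${entry%%=*} serial ${entry#*=}"
            local_serial=${entry#*=}
            # No serial in the NOTIFY: the first newer master ends it.
            if [[ -z $notify_serial ]]; then
                break
            fi
        fi
    done
    echo "local serial now $local_serial"
}

on_notify 10 "" 192.0.2.2 192.0.2.1=12 192.0.2.2=11 192.0.2.3=15
# prints: transfer from 192.0.2.2 serial 11
#         local serial now 11
```

With the same masters but a NOTIFY carrying serial 15 (`on_notify 10 15 ...`), the sketch transfers three times (serials 11, 12 and 15), which is the "multiple zone transfers, probing many masters" case.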

> Q: Is it correct that editing a zone is possible with a sequence
> nsd-update ; nsd-patch ; vi ; nsd-update?

nsd-update?

> Q: Upon NOTIFY or nsd-update, will NSD read the zone file, even if
> it has request-xfr setup for that zone? Clearly, I would argue
> that an nsd-update should always be processed.

nsd-update? NSD3 does not read the zonefile unless you run zonec
(nsdc rebuild). NSD4 can read the zonefile (and precompile that file
by itself) when you tell it to, with nsd-control read-this-zone, or
with SIGHUP (it checks the file modification times). NSD3 and NSD4
will send a NOTIFY to slaves if a zone is updated from file and it has
notify: actions configured.

> Q: Would the redirection of xfrd.state to /dev/null work?

Not sure about reading from it... Could take a very very long time or
give parse errors.

> Q: Would an xfer from oneself keep xfrd.state happy and thus avoid
> zone expiry on secondaries (which are also masters)?

Yes, probably. This is also considered a problem with loops in the
zone-transfer-graph. There exists a draft in the IETF to fix
non-expiry with transfer-loops (with an EDNS option I believe).
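
A toy model shows why a loop defeats expiry. It is a simplification with made-up timer values: the primary P disappears at t=0, a secondary's refresh counts as successful if any of its sources still serves the zone, and a successful refresh resets that secondary's expire timer (as an SOA check with an unchanged serial does for a real secondary). The simulate function and all numbers are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of why a transfer loop defeats SOA EXPIRE. Not NSD code.
# The primary P goes away at t=0. Each refresh interval, a secondary's
# refresh succeeds if one of its sources still serves the zone; success
# resets its expire timer. Timer values are made up.
REFRESH=3600   # seconds, illustrative
EXPIRE=14400   # seconds, illustrative

# usage: simulate "SOURCES OF A" "SOURCES OF B" HOURS
simulate() {
    local src_a=$1 src_b=$2 end=$(( $3 * 3600 ))
    local t s ok_a ok_b last_a=0 last_b=0 up_a=1 up_b=1
    for (( t = REFRESH; t <= end; t += REFRESH )); do
        ok_a=0 ok_b=0
        # P never answers; A and B answer only while still serving.
        for s in $src_a; do
            if [[ $s == B && $up_b == 1 ]]; then ok_a=1; fi
        done
        for s in $src_b; do
            if [[ $s == A && $up_a == 1 ]]; then ok_b=1; fi
        done
        if (( ok_a )); then last_a=$t; fi
        if (( ok_b )); then last_b=$t; fi
        if (( t - last_a >= EXPIRE )); then up_a=0; fi
        if (( t - last_b >= EXPIRE )); then up_b=0; fi
    done
    echo "A serving: $up_a, B serving: $up_b"
}

simulate "P" "P" 24       # prints: A serving: 0, B serving: 0
simulate "P B" "P A" 24   # prints: A serving: 1, B serving: 1
```

In the second run neither copy ever expires although the real master is gone; that is the non-expiry the draft mentioned above is meant to fix.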

> It's a bit wild, I know. But I don't think this is plain stupid.
> Any opinions?

So, the design explicitly intends to support mixed actions, request-xfr
together with provide-xfr and so on; this is useful for 'intermediate'
or 'distribution' masters. Making loops in the master-slave topology
can spell trouble, though I have no experience with this.

Best regards,
   Wouter

Stupid question: have you considered syncing all of the publicly listed
masters from a "hidden master"?

This technique is used in many places, and fulfils some of your
requirements. It also allows you to provide much more protection
on the master, since the only connections will be from known hosts.

Hi Wouter,

I didn't know that loops are a problem in authoritatives,
and can't even imagine why. Can you give me a pointer to
the comments and proposals about this? I would think that
cyclic structures are never a problem if idempotence ensures
that updates won't cycle indefinitely: "Oh yeah, I've already
seen that SOA serial number, I'll ignore it."

>> Q: Is it correct that editing a zone is possible with a sequence
>> nsd-update ; nsd-patch ; vi ; nsd-update?
>
> nsd-update?

Yes, I was a bit rough; I should have said:

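  nsdc update       # possibly: have the slave zones check their masters first
  nsdc patch        # merge received transfers back into the zone files
  "$EDITOR" "$ZONE" # edit (and bump the SOA serial, so the edit wins)
  ci -l "$ZONE"     # check the edit into RCS, keeping a locked copy
  zonec -v -c /path/to/nsd.conf -z "$ZONE" -o "$ZONE"  # recompile
  nsdc reload       # load the recompiled database
  nsdc notify       # send NOTIFY to the configured notify: targets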

As you said, this would indeed work to edit a file, even
if it has request-xfr: settings as well.

I'm pretty impressed that the structures in NSD are so
general, and clean, that they would allow this approach.

Clearly, I must weigh whether there is a chance that NSD would
ever ban this approach before adopting it in my infra.

Thanks,
-Rick

Hi Wouter,

Just to let you and others know -- and document my experiences:

I've tried running NSD in multi-master for a while, permitting it
to send notifications in both directions. I was pretty certain that
this ought to work. In practice, one host was a master and the other
was a slave.

It did not work as well as I had expected. I never quite found out why,
but the daemon refuses to stop at some point, and "nsdc restart" blocks
the calling process.

Hope this is useful.

Cheers,
-Rick

Hi Rick,

> Hi Wouter,
>
> Just to let you and others know -- and document my experiences:
>
> I've tried running NSD in multi-master for a while, permitting it
> to send notifications in both directions. I was pretty certain
> that this ought to work. In practice, one host was a master and
> the other was a slave.

I do not understand what that means. NSD should be able to perform
the task you configure it for ...

In practice one got the update faster than the other? Probably
because it is first on the notify-list of the hidden-master.

> It did not work as well as I had expected. I never quite found out
> why, but the daemon refuses to stop at some point, and "nsdc
> restart" blocks the calling process.

Sounds like you hit the bug, fixed in 3.2.10, that caused slow
processing (it looks as if it stops) for some zones.

Best regards,
   Wouter

Hello,

>> I've tried running NSD in multi-master for a while, permitting it
>> to send notifications in both directions. I was pretty certain
>> that this ought to work. In practice, one host was a master and
>> the other was a slave.
>
> I do not understand what that means. NSD should be able to perform
> the task you configure it for ...

I had the two nodes send updates bidirectionally, so that each could
serve as a master. This is a normal path for replication: first there
is one node, then a master/slave setup is made, and finally
master/master mode is introduced. I saw a possibility to do this with NSD.

I've not actually fed it with information from various sources though.
One of the nodes was in practice always the first one to process new
zone versions. The only thing that probably was added is an update
sent back from the slave to the master, which I expected to be ignored.

To my surprise though, I've experienced difficulties stopping and
restarting NSD. As Casper Gielen reported, the problem may also be
related to something else.

This is not a bug report -- merely sharing experiences.

Cheers,
-Rick