Handling of zone transfers and notify messages

Hello all,

I'm currently setting up a server running NSD and am new to it. From what I have read in the list archives and documentation, zone transfers are not handled automatically. The usual suggestion is to use a cronjob, and some people have mentioned using a plugin for the task.

My question is: which mechanism is most used among people on this list, and why?

My thinking is that when you run a server as secondary for domains you have no control over, notify messages are quite useful for providing good service to your users. On the other hand, too many notify messages can put unneeded load on the server.

Is there a publicly available plugin that could solve both issues: accept notify messages and trigger a transfer if needed, while avoiding too many notify messages?

Thanks for your time,

Antoine.
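
The two requirements in the question above (act on a notify, but not on every one of a burst) can be sketched as a small per-zone throttle. This is only an illustration in Python: the class name `NotifyThrottle`, its `min_interval` parameter and its methods are hypothetical, and it does not use any NSD plugin API.

```python
import time


class NotifyThrottle:
    """Decide whether a NOTIFY for a zone should trigger a transfer now.

    Hypothetical helper, not part of NSD: at most one transfer per zone is
    started every `min_interval` seconds; notifies that arrive in between
    are remembered as pending rather than dropped.
    """

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_transfer = {}   # zone name -> time the last transfer was triggered
        self.pending = set()      # zones notified while throttled

    def on_notify(self, zone):
        """Return True if a transfer for `zone` should start right away."""
        now = self.clock()
        last = self.last_transfer.get(zone)
        if last is not None and now - last < self.min_interval:
            self.pending.add(zone)        # defer instead of transferring again
            return False
        self.last_transfer[zone] = now
        self.pending.discard(zone)
        return True

    def due(self):
        """Return deferred zones whose quiet period has elapsed, and mark them triggered."""
        now = self.clock()
        ready = sorted(zone for zone in self.pending
                       if now - self.last_transfer[zone] >= self.min_interval)
        for zone in ready:
            self.pending.discard(zone)
            self.last_transfer[zone] = now
        return ready
```

A caller would start a transfer whenever `on_notify()` returns True and poll `due()` periodically, so a burst of notifies costs at most one extra transfer per interval.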

[On 15 Oct, @ 15:43, Antoine wrote in "Handling of zone transfers and ..."]

> My question is: which mechanism is most used among people on this
> list, and why?

I don't know what's used most, but what I use at home is just:
% more nsd
# do hourly nsdc update, to update the slave information
# 17 over the hour
17 * * * * root /usr/sbin/nsdc update

i.e. a cronjob which checks every hour. If there are no updates, there
is no output and you will not receive mail. If there is a problem
connecting to a server, however, you will get mail every hour.

There is some stuff in the development pipeline, but I don't know the
precise status of that.

grtz Miek

Miek Gieben wrote:

> There is some stuff in the development pipeline, but I don't know the
> precise status of that.

The stuff in the NSD development pipeline right now is support for TSIG and an AXFR client, so you no longer need to install a bind8 named-xfer to do zone transfers for secondaries. Currently notifies are just logged and there are no ready plans for doing anything more at the moment.

Erik

Antoine Delvaux <antoine.delvaux@belnet.be> writes:

> My thinking is that when you run a server as secondary for domains
> you have no control over, notify messages are quite useful for
> providing good service to your users. On the other hand, too many
> notify messages can put unneeded load on the server.

I'm still running 1.2.2 (been running nsd for a couple of years at
this point) to secondary about 1000 zones, most of which are in turn
secondaried from other hosts on the server from which I'm pulling
them. The vast majority of them (98%) are Someone Else's Zones.

One problem I've noticed (hopefully fixed in newer versions) is that
"nsdc update" does not deal gracefully with having an expired or
non-transferable zone. To my way of thinking it should either build
nsd.db including the expired data but unset the authority bit in
replies or simply leave the zone out of nsd.db rather than refusing to
update the database for the other zones.

My work-around has been to have a "metaconfig" nsd.zones file which is
compiled into the "working" nsd.zones file before each "nsdc update"
by a script that issues a query for the SOA record against the master
nameserver for each zone and checks for the authority bit. Crude, but
effective.

If I need a swat with the clown hammer and incentive to upgrade,
someone please tell me to get off my lazy behind. :-)

                                        ---Rob
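
The work-around described above (query each zone's master for the SOA record and keep the zone only if the answer is authoritative) might look roughly like the following Python sketch. It is not Rob's script: the function names, the `(zone, master, config_line)` tuple layout and the IPv4/UDP-only query are assumptions made for illustration. Only the header layout and the AA bit come from RFC 1035.

```python
import socket
import struct

AA_FLAG = 0x0400   # "authoritative answer" bit in the DNS header flags (RFC 1035)
SOA_TYPE = 6
IN_CLASS = 1


def build_soa_query(zone, query_id=0x1234):
    """Build a minimal DNS query packet asking for the SOA record of `zone`."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)   # one question, no flags
    labels = [label for label in zone.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", SOA_TYPE, IN_CLASS)


def is_authoritative(reply):
    """True if the reply has the AA bit set, RCODE 0 and at least one answer."""
    if len(reply) < 12:
        return False
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return bool(flags & AA_FLAG) and (flags & 0x000F) == 0 and ancount > 0


def master_serves(zone, master, timeout=3.0):
    """Ask `master` (an IPv4 address) for the zone's SOA over UDP."""
    query = build_soa_query(zone)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (master, 53))
            reply, _addr = sock.recvfrom(4096)
        except OSError:                   # timeout, unreachable host, ...
            return False
    return reply[:2] == query[:2] and is_authoritative(reply)


def working_zones(metaconfig, check=master_serves):
    """Keep the config lines of the zones whose master answers authoritatively.

    `metaconfig` is a list of (zone, master, config_line) tuples; this layout
    is an assumption, and the config lines are treated as opaque strings.
    """
    return [line for zone, master, line in metaconfig if check(zone, master)]
```

The lines returned by `working_zones()` would then be written out as the "working" nsd.zones file before each "nsdc update".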

Robert E.Seastrom wrote:

> One problem I've noticed (hopefully fixed in newer versions) is that
> "nsdc update" does not deal gracefully with having an expired or
> non-transferable zone. To my way of thinking it should either build
> nsd.db including the expired data but unset the authority bit in
> replies or simply leave the zone out of nsd.db rather than refusing to
> update the database for the other zones.

I actually wasn't aware of this problem, so it has not been fixed in any version of NSD, whatever the fix (if any) may turn out to be.

Erik

It is probably a side effect of the same issue that hit me with one of my
NSD boxen, where the inability of zonec (and perhaps nsd) to grok AFSDB
records led to all updates being suspended. The fix was, of course, more
straightforward; simply support AFSDB. Until that was in, I had to leave
out that zone from my nsd.zones file, which was crude yet necessary to be
able to continue serving the other zones.

However, I (like Robert) feel a need for more graceful handling of partial
master b0rkenness from the slave perspective; if a zone can't be compiled
(for whatever reason) deal with it and continue to build the database,
including the working zones and signal (some way) that there are things
missing. I'm not certain about the "no AA bit" vs "no zone at all" choice.
"No zone at all" looks cleaner on the outside, but one might benefit
(albeit in a dirty, tainted way) from non-AA answers from a server you'd
expect to hand out trusty AA's.

Måns Nilsson <mansaxel@sunet.se> writes:

> It is probably a side effect of the same issue that hit me with one of my
> NSD boxen, where the inability of zonec (and perhaps nsd) to grok AFSDB
> records led to all updates being suspended. The fix was, of course, more
> straightforward; simply support AFSDB. Until that was in, I had to leave
> out that zone from my nsd.zones file, which was crude yet necessary to be
> able to continue serving the other zones.

Yep, precisely so. I recall having a record type in there a couple of
years ago that wasn't supported which caused me identical results.

> However, I (like Robert) feel a need for more graceful handling of partial
> master b0rkenness from the slave perspective; if a zone can't be compiled
> (for whatever reason) deal with it and continue to build the database,
> including the working zones and signal (some way) that there are things
> missing.

It would sure be nice to be able to make the zone info compilation
almost-completely-un-chatty too - where the only output is in the form
of errors. My disclaimer about "maybe this has already been done"
still applies... have you ever tried to eyeball-scan the output from
a couple of thousand zones to find the ONE ZONE that is hosing your
update?

> I'm not certain about the "no AA bit" vs "no zone at all" choice.
> "No zone at all" looks cleaner on the outside, but one might benefit
> (albeit in a dirty, tainted way) from non-AA answers from a server you'd
> expect to hand out trusty AA's.

Well, not to hold up BIND as a paragon of the Right Thing (cough), but
if you have a typo in the zone file that confuses the parser, BIND
will continue to serve up whatever data it was able to figure out,
albeit non-authoritatively. Likewise, if the zone is expired but it
was able to at least load data that it got previously, BIND will
continue to serve the data without the AA bit set (and disallow AXFR).

Note that my current hack makes the server go lame instead of
non-authoritative. Having all one's secondary servers go lame because
of a problem with the primary server is a problem I'd rather not have
- not least because people are doofuses and put too-short
expiry times in their SOAs because they don't know the difference
between expire and min ttl.

I'd rather have the server continue serving the data and go
non-authoritative. I can see where reasonable people may disagree.
Perhaps a good way of addressing the problem is to make it
compile-time tuneable, and then we can have an arm-wrestling match for
what the default behavior should be in the distribution. :-)

(or compromise by having the behavior modify itself based on the value
of the least significant bit of the rightmost number in the version
identifier, heh heh heh)

> --
> Måns Nilsson Systems Specialist
> +46 70 681 7204 KTHNOC
>                         MN1334-RIPE

                                        ---Rob

Robert E.Seastrom writes:

> Well, not to hold up BIND as a paragon of the Right Thing (cough), but if you have a typo in the zone file that confuses the parser, BIND will continue to serve up whatever data it was able to figure out, albeit non-authoritatively.

In a sense, BIND indirectly defines the right thing: One can say that a program behaves correctly if it does what the user expects, and in the case of NSD, a lot of user expectations have been shaped by BIND.

> I'd rather have the server continue serving the data and go non-authoritative. I can see where reasonable people may disagree. Perhaps a good way of addressing the problem is to make it compile-time tuneable, and then we can have an arm-wrestling match for what the default behavior should be in the distribution. :-)

How about matching BIND in such odd cases unless there's a good reason not to, and then spending effort on a good way to alert the operator in case of errors instead of on an arm-wrestling match? After all, the _right_ way to deal with this situation is to make it go away quickly.

Seriously, how about this? nsdc logs errors as per usual, and also writes a status file. The same file is written every time, and its content doesn't change unless there's an error.

It could look like this two-liner:

Zones: nnnn (primary nnnn, secondary nnnn).
Zones with errors: <names>

On FreeBSD, it's really easy to keep an eye on such a file: there's a system which will mail it to the administrator every time the file changes.

Arnt
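
A status file along the lines proposed above could be produced as in the following Python sketch. The function names, the "none" placeholder for an empty error list and the choice to leave the file untouched when nothing changed are assumptions for illustration. Note that with this layout the content also changes whenever the zone counts change, not only on errors.

```python
import os
import tempfile


def render_status(primary, secondary, errors):
    """Return the two-line status text for the given zone counts and failed zones."""
    total = primary + secondary
    names = " ".join(sorted(errors)) if errors else "none"
    return (f"Zones: {total} (primary {primary}, secondary {secondary}).\n"
            f"Zones with errors: {names}\n")


def write_if_changed(path, text):
    """Rewrite `path` only when its content differs; return True if it was rewritten."""
    try:
        with open(path, encoding="utf-8") as fh:
            if fh.read() == text:
                return False
    except FileNotFoundError:
        pass
    # Write to a temporary file in the same directory, then rename it into
    # place so that a reader never sees a half-written status file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp, path)
    return True
```

Because the file is only replaced when its text differs, any tool that watches it for changes stays quiet as long as the zone set is healthy and stable.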

[On 15 Oct, @ 17:14, Erik wrote in "Re: Handling of zone transfers ..."]

> Robert E.Seastrom wrote:
>> One problem I've noticed (hopefully fixed in newer versions) is that
>> "nsdc update" does not deal gracefully with having an expired or
>> non-transferable zone. To my way of thinking it should either build
>> nsd.db including the expired data but unset the authority bit in
>> replies or simply leave the zone out of nsd.db rather than refusing to
>> update the database for the other zones.
>
> I actually wasn't aware of this problem, so it has not been fixed in
> any version of NSD, whatever the fix (if any) may turn out to be.

Looking at the code segment in nsdc (in NSD 2.1.2, but this code has
been fairly stable for some time IIRC), it's basically [in semi-Perl]:

foreach $zone (@zones) {
  axfr $zone > $zone_file        # fetch the zone from its master

  if ($zone_file newer $nsd_db)  # at least one zone file has changed
    $rebuild = yes;
}

if ($rebuild == yes)
  nsdc rebuild && nsdc reload

So if one zone transfer succeeds, the db is rebuilt. If for whatever
reason a transfer fails, then nsd will keep on serving the old
data.

Is this not the desired behavior? Or am I missing something (obvious)?

grtz Miek

This is exactly the desired behaviour if there is but one zone in the
database. If there is more than one zone, I don't want a failure to load or
compile one of these zones to cause *all the other zones* to not be
rebuilt. At any given point in time the name server should serve the latest
version of all zones available, and not hang up on one being broken. I have
a machine with one zone being very critical (ccTLD) and some less but still
critical (IN-ADDR.ARPA, etc). When one of these less critical had an error
(AFSDB bug) zonec refused to rebuild and the ccTLD started to, while still
being available, hand out old data. Doubleplusbad, especially in a SLA
situation where "speed of update propagation" is both monitored and fined.

On top of this comes the issue of what should be done with failed zones.
Several outcomes are possible, as has been mentioned above:

1. go SERVFAIL, i.e. remove zone.

2. go lame, i.e. remove AA but serve and refuse AXFR. (BIND method up to
expiry.)

3. hand out old data with AA bit set and pretend it is raining.

Nos 1 and 2 are probably more clever than 3. In effect, #3 is what is being
done today, with all the other zones in that particular nsd instance --
hence the SLA issues.

Clearer?
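
The requirement above, that one broken zone must not stop the others from being rebuilt, amounts to treating every failure as local to its zone. A minimal Python sketch, assuming hypothetical `transfer` and `compile_zone` callables that return True on success (neither is an actual nsdc or zonec interface):

```python
def build_zone_set(zones, transfer, compile_zone, warn=print):
    """Return (good, failed): zones that transferred and compiled, and those that did not.

    A failure in one zone never stops the loop for the others. What happens
    to the failed zones (keep the last good copy, or drop them) is a separate
    policy decision and is not made here.
    """
    good, failed = [], []
    for zone in zones:
        try:
            ok = transfer(zone) and compile_zone(zone)
        except Exception as exc:      # treat any crash as a per-zone failure
            warn(f"{zone}: {exc}")
            ok = False
        (good if ok else failed).append(zone)
    return good, failed
```

The database would then be rebuilt from the `good` list plus whatever the chosen policy says about the `failed` list, and the `failed` list is what should be reported to the operator.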

[On 18 Oct, @ 14:59, Måns wrote in "Re: Handling of zone transfers ..."]

>> Is this not the desired behavior? Or am I missing something (obvious)?

<SNIP explanation>

> On top of this comes the issue of what should be done with failed zones.
> Several outcomes are possible, as has been mentioned above:
>
> 1. go SERVFAIL, i.e. remove zone.
>
> 2. go lame, i.e. remove AA but serve and refuse AXFR. (BIND method up to
> expiry.)
>
> 3. hand out old data with AA bit set and pretend it is raining.
>
> Nos 1 and 2 are probably more clever than 3. In effect, #3 is what is being
> done today, with all the other zones in that particular nsd instance --
> hence the SLA issues.
>
> Clearer?

yes, very much so, thanks.

About the 3 points you mention. #2 is rather hard to do for an
authoritative only server... :-)

So IMO that only leaves #1, as people have been doing with wrapper
scripts. I will look into it,

grtz Miek

It may be useful to first have consensus on what we really want.
Then we can see what we can do when our own AXFR is ready, and what
we can do in the mean time using the bind-8 AXFR.

I'll try to summarize:

Suppose we have two or more zones to serve, some of which come from
elsewhere, e.g. via AXFR (scp, rsync, or whatever is also possible;
the main point is that we have no control over the contents).

We have 3 separate tools:
1. The AXFR (scp, rsync, whatever) tool and wrapper-script.
2. Zonec and wrapper-script.
3. The nsd daemon process.

We want tool 1:
- to check that the AXFR (scp,..) has succeeded and, as far as possible,
  to check for syntax errors.
- if the above has failed, to keep the old zone-file (i.e. copy
  the current zone-file to the new temp directory).
This way we make sure we have the complete set of zones.

We want either 1, 2, or 3 (??) to check whether the zone-file has
expired. I think 1 would be best; 2 is perhaps possible.

I'd like to keep 3 as mean and lean as possible, and thus not to
clobber the daemon with it: when a zone has expired, 3 (the NSD
daemon) should just not serve it (remove AA is not suitable for an
auth-only server, and to hand out expired data is plain wrong). Meaning
that tool 1 or 2 should just delete the expired zone-file.

-- ted
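
The expiry check that the summary above assigns to tool 1 or 2 can be sketched in Python as follows. The function names and the bookkeeping (a mapping from zone-file path to the time of the last successful refresh and the SOA expire value, both in seconds) are assumptions for illustration. How those two values are obtained is left open.

```python
import os


def is_expired(last_refresh, soa_expire, now):
    """True once more than `soa_expire` seconds have passed since the last good refresh."""
    return now - last_refresh > soa_expire


def drop_expired(zone_files, now):
    """Delete expired zone files and return the list of paths that were dropped.

    `zone_files` maps a zone-file path to (last_refresh, soa_expire).
    """
    removed = []
    for path, (last_refresh, soa_expire) in sorted(zone_files.items()):
        if is_expired(last_refresh, soa_expire, now):
            try:
                os.remove(path)
            except FileNotFoundError:
                pass                      # already gone, nothing left to serve
            removed.append(path)
    return removed
```

With the expired files removed before zonec runs, the daemon itself needs no expiry logic, which matches the wish to keep it lean.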

Miek Gieben <miekg@atoom.net> writes:

> [On 18 Oct, @ 14:59, Måns wrote in "Re: Handling of zone transfers ..."]
>
>>> Is this not the desired behavior? Or am I missing something (obvious)?
>
> <SNIP explanation>
>
>> On top of this comes the issue of what should be done with failed zones.
>> Several outcomes are possible, as has been mentioned above:
>>
>> 1. go SERVFAIL, i.e. remove zone.
>>
>> 2. go lame, i.e. remove AA but serve and refuse AXFR. (BIND method up to
>> expiry.)
>>
>> 3. hand out old data with AA bit set and pretend it is raining.
>>
>> Nos 1 and 2 are probably more clever than 3. In effect, #3 is what is being
>> done today, with all the other zones in that particular nsd instance --
>> hence the SLA issues.
>>
>> Clearer?
>
> yes, very much so, thanks.
>
> About the 3 points you mention. #2 is rather hard to do for an
> authoritative only server... :-)

How so? It's just a bit in the reply, you serve the data, but don't
claim to be authoritative.

> So IMO that only leaves #1, as people have been doing with wrapper
> scripts. I will look into it,

Thanks!

                                        ---Rob
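
To illustrate the point above that AA is "just a bit in the reply": in the DNS header defined by RFC 1035, the flags occupy bytes 2 and 3 of the message and AA is the bit with value 0x0400. The Python sketch below clears it in an already-built reply packet. The function name `clear_aa` is hypothetical and this says nothing about how NSD builds its answers internally.

```python
import struct

AA_FLAG = 0x0400   # "authoritative answer" bit in the DNS header flags (RFC 1035)


def clear_aa(reply):
    """Return a copy of the DNS message `reply` with the AA bit cleared."""
    if len(reply) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    (flags,) = struct.unpack("!H", reply[2:4])
    return reply[:2] + struct.pack("!H", flags & ~AA_FLAG & 0xFFFF) + reply[4:]
```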

This is a good summary.

While we are summarizing, zonec must also bypass broken data as defined
above, dealing with it as defined above, but not have one such broken zone
show-stop the entire NSD instance. (Yeah, I know, I wrote exactly that in
my last mail but I find it missing above. Sorry for the repetititititions.)

[On 19 Oct, @ 12:07, Måns wrote in "Re: Handling of zone transfers ..."]

> This is a good summary.
>
> While we are summarizing, zonec must also bypass broken data as defined
> above, dealing with it as defined above, but not have one such broken zone
> show-stop the entire NSD instance. (Yeah, I know, I wrote exactly that in
> my last mail but I find it missing above. Sorry for the repetititititions.)

The following patch for nsdc.sh.in does this:

* it axfr's the zone - if this fails it emits a warning
* then it tries to compile the zone - if this fails another warning
  is given

If both the axfr & compilation have completed successfully then the
database for nsd is rebuilt. In all other cases the current version
of the zones/database is used.

Note1: each zone that is axfr-ed is compiled twice
Note2: aux. files are used: $zone.axfr (for the axfr) and
       $zone.axfr.db for the test compile
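
The check-then-promote flow described above can be sketched in Python as follows. This is an illustration only, not the patch itself: `fetch` and `test_compile` are hypothetical callables that return True on success, and promoting the staged file with a rename is an assumption about how the verified copy replaces the current one. The auxiliary file names follow Note2.

```python
import os


def refresh_zone(zone_file, fetch, test_compile, warn=print):
    """Fetch and test-compile into auxiliary files; promote only if both succeed.

    Returns True if `zone_file` was replaced, False if the old copy was kept.
    """
    staged = zone_file + ".axfr"       # the transfer lands here first
    staged_db = staged + ".db"         # output of the test compile
    try:
        if not fetch(staged):
            warn(f"warning: transfer failed, keeping {zone_file}")
            return False
        if not test_compile(staged, staged_db):
            warn(f"warning: compile failed, keeping {zone_file}")
            return False
        os.replace(staged, zone_file)  # promote the verified copy
        return True
    finally:
        # Remove whatever auxiliary files are still lying around.
        for leftover in (staged, staged_db):
            if os.path.exists(leftover):
                os.remove(leftover)
```

The database rebuild would run only if at least one call returned True, which is where the second compile mentioned in Note1 comes from.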

Index: nsdc.sh.in