Test setup problem: secondary expires zones

Hi,

I'm trying to run NSD as a secondary to a primary nameserver
(tinydns+axfrdns), which has served me well over the years, but is now
being phased out. For every zone, I put a section like this into NSD's
config file:

zone:
        name: "example.com"
        zonefile: "example.com"
        outgoing-interface: 46.29.40.34
        #allow-notify: 46.29.40.35 NOKEY
        request-xfr: 46.29.40.35 NOKEY

With 46.29.40.35 being the IP of the primary. These packages even run on
the same host. When I initially set things up, everything went fine: NSD
pulled the zones, and, with "nsdc patch", wrote them to local zone
files, too.

Now I find that, after some time, all zones expire despite the primary
still serving them, and other authorized secondaries have no problem
pulling them. IOW, they expire only on NSD, and I don't exactly know
why. If I do the same thing from the command line, using dig, I get the
zones transferred from the primary just fine, but axfrdns regularly logs
this in response to queries from NSD:

axfrdns: fatal: unable to locate information in data.cdb

NSD logs nothing, despite running in verbose mode.

strace shows that there is something fishy within nsd. I get tons
of these:

setsockopt(6, SOL_SOCKET, SO_REUSEADDR, "\2\0\0\0.\35(\"\0\0\0\0\0\0\0\0", 16) = 0
bind(6, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("46.29.40.34")}, 16) = 0
connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("46.29.40.35")}, 16) = -1 EINPROGRESS (Operation now in progress)
pselect6(8, [7], [6], [], {23, 997600000}, {NULL, 8}) = 1 (out [6], left {23, 997598102})
getsockopt(6, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
write(6, "\0l", 2) = 2
write(6, "\0043\0\0\0\1\0\0\0\1\0\0\10newwalls\2de\0\0\373\0\1\10ne"..., 108) = 108
read(6, 0x7f0a0336f7f8, 2) = -1 EAGAIN (Resource temporarily unavailable)
pselect6(8, [6 7], [], [], {23, 997100000}, {NULL, 8}) = 1 (in [6], left {23, 996821603})
read(6, "\0Y", 2) = 2
read(6, "\0043\204\0\0\1\0\0\0\1\0\0\10newwalls\2de\0\0\373\0\1\300\f\0"..., 89) = 89
close(6) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0

Both software packages run on the same machine, but currently, nsd
usually does not receive any queries from the Internet (unless you query
the ip directly).

Any pointers on what to do would be greatly appreciated!

Kind regards,
--Toni++

Hi Dmitry,

> With 46.29.40.35 being the IP of the primary. These packages even run on
> the same host. When I initially set things up, everything went fine: NSD
> pulled the zones, and, with "nsdc patch", wrote them to local zone
> files, too.

> I assume that your primary server runs on .35 address and nsd server runs on .34 -- you have not

right.

> included relevant portions of your configuration. If one (or both) of those servers use default address

Sorry. No, both servers are pinned on their respective IPs. From my
nsd.conf:

server:
        # uncomment to specify specific interfaces to bind (default
        # all).
        ip-address: 46.29.40.34
        ...

and for the djbdns combo, it's

# cat /service/axfrdns/env/IP
46.29.40.35

My nsd is this (Debian Squeeze, amd64):
ii nsd3 3.2.8-3~bpo60+2

> also, for notify messages to work, you better allow nsd to trust them (you already have statement there.)

djbdns does not understand any authentication, nor notifies. Therefore,
I have commented that out.

> they expire because nsd cannot transfer zones from your primary for some reason.

Yes - the question is, why can't nsd update zones from the server
_after_ initially pulling all zones in without any problem?

Unfortunately, the DNS decoder in tcpdump appears to be weak...

Kind regards,
--Toni++

Hi Toni,

> Hi,
>
> I'm trying to run NSD as a secondary to a primary nameserver
> (tinydns+axfrdns), which has served me well over the years, but is
> now being phased out. For every zone, I put a section like this into
> NSD's config file:
>
> zone:
>         name: "example.com"
>         zonefile: "example.com"
>         outgoing-interface: 46.29.40.34
>         #allow-notify: 46.29.40.35 NOKEY
>         request-xfr: 46.29.40.35 NOKEY
>
> With 46.29.40.35 being the IP of the primary. These packages even
> run on the same host. When I initially set things up, everything
> went fine: NSD pulled the zones, and, with "nsdc patch", wrote
> them to local zone files, too.
>
> Now I find that, after some time, all zones expire despite the
> primary still serving them, and other authorized secondaries have
> no problem pulling them. IOW, they expire only on NSD, and I don't
> exactly know why. If I do the same thing from the command line,
> using dig, I get the zones transferred from the primary just fine,
> but axfrdns regularly logs this in response to queries from NSD:

I tried out your setup (but not using axfrdns as master) and it seems
to work for me.

> axfrdns: fatal: unable to locate information in data.cdb

So dig is able to transfer the zone, without axfrdns logging this
message? What is the difference in query packet?
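
One way to look at the query packet is to decode the bytes that the
strace output in the original post shows for write() and read(). Below
is a minimal Python sketch that parses only the DNS header and the
question section. The byte strings are my transcription of the octal
escapes in that strace output (truncated where strace printed "..."),
and the function name is made up for this example. Names on the wire
are stored as length-prefixed labels, which is probably why a packet
dump does not show them as dotted text.

```python
import struct

# Only the query types relevant to this thread.
QTYPE_NAMES = {6: "SOA", 251: "IXFR", 252: "AXFR", 255: "ANY"}

def decode_header_and_question(msg):
    """Decode the 12-byte DNS header and the first question.

    Assumes the question name is uncompressed, which holds for the
    packets transcribed below.
    """
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", msg[:12])
    labels = []
    pos = 12
    while msg[pos] != 0:                      # a zero length byte ends the name
        length = msg[pos]
        labels.append(msg[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype, qclass = struct.unpack("!2H", msg[pos + 1:pos + 5])
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),   # QR bit
        "authoritative": bool(flags & 0x0400), # AA bit
        "rcode": flags & 0x000F,
        "qname": ".".join(labels) + ".",
        "qtype": QTYPE_NAMES.get(qtype, str(qtype)),
        "ancount": ancount,
        "nscount": nscount,
    }

# Transcribed from the strace write(6, ...) call (first 29 bytes).
QUERY = (b"\x04\x33\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00"
         b"\x08" b"newwalls" b"\x02" b"de" b"\x00"
         b"\x00\xfb\x00\x01")

# Transcribed from the strace read(6, ...) call (first 29 bytes).
REPLY = (b"\x04\x33\x84\x00\x00\x01\x00\x00\x00\x01\x00\x00"
         b"\x08" b"newwalls" b"\x02" b"de" b"\x00"
         b"\x00\xfb\x00\x01")

if __name__ == "__main__":
    print(decode_header_and_question(QUERY))
    print(decode_header_and_question(REPLY))
```

If the transcription is right, the query type is 251 (IXFR) rather than
252 (AXFR), and the reply is authoritative with rcode 0 but an empty
answer section.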

> NSD logs nothing, despite running in verbose mode.

If I don't update the zone at the master, no logs are being produced,
but I see SOA queries going over the wire. If I update the zone, you
should see something like:

[1329129836] nsd[6042]: info: Zone example.com serial 23 is updated to 24.

> strace shows that there is something fishy within nsd. I get tons
> of these:
>
> setsockopt(6, SOL_SOCKET, SO_REUSEADDR, "\2\0\0\0.\35(\"\0\0\0\0\0\0\0\0", 16) = 0
> bind(6, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("46.29.40.34")}, 16) = 0
> connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("46.29.40.35")}, 16) = -1 EINPROGRESS (Operation now in progress)
> pselect6(8, [7], [6], [], {23, 997600000}, {NULL, 8}) = 1 (out [6], left {23, 997598102})
> getsockopt(6, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
> write(6, "\0l", 2) = 2
> write(6, "\0043\0\0\0\1\0\0\0\1\0\0\10newwalls\2de\0\0\373\0\1\10ne"..., 108) = 108
> read(6, 0x7f0a0336f7f8, 2) = -1 EAGAIN (Resource temporarily unavailable)
> pselect6(8, [6 7], [], [], {23, 997100000}, {NULL, 8}) = 1 (in [6], left {23, 996821603})
> read(6, "\0Y", 2) = 2
> read(6, "\0043\204\0\0\1\0\0\0\1\0\0\10newwalls\2de\0\0\373\0\1\300\f\0"..., 89) = 89
> close(6) = 0
> socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
> fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0

This shows that the socket is nonblocking and connecting cannot be
completed immediately. The read would block. Seems ok to me if the
response is not received (immediately).
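
For readers unfamiliar with that sequence: a non-blocking connect()
returning EINPROGRESS, followed by a wait for writability and a
getsockopt(SO_ERROR) check, is the usual way to connect without
blocking. The sketch below reproduces the same steps in Python. It is
not NSD's code, the function name is hypothetical, and it assumes a
POSIX system.

```python
import errno
import select
import socket

def nonblocking_connect(addr, timeout=5.0):
    """Connect without blocking; return the final SO_ERROR (0 = connected)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # connect_ex() returns an errno value instead of raising.
        rc = sock.connect_ex(addr)
        if rc not in (0, errno.EINPROGRESS):
            return rc
        # The socket becomes writable once the connect attempt has
        # finished, whether it succeeded or failed.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        # SO_ERROR tells us how the attempt ended.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()

if __name__ == "__main__":
    # Demonstrate against a throwaway listener on the loopback interface.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    print(nonblocking_connect(listener.getsockname()))
    listener.close()
```

In the strace output, getsockopt() reports [0] for SO_ERROR, so the TCP
connection itself was established.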

> Both software packages run on the same machine, but currently, nsd
> usually does not receive any queries from the Internet (unless you
> query the ip directly).

With both software packages, you mean? Both addresses seem to be non
responsive to me, by the way.

Best regards,
  Matthijs

Hi Matthijs,

>> axfrdns: fatal: unable to locate information in data.cdb
> So dig is able to transfer the zone, without axfrdns logging this
> message? What is the difference in query packet?

I'm not clueful enough to understand the query packets, but I could see
nsd querying for the TLD, but not always querying for the full domain,
provided that the queried domain name is supposed to be contained in the
query packet in clear text (like querying for "net", not always
"oeko.net").

> If I don't update the zone at the master, no logs are being produced,
> but I see SOA queries going over the wire. If I update the zone, you
> should see something like:
>
> [1329129836] nsd[6042]: info: Zone example.com serial 23 is updated to 24.

I artificially updated zones with no other change than an increased
serial on the master, then restarted nsd, but to no effect.
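
For reference, whether one SOA serial counts as "newer" than another is
normally decided with serial number arithmetic (RFC 1982), which wraps
around at 2^32. A minimal Python sketch of that comparison follows. The
function name is made up, and this is not taken from NSD's source.

```python
def serial_newer(candidate, current):
    """Return True if `candidate` is newer than `current` per RFC 1982.

    Serials are 32-bit and wrap around. A difference of exactly 2**31
    is undefined in the RFC; this sketch treats it as "not newer".
    """
    if candidate == current:
        return False
    return ((candidate - current) & 0xFFFFFFFF) < 0x80000000

if __name__ == "__main__":
    print(serial_newer(24, 23))          # a plain increment
    print(serial_newer(5, 0xFFFFFFF0))   # an increment across the wrap
    print(serial_newer(23, 24))          # going backwards
```

A serial that merely increases by one, as in the test described above,
is newer under this rule, so the serial itself should not be what
prevents the update.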

> This shows that the socket is nonblocking and connecting cannot be
> completed immediately. The read would block. Seems ok to me if the
> response is not received (immediately).

>> Both software packages run on the same machine, but currently, nsd
>> usually does not receive any queries from the Internet (unless you
>> query the ip directly).
>
> With both software packages, you mean? Both addresses seem to be non
> responsive to me, by the way.

I am not sure what you mean. Is my network (46.29.40.0/21) not
being routed to you?

I have no trouble querying the servers from here, but I configured the
servers to not allow axfr from anywhere, only from select sources (the
secondaries). If you have an IP number for me, I can put you onto the
whitelist, too.

As for regular queries:

$ dig +tcp @46.29.40.35 oeko.net any <--- this is axfrdns

; <<>> DiG 9.7.3 <<>> @46.29.40.35 oeko.net any
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25192
;; flags: qr aa rd; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;oeko.net. IN ANY

;; ANSWER SECTION:
oeko.net. 2560 IN SOA a.ns.oeko.net. hostmaster.oeko.net. 1021018224 16384 2048 1048576 2560
oeko.net. 259200 IN NS a.ns.oeko.net.
oeko.net. 259200 IN NS a.ns.bsws.de.
oeko.net. 259200 IN NS c.ns.bsws.de.
oeko.net. 86400 IN MX 12848 d.mx.oeko.net.
oeko.net. 86400 IN A 46.29.42.25

;; ADDITIONAL SECTION:
a.ns.oeko.net. 86400 IN A 46.29.40.35
d.mx.oeko.net. 3600 IN A 46.29.42.41

;; Query time: 44 msec
;; SERVER: 46.29.40.35#53(46.29.40.35)
;; WHEN: Mon Feb 13 13:49:06 2012
;; MSG SIZE rcvd: 203

$ dig +tcp @46.29.40.34 oeko.net any <--- This is nsd

; <<>> DiG 9.7.3 <<>> @46.29.40.34 oeko.net any
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 47903
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;oeko.net. IN ANY

;; Query time: 44 msec
;; SERVER: 46.29.40.34#53(46.29.40.34)
;; WHEN: Mon Feb 13 13:49:09 2012
;; MSG SIZE rcvd: 26
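
As an aside, the SOA record in the axfrdns answer above carries the
timers a secondary works from: refresh, retry, expire and minimum, in
that order after the serial. The Python sketch below turns them into
readable durations (the function name is made up). The expire value of
1048576 seconds is a little over 12 days, which would fit zones
expiring "after some time" if every refresh attempt fails. I assume the
SERVFAIL from nsd is how it answers for an expired zone, but I have not
verified that here.

```python
from datetime import timedelta

# RDATA of the SOA record as printed by dig in the axfrdns answer.
SOA_RDATA = ("a.ns.oeko.net. hostmaster.oeko.net. "
             "1021018224 16384 2048 1048576 2560")

def soa_timers(rdata):
    """Split SOA RDATA text and return its timers as timedelta objects."""
    mname, rname, serial, refresh, retry, expire, minimum = rdata.split()
    return {
        "serial": int(serial),
        "refresh": timedelta(seconds=int(refresh)),
        "retry": timedelta(seconds=int(retry)),
        "expire": timedelta(seconds=int(expire)),
        "minimum": timedelta(seconds=int(minimum)),
    }

if __name__ == "__main__":
    for field, value in soa_timers(SOA_RDATA).items():
        print(f"{field:8} {value}")
```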

Anything that you'd like me to test, specifically?

Kind regards,
--Toni++

Hi,

It turns out that axfrdns/djbdns does not implement IXFR (RFC 1995). It
responds to an IXFR query with NODATA. NSD marks this packet as
bad (too short) and retries later.

To make NSD deal better with this, I have committed a change to the
NSD3 branch. NSD will fall back to AXFR if an IXFR request
results in a NODATA response. This behavior does not interfere with
the 'allow-axfr-fallback' option: if that is set to no, there will
still be no fallback to AXFR.
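
For installations that cannot pick up that change yet, a possible
workaround is on the configuration side. As far as I recall, nsd.conf(5)
documents an optional AXFR keyword for request-xfr that makes NSD ask
that master for AXFR only. Please check the man page of your version
before relying on it. The Python sketch below generates zone sections in
the layout from the original post, with that keyword as an option. The
function and parameter names are made up for this example.

```python
def zone_stanza(name, primary, outgoing, axfr_only=False):
    """Return one nsd.conf zone section for a secondary zone.

    With axfr_only=True the request-xfr line carries the AXFR keyword
    (assumed to be supported; see nsd.conf(5) for your version).
    """
    master = f"AXFR {primary}" if axfr_only else primary
    return (
        "zone:\n"
        f'        name: "{name}"\n'
        f'        zonefile: "{name}"\n'
        f"        outgoing-interface: {outgoing}\n"
        f"        request-xfr: {master} NOKEY\n"
    )

if __name__ == "__main__":
    # Zone names here are only examples.
    for zone in ("example.com", "example.net"):
        print(zone_stanza(zone, "46.29.40.35", "46.29.40.34",
                          axfr_only=True))
```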

Best regards,
  Matthijs

Hi,