NSD doing 3 IXFR queries in rapid succession

Hello NSD users and developers,

I was just looking at some BIND logs on one of our servers. It feeds a
downstream NSD 4.1.7 slave. In the BIND logs, I often see this:

20-Dec-2015 11:50:45.011 xfer-out: client 10.64.0.12#47701/key
main.ripe.net (132.72.185.in-addr.arpa): transfer of
'132.72.185.in-addr.arpa/IN': IXFR ended
20-Dec-2015 11:50:45.078 xfer-out: client 10.64.0.12#47704/key
main.ripe.net (132.72.185.in-addr.arpa): transfer of
'132.72.185.in-addr.arpa/IN': IXFR ended
20-Dec-2015 11:50:45.146 xfer-out: client 10.64.0.12#47707/key
main.ripe.net (132.72.185.in-addr.arpa): transfer of
'132.72.185.in-addr.arpa/IN': IXFR ended

Notice that NSD appears to be doing 3 IXFR queries for the same zone in
rapid succession.

I captured some packets using tcpdump, and then put them through PacketQ
with the filter:

select src_addr,src_port,qname,qtype,rcode,aname,atype from dns

The results corresponding to the above log are:

["10.64.0.12",47701,"132.72.185.in-addr.arpa.",251,0,"",0],
["93.175.159.250",53,"132.72.185.in-addr.arpa.",251,0,"132.72.185.in-addr.arpa.",6],
["10.64.0.12",47704,"132.72.185.in-addr.arpa.",251,0,"",0],
["93.175.159.250",53,"132.72.185.in-addr.arpa.",251,0,"132.72.185.in-addr.arpa.",6],
["10.64.0.12",47707,"132.72.185.in-addr.arpa.",251,0,"",0],
["93.175.159.250",53,"132.72.185.in-addr.arpa.",251,0,"132.72.185.in-addr.arpa.",6]

The packet capture also shows the same thing: 3 IXFR queries in
rapid succession, with the same responses.
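
For anyone who wants to repeat this check on a larger capture, here is a minimal sketch in Python that counts IXFR queries per client and zone from PacketQ rows like the ones above. The numeric codes are the standard DNS type values (251 is IXFR, 6 is SOA). Treating a row with an empty aname as a query is an assumption based on this output only, and the function name is made up for illustration.

```python
import json
from collections import Counter

IXFR = 251  # DNS query type code for IXFR
SOA = 6     # DNS record type code for SOA (seen as atype in the responses)

# The six rows from the PacketQ output, wrapped in [] to form valid JSON.
ROWS = json.loads("""[
["10.64.0.12",47701,"132.72.185.in-addr.arpa.",251,0,"",0],
["93.175.159.250",53,"132.72.185.in-addr.arpa.",251,0,"132.72.185.in-addr.arpa.",6],
["10.64.0.12",47704,"132.72.185.in-addr.arpa.",251,0,"",0],
["93.175.159.250",53,"132.72.185.in-addr.arpa.",251,0,"132.72.185.in-addr.arpa.",6],
["10.64.0.12",47707,"132.72.185.in-addr.arpa.",251,0,"",0],
["93.175.159.250",53,"132.72.185.in-addr.arpa.",251,0,"132.72.185.in-addr.arpa.",6]
]""")


def count_ixfr_queries(rows):
    """Count IXFR queries per (client address, zone).

    Assumption: a row with an empty aname is a query, not a response.
    """
    counts = Counter()
    for src_addr, _src_port, qname, qtype, _rcode, aname, _atype in rows:
        if qtype == IXFR and aname == "":
            counts[(src_addr, qname)] += 1
    return counts


for (client, zone), n in count_ixfr_queries(ROWS).items():
    print(f"{client} sent {n} IXFR queries for {zone}")
```

Any client/zone pair with a count above 1 inside a short capture window is a candidate for the behaviour described here.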

Does anyone have any idea why NSD is doing 3 queries per zone like this?

Regards,
Anand

Hi Anand,

> Does anyone have any idea why NSD is doing 3 queries per zone like
> this?

NSD wants to fetch an update for the zone. The server does not
provide one. NSD tries all masters several times to find the update.
This is where this server sees multiple requests.

The reason for fetching it may be the SOA refresh timer or a NOTIFY.

The masters may be behind load balancers, or a server may be in the
process of loading the data, so the third query may return a different
result. Also, the code currently just retries in the face of no result,
which makes for simpler code.

Best regards, Wouter

Hi Wouter,

> NSD wants to fetch an update for the zone. The server does not
> provide one. NSD tries all masters several times to find the
> update. This is where this server sees multiple requests.

This is the part that does not make sense. When the SOA refresh timer
expires, NSD should of course try and update the zone. However, I
think *one* query against *one* master should be enough.

I do not think that sending 3 queries in rapid succession to the same
master is beneficial. If a master does not have a newer copy of the
zone now, it is unlikely to have a newer copy just a few milliseconds
later. If I have 4 masters configured for a zone, NSD is making 12 TCP
connections in total, just to try and update *one* zone. Then, it
moves on to the next zone. This is rather wasteful, don't you think? A
slave that has a large number of zones is going to end up making far
too many queries for refreshing these zones (well, at least 3 times as
many).
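
To make the arithmetic above concrete, here is a tiny Python sketch. The function name is hypothetical, and it only models the worst case in which no master has a newer copy of the zone, so every attempt is made.

```python
def refresh_connections(zones, masters_per_zone, attempts_per_master):
    """Upper bound on TCP connections for one refresh round with no new data."""
    return zones * masters_per_zone * attempts_per_master


# Behaviour observed in this thread: 3 attempts against each of 4 masters.
observed = refresh_connections(zones=1, masters_per_zone=4, attempts_per_master=3)

# Behaviour argued for here: one query against one master.
single = refresh_connections(zones=1, masters_per_zone=1, attempts_per_master=1)

print(observed, single)
```

The same function scales linearly with the number of zones, which is why the cost adds up on a slave carrying many zones.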

I think the correct behaviour should be that when the SOA refresh
timer expires, NSD should try to refresh the zone from one of the
configured masters, and if the serial number hasn't changed, then
accept that, and move on to other tasks.

In the case of a NOTIFY message, a slightly different behaviour is
desirable. NSD should attempt to refresh the zone from the master that
sent it the NOTIFY message (because it is almost certain to have a
newer copy of the zone), and if that fails, then try the other
masters, once each. Here, I define failure as one of:

1. Same serial number (instead of an expected newer serial)
2. REFUSED
3. SERVFAIL
4. Timeout
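
The behaviour I am proposing can be sketched roughly as follows. This is Python pseudologic, not NSD code: the `query` callable, the `Outcome` names and both function names are made up for illustration, and picking the first configured master on a timer refresh is just one possible choice.

```python
from enum import Enum


class Outcome(Enum):
    UPDATED = "updated"          # a newer serial was transferred
    SAME_SERIAL = "same serial"  # failure 1 in the list above
    REFUSED = "refused"          # failure 2
    SERVFAIL = "servfail"        # failure 3
    TIMEOUT = "timeout"          # failure 4


def refresh_on_timer(masters, query):
    """SOA refresh timer expired: ask one master once and accept the answer.

    Assumption: the first configured master is the one that gets asked.
    """
    return query(masters[0])


def refresh_on_notify(masters, notifier, query):
    """NOTIFY received: try the notifier first, then each other master once.

    Stops as soon as one master delivers an update. Returns the list of
    (master, outcome) pairs in the order they were tried.
    """
    order = [notifier] + [m for m in masters if m != notifier]
    tried = []
    for master in order:
        outcome = query(master)
        tried.append((master, outcome))
        if outcome is Outcome.UPDATED:
            break
    return tried
```

With N masters this makes at most N queries after a NOTIFY and exactly one on a timer refresh.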

> The reason for fetching it may be the SOA refresh timer or a
> NOTIFY.
>
> Load balancers, and the server may be in the process of loading
> the data, so the third query may return a different result. Also,
> the code, currently, just retries in the face of no result, which
> makes for simpler code.

The case of multiple masters behind a load balancer is rare. I would
even go as far as to say that it is not a good configuration, because
it confuses the hell out of a client attempting to refresh zones from
the master(s).

Have you actually had any reports of people running master servers
behind load balancers, and asking for slaves to try to refresh 3 times?

None of the alternative servers I know of (BIND, Knot, YADIFA) try to
refresh a zone with multiple queries and connections.

Regards,
Anand

Hi Wouter (and any other users)

Have you had a chance to consider my thoughts about NSD's refresh
strategy?

Anand

Hi Anand,

Yes, treating the refresh timer the same as a notify is what is
happening. Searching thoroughly for an answer is what is intended,
and that includes trying multiple times, and trying every master. That
behaviour is the same as for Unbound, which also attempts to find
information at every server available, making several attempts.

So, doing multiple queries in succession when searching for
information is not something I want to change. However, adding logic
to act differently to save those queries in the case of a refresh
timer does not sound really worthwhile to me, either.

Best regards, Wouter