No failover in stub-zone?

Hello

We have on our border DNS recursor (Unbound 1.4.17) some stub-zones, mostly for PTR lookups for our internal AS112 addresses, like this:

stub-zone:
   name: "10.in-addr.arpa"
   stub-addr: <IP-first-internal-NS>
   stub-addr: <IP-second-internal-NS>

Today the first internal NS went down and most reverse lookups slowed to a crawl. I expected Unbound would notice the failure and simply use only the second one after some time, like it does with normal lookups when it skips unavailable nameservers.

Is this expected behaviour, or have I done something wrong?

Many Thanks

Andreas

Hi Andreas,

> We have on our border DNS recursor (Unbound 1.4.17) some
> stub-zones, mostly for PTR lookups for our internal AS112 addresses,
> like this:
>
> stub-zone:
>    name: "10.in-addr.arpa"
>    stub-addr: <IP-first-internal-NS>
>    stub-addr: <IP-second-internal-NS>

Unbound will divide the load amongst the addresses; it randomises the choice with RTT banding.
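As a rough illustration, selection with RTT banding works like this (a toy Python model, not Unbound's actual code; the 400 ms band size is an assumed value for the sketch):

```python
import random

# Toy model of randomised server selection with RTT banding:
# every server whose estimated RTT is within a fixed band of the
# fastest server is treated as an equal candidate, and one of the
# candidates is picked at random.
BAND_MS = 400  # assumed band size, for illustration only

def pick_server(rtts):
    """rtts: dict mapping server address -> estimated RTT in ms."""
    best = min(rtts.values())
    candidates = [addr for addr, rtt in rtts.items()
                  if rtt <= best + BAND_MS]
    return random.choice(candidates)

# Two healthy servers with similar RTTs both fall inside the band,
# so queries are split roughly 50/50 between them over time.
rtts = {"10.0.0.1": 20, "10.0.0.2": 35}
picks = [pick_server(rtts) for _ in range(1000)]
print(sorted(set(picks)))
```

A server whose RTT estimate climbs far above the band (e.g. because it keeps timing out) drops out of the candidate set and stops being chosen.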

> Today the first internal NS went down and most reverse lookups
> slowed to a crawl. I expected Unbound would notice the failure and
> simply use only the second one after some time, like it does with
> normal lookups when it skips unavailable nameservers.
>
> Is this expected behaviour, or have I done something wrong?

Does the second server also fail?

Unbound should try both servers (randomly, if they are both working, for 50% load on each).

Best regards,
   Wouter

Quoting "W.C.A. Wijngaards" <wouter@nlnetlabs.nl>:



> Does the second server also fail?
>
> Unbound should try both servers (randomly, if they are both working,
> for 50% load on each).

No, the second was available, and yes, it looks like Unbound was still balancing, because some lookups were fast and some timed out. As far as I know, Unbound skips unresponsive servers when doing "normal" lookups (no stub-zones), and I expected it to do the same for the stub-zone servers. Might this be possible as a feature in the future? I think the same rules should apply for stub-zones as for all lookups, no?

Regards

Andreas

> Hi Andreas,


> > No, the second was available, and yes, it looks like Unbound was
> > still balancing, because some lookups were fast and some timed out.
> > As far as I know, Unbound skips unresponsive servers when doing
> > "normal" lookups (no stub-zones), and I expected it to do the same
> > for the stub-zone servers. Might this be possible as a feature in
> > the future? I think the same rules should apply for stub-zones as
> > for all lookups, no?

> This is the way it is implemented today. Unbound can fail over for
> stub-zones (and forward-zones): if nameservers do not respond, it
> stops asking them while they are down.
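Roughly sketched, per-server failover of this kind amounts to the following (a simplified Python model; the timeout threshold and hold-down time are made-up illustrations, not Unbound's real infra-cache parameters):

```python
# Simplified model of per-server failover: after repeated timeouts a
# server is marked down and skipped until a hold-down period expires.
# Both thresholds below are illustrative assumptions.
TIMEOUT_LIMIT = 3      # consecutive timeouts before marking down (assumed)
HOLD_DOWN_SECS = 900   # how long a down server is skipped (assumed)

class ServerState:
    def __init__(self, addr):
        self.addr = addr
        self.timeouts = 0
        self.down_until = 0.0

    def record_timeout(self, now):
        self.timeouts += 1
        if self.timeouts >= TIMEOUT_LIMIT:
            self.down_until = now + HOLD_DOWN_SECS

    def record_reply(self):
        self.timeouts = 0
        self.down_until = 0.0

    def usable(self, now):
        return now >= self.down_until

def usable_servers(servers, now):
    """Return the servers the resolver should still query."""
    up = [s for s in servers if s.usable(now)]
    # If everything is marked down, retry all of them rather than fail.
    return up or list(servers)
```

With bookkeeping like this, a dead primary drops out of rotation after a few timeouts instead of costing a timeout on every query that happens to pick it.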

Strange; it looks like this has not happened. The primary had already been down for around half an hour, but Unbound still seemed to be trying the stub-zone primary for part of the queries, which in turn timed out.
I will test as soon as possible and give more details on this.

Regards

Andreas