Unbound and intermittent network connectivity?

Robert_Edmonds · December 18, 2015, 7:05pm

Hi,

I have a few recent bug reports from Debian users that Unbound stops
resolving after brief interruptions in network connectivity, especially
from users on laptops, which are typically not as well connected as
servers or workstations with wired Ethernet connections.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=791659

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=808204

A few questions:

Is my guess that Unbound stores unreachability information for
particular nameservers in the "infra cache" correct? Does this also
apply to forwarders? Does that mean if a user is running Unbound in
forwarding mode and has a brief network outage, they have to wait until
an "infra-host-ttl" expiration (default 15 minutes) occurs before
resolution service works again?

Is the format of the "dump_infra" output documented anywhere? I've
started reading source code to figure it out, but it would be nice to
have some "this is good" and "this is bad" examples. E.g., at first
glance I misread "lame dnssec 0" to mean "this server is lame, and does
not support DNSSEC", which appears to be the opposite of what it means.

Should distros be doing something on network change events to get
Unbound to purge unreachability information? I think "flush_infra all"
would do it, but isn't this quite disruptive? (Maybe unreachability
information could be cached with a different TTL than the other
attributes for entries in the infra cache?)
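
The parenthetical suggestion can be sketched as an entry with two lifetimes: measured round-trip data lives for the full host TTL, while the unreachable flag lapses after a shorter, separate TTL. This is a hypothetical design in Python, not an existing Unbound feature; the names `host_ttl` and `unreachable_ttl` and the 60-second value are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InfraEntry:
    """Hypothetical entry: RTT data and the unreachable flag expire independently."""
    rtt_ms: Optional[int]
    rtt_expires_at: float
    unreachable_until: float = 0.0


class InfraCache:
    """Toy design sketch, not an Unbound feature."""

    def __init__(self, host_ttl: float = 900.0, unreachable_ttl: float = 60.0) -> None:
        self.host_ttl = host_ttl                # lifetime of RTT measurements
        self.unreachable_ttl = unreachable_ttl  # shorter lifetime of the unreachable flag
        self.entries: Dict[str, InfraEntry] = {}

    def record_rtt(self, server: str, rtt_ms: int, now: float) -> None:
        # A successful reply stores fresh RTT data and clears any unreachable flag.
        self.entries[server] = InfraEntry(rtt_ms, now + self.host_ttl)

    def mark_unreachable(self, server: str, now: float) -> None:
        entry = self.entries.setdefault(server, InfraEntry(None, now + self.host_ttl))
        # Only the flag gets the short TTL; existing RTT history is kept.
        entry.unreachable_until = now + self.unreachable_ttl

    def usable(self, server: str, now: float) -> bool:
        entry = self.entries.get(server)
        return entry is None or now >= entry.unreachable_until

    def rtt(self, server: str, now: float) -> Optional[int]:
        entry = self.entries.get(server)
        if entry is None or now >= entry.rtt_expires_at:
            return None
        return entry.rtt_ms


cache = InfraCache()
cache.record_rtt("192.0.2.53", rtt_ms=40, now=0.0)
cache.mark_unreachable("192.0.2.53", now=100.0)
print(cache.usable("192.0.2.53", now=130.0))  # False: flag still set
print(cache.usable("192.0.2.53", now=160.0))  # True: flag lapsed after 60 s
print(cache.rtt("192.0.2.53", now=160.0))     # 40: RTT history survives
```

Keeping the RTT history means a recovered server is not treated as brand new; only the "do not use" verdict is short-lived.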

Should distros lower "infra-host-ttl" in general, or for laptop users in
particular?

How should we deal with brief interruptions in network connectivity past
the first hop (say, an outage inside the ISP backbone) that don't trigger
network change events?

Thanks!

Wouter · January 4, 2016, 9:29am

Hi Robert,

> Is my guess that Unbound stores unreachability information for
> particular nameservers in the "infra cache" correct? Does this
> also apply to forwarders? Does that mean if a user is running
> Unbound in forwarding mode and has a brief network outage, they
> have to wait until an "infra-host-ttl" expiration (default 15
> minutes) occurs before resolution service works again?

Yes, it applies to forwarders.

> Is the format of the "dump_infra" output documented anywhere?
> I've started reading source code to figure it out, but it would be
> nice to have some "this is good" and "this is bad" examples. E.g.,
> at first glance I misread "lame dnssec 0" to mean "this server is
> lame, and does not support DNSSEC", which appears to be the
> opposite of what it means.

It is not documented. It has also changed between versions of
Unbound, as the internal cache contents contain different values.
daemon/remote.c:dump_infra_host() has the print statement. The ping
value is the interesting "how many milliseconds" value. Values of
120000 indicate that Unbound thinks the server is unreachable.
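
Without documentation, a small parser can help when eyeballing `dump_infra` output. The Python sketch below assumes each line starts with the server address and zone name, followed by label/number pairs where a label may span several words, so `lame dnssec 0` is read as one field, `lame dnssec`, with value 0. That shape and the sample line are assumptions made up for illustration; check them against `dump_infra_host()` in your Unbound version. Following the note above, a `ping` of 120000 is flagged as unreachable.

```python
UNREACHABLE_MS = 120000  # per the reply above: this value means "considered unreachable"


def parse_infra_line(line: str) -> dict:
    """Parse one dump_infra line into {'address', 'zone', <label>: int, ...}.

    Assumed shape: '<address> <zone> <label> <int> <label> <int> ...', where a
    label may span several words, e.g. 'lame dnssec 0' -> {'lame dnssec': 0}.
    """
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"unexpected dump_infra line: {line!r}")
    record = {"address": tokens[0], "zone": tokens[1]}
    label_words = []
    for token in tokens[2:]:
        try:
            value = int(token)
        except ValueError:
            label_words.append(token)  # still inside a (possibly multi-word) label
            continue
        record[" ".join(label_words)] = value
        label_words = []
    return record


def looks_unreachable(record: dict) -> bool:
    return record.get("ping") == UNREACHABLE_MS


# Hypothetical line, shaped like the fields discussed in this thread (not real output).
sample = "192.0.2.53 example.com. ttl 712 ping 120000 lame dnssec 0 rec 0"
rec = parse_infra_line(sample)
print(rec["lame dnssec"], looks_unreachable(rec))  # 0 True
```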

> Should distros be doing something on network change events to get
> Unbound to purge unreachability information? I think "flush_infra
> all" would do it, but isn't this quite disruptive? (Maybe
> unreachability information could be cached with a different TTL
> than the other attributes for entries in the infra cache?)

Yes, that is a good idea. It is not disruptive (it could be
disruptive for a high-load server, which would then have to probe
distant servers that are experiencing a high packet rate).
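
A distro hook along these lines could look like the Python sketch below. It assumes a NetworkManager-style dispatcher that runs the script with an interface name and an action string; which actions should trigger a flush (`FLUSH_ACTIONS` here) is a hypothetical choice, not something the thread specifies. The only Unbound interface it relies on is the `unbound-control flush_infra all` command discussed above.

```python
#!/usr/bin/env python3
"""Sketch of a network-change hook that clears Unbound's infra cache.

Assumes it is called like a NetworkManager dispatcher script: argv[1] is the
interface, argv[2] the action. The set of triggering actions is an assumption.
"""
import subprocess
import sys

# Hypothetical choice of events after which cached unreachability is stale.
FLUSH_ACTIONS = {"up", "vpn-up", "dhcp4-change", "dhcp6-change"}
FLUSH_CMD = ["unbound-control", "flush_infra", "all"]


def should_flush(action: str) -> bool:
    return action in FLUSH_ACTIONS


def handle_event(argv, run=subprocess.run) -> int:
    """Return an exit status; `run` is injectable so the logic can be tested."""
    if len(argv) < 3:
        return 0  # not called with interface/action: nothing to do
    if not should_flush(argv[2]):
        return 0
    try:
        result = run(FLUSH_CMD, check=False, capture_output=True, text=True)
    except FileNotFoundError:
        return 1  # unbound-control is not installed
    return result.returncode


if __name__ == "__main__":
    status = handle_event(sys.argv)
    if status:
        sys.exit(status)
```

Flushing only on link-up style events keeps the cost low on an end host; on a busy recursive server the same flush would trigger the burst of re-probing mentioned above.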

> Should distros lower "infra-host-ttl" in general, or for laptop
> users in particular?

I would distinguish between end hosts and recursive servers.
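
One way a distro could act on that distinction is to ship a different `infra-host-ttl` per role. The Python sketch below renders an `unbound.conf` `server:` fragment; the role names and the 60-second end-host value are hypothetical choices, while 900 seconds is the default mentioned earlier in the thread.

```python
# Hypothetical per-role values; 900 s is the infra-host-ttl default.
INFRA_HOST_TTL = {
    "end-host": 60,           # laptop/desktop: recover quickly after link changes
    "recursive-server": 900,  # busy resolver: keep the default
}


def infra_ttl_fragment(role: str) -> str:
    """Render an unbound.conf fragment setting infra-host-ttl for the given role."""
    try:
        ttl = INFRA_HOST_TTL[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
    return f"server:\n    infra-host-ttl: {ttl}\n"


print(infra_ttl_fragment("end-host"), end="")
```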

> How should we deal with brief interruptions in network connectivity
> past the first hop (say, an outage inside the ISP backbone) that
> don't trigger network change events?

That is what the TTL is for.

Best regards, Wouter