Unbound not always resolving immediately after start.

Hi,

Under FreeBSD I'm setting up a resolve-only Unbound server. While testing,
I've noticed that some domains do not resolve (the server returns SERVFAIL).

When running it verbosely I noticed this in the log:

[1441963936] unbound[22814:0] info: processQueryTargets: ns.tweakdns.nl. AAAA IN
[1441963936] unbound[22814:0] debug: request ns.tweakdns.nl. has exceeded the maximum number of glue fetches 37
[1441963936] unbound[22814:0] debug: request ns.tweakdns.nl. has exceeded the maximum number of glue fetches 37
[1441963936] unbound[22814:0] debug: return error response SERVFAIL
[1441963936] unbound[22814:0] debug: validator[module 0] operate: extstate:module_state_initial event:module_event_moddone
[1441963936] unbound[22814:0] info: validator operate: query ns.tweakdns.nl. AAAA IN
[1441963936] unbound[22814:0] debug: iterator[module 1] operate: extstate:module_wait_subquery event:module_event_pass
[1441963936] unbound[22814:0] info: iterator operate: query tweakers.net. A IN
[1441963936] unbound[22814:0] info: processQueryTargets: tweakers.net. A IN
[1441963936] unbound[22814:0] debug: out of query targets -- returning SERVFAIL
[1441963936] unbound[22814:0] debug: return error response SERVFAIL

A second query about 15 to 20 seconds later does work, and the answer is
then cached.

A lot of domains resolve from the start without any trouble. I don't
know where exactly to look for the problem. Is this a problem that could
reside in Unbound?

Regards,

Frank de Bot

I've seen symptoms here that are very similar to what you describe.

I had been using unbound as a recursive, caching server with no
forwarding enabled.

I would notice that DNS lookups would stall (and the browser would
time out with a DNS error) for certain websites. If I retried a few
seconds later, the DNS lookup would be fine. The website that elicited
the symptom most frequently for me was slashdot.org.

I was/am running unbound on FreeBSD 10.1. Initially, I saw the issue
running the local_unbound that is in FreeBSD base. I also installed
the unbound port, and saw the symptom there as well.

I didn't really do any in-depth debugging because, well, other stuff was
going on in my life, and forwarding all DNS requests from unbound to my
ISP's DNS servers made the problem go away.

I've not had the time to get back to the problem and turn on debugging
to gather more info.

Hi,

The SERVFAIL on tweakers.net seems to come from the fix for CVE-2014-8500
(tracked for Unbound itself as CVE-2014-8602). That fix essentially limits
the number of queries (to authoritative servers) that may be sent to
resolve a target qname. If a qname requires many queries to resolve, the
result is SERVFAIL. This situation often occurs when the cache is empty
(e.g. just after starting unbound or after a cache flush).

The bind-users list discussed the same issue last year:
  https://lists.isc.org/pipermail/bind-users/2014-December/thread.html

A possible workaround is to increase MAX_TARGET_COUNT
(iterator/iterator.h) to relax the limit on the number of queries, but
that may reduce robustness against CVE-2014-8500-related attacks.
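
To illustrate why a cold cache trips the limit while a retry succeeds, here
is a minimal sketch of a per-request fetch cap. It is not Unbound's actual
code: resolve_request(), its parameters, and the value 32 are assumptions
made up for this example; only the name MAX_TARGET_COUNT comes from
iterator/iterator.h.

```c
/* Illustrative cap only. The real constant is defined in
 * iterator/iterator.h and its value depends on the Unbound version. */
#define MAX_TARGET_COUNT 32

enum resolve_result { RESOLVED, SERVFAIL };

/*
 * Hypothetical model of one client request.
 *   needed: total lookups (nameserver addresses etc.) the name requires
 *   cached: how many of those are already answered from the cache
 * Every lookup that is not cached costs one upstream fetch, and the
 * request is abandoned once the fetch counter exceeds the cap.
 */
enum resolve_result resolve_request(int needed, int cached)
{
    int target_count = 0; /* fetches sent so far for this request */

    for (int i = cached; i < needed; i++) {
        if (++target_count > MAX_TARGET_COUNT)
            return SERVFAIL; /* stop instead of chasing more targets */
    }
    return RESOLVED;
}
```

In this model resolve_request(40, 0) (empty cache) returns SERVFAIL, while
resolve_request(40, 30) (a retry after earlier fetches were cached)
returns RESOLVED. Raising the cap makes the first case succeed too, at the
price of letting a hostile zone trigger more fetches per request.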

Regards,

The bind-users discussion starts from this mail:
  https://lists.isc.org/pipermail/bind-users/2014-December/094239.html

I think it is worth considering a way to avoid recompiling Unbound.
It would be much nicer to have this configurable in unbound.conf,
similar to what BIND allows with its max-recursion-queries option.

Tomas

Hi Tomas,

What value should we use for MAX_TARGET_COUNT? I'll increase the
compiled default to that value. Easier than a configuration option
that the user can get wrong and then be vulnerable.

Best regards,
   Wouter