Unbound can't resolve certain domains

Hello, unbound-users.

I'm using Unbound 1.8.1 on FreeBSD 12.0-RELEASE. It resolves most
domains fine, but it can't resolve one particular domain, FreeBSD.org.
I can resolve FreeBSD.org when I query another nameserver directly
(8.8.8.8, for example).

~ # cat /etc/resolv.conf
nameserver 127.0.0.1

Hint: freebsd.org is DNSSEC-enabled, google.com is not.

Can you resolve other DNSSEC-enabled domains, e.g. internetsociety.org?

R.

Thank you for the hint.

internetsociety.org works fine. But I just noticed many of the DNSSEC-enabled
domains don't work. Also, some domains that don't use DNSSEC don't work.
lucidsolutions.co.nz is an example.

The reason I asked about DNSSEC is that DNSSEC lookups use TCP rather
than UDP more often, simply because the signed responses are larger.
So, if your Unbound is unable to send TCP queries (for whatever reason),
it may fail for some domains (those that require TCP queries to resolve)
and succeed for others. This _may_ be the reason for the network error.
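
One way to check the TCP hypothesis is to query an authoritative server
directly from the resolver host, once over UDP only and once over TCP
only, and compare the results. The sketch below is only an illustration:
the function names dns_status and compare_transports are made up for this
example, and the timeout values are arbitrary.

```shell
#!/bin/sh
# Extract the "status:" field (NOERROR, SERVFAIL, ...) from dig output on stdin.
# Prints nothing if dig got no answer at all (e.g. a timeout).
dns_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query the same name over UDP only and over TCP only, and print both results.
# $1 = server to query, $2 = query name, $3 = record type (default A)
compare_transports() {
    server=$1; qname=$2; qtype=${3:-A}
    # +ignore keeps dig from silently retrying a truncated UDP answer over TCP.
    udp=$(dig +notcp +ignore +tries=1 +time=5 "@$server" "$qname" "$qtype" | dns_status)
    tcp=$(dig +tcp +tries=1 +time=5 "@$server" "$qname" "$qtype" | dns_status)
    echo "udp=${udp:-no-answer} tcp=${tcp:-no-answer}"
}
```

Run on the resolver machine as, for example,
"compare_transports ns1.isc-sns.net freebsd.org DNSKEY". If the UDP query
answers and the TCP query does not, outbound TCP to port 53 is probably
being blocked somewhere.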

R.

FYI, I run unbound on FreeBSD 11.3 and 12.0 and it resolves all of
those domains without any trouble.

Hi,

just another data point and no final conclusion: we're running BIND
as recursors, and have also received reports that lookup of various
names under FreeBSD.org intermittently fails. I've seen reports
about failing queries for at least

bugs.freebsd.org / a
_http._tcp.freebsd.org / srv

Our recursors run with DNSSEC validation enabled, and don't have any
"general problems with DNSSEC".

Passing either of those names to the machinery at
https://dnsviz.net reveals no problems per se with the DNSSEC setup
for freebsd.org. The only oddity I can find (it isn't really an
error) is that none of the zones holding the names of the name
servers for freebsd.org is DNSSEC-secured:

freebsd.org. 3600 IN NS ns2.isc-sns.com.
freebsd.org. 3600 IN NS ns3.isc-sns.info.
freebsd.org. 3600 IN NS ns1.isc-sns.net.

is the NS-set, but none of isc-sns.com, isc-sns.info, or isc-sns.net
is DNSSEC-secured (there is no DS record in the parent zone).

However, running this small script:

----------snip
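#! /bin/sh

qn=bugs.freebsd.org.
# Use the first nameserver from resolv.conf, or point explicitly...
recursor=$(awk '/^nameserver/{ print $2; exit; }' /etc/resolv.conf)

# Query every 30 seconds; dump the full dig output and stop at the
# first response whose status is not NOERROR.
while true; do
        out=$(dig "@$recursor" "$qn" a)
        if ! expr "$out" : ".*, status: NOERROR" >/dev/null; then
                echo
                echo "$out"
                exit 1
        fi
        printf .
        sleep 30
done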
----------snip

fails relatively quickly for me:

% ./check-bugs-freebsd-org.sh
..................

; <<>> DiG 9.10.5-P1 <<>> @2001:700:xx:xx::ca53 bugs.freebsd.org. a
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 60387
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;bugs.freebsd.org. IN A

;; Query time: 3989 msec
;; SERVER: 2001:700:xx:xx::ca53#53(2001:700:xx:xx::ca53)
;; WHEN: Fri Oct 18 10:21:36 CEST 2019
;; MSG SIZE rcvd: 45
%

The "query errors" log of BIND contains:

Oct 18 10:21:36 oliven named[20568]: client @0x7491477fb000 2001:700:x:0:xx:xx:xx:xx#51269 (bugs.freebsd.org): query failed (timed out) for bugs.freebsd.org/IN/A at query.c:6818
Oct 18 10:21:36 oliven named[20568]: client @0x7490e07fd000 a.b.c.d#54158 (bugs.freebsd.org): query failed (timed out) for bugs.freebsd.org/IN/A at query.c:6818

I also have a "dnscap" capture running, and it shows these queries
matching the failure:

10:21:32.961179 IP6 2001:700:x:0:xx:xx:xx:xx.51269 > 2001:700:xx:xx::ca53.53: 60387+ [1au] A? bugs.freebsd.org. (45)
10:21:33.971546 IP a.b.c.d.54158 > 158.38.0.168.53: 60387+ [1au] A? bugs.freebsd.org. (45)
10:21:34.019102 IP6 2001:700:xx:xx::ca53.56594 > 2001:5a0:10::1.53: 317% [1au] AAAA? ns2.isc-sns.com. (56)
10:21:34.019319 IP6 2001:700:xx:xx::ca53.53327 > 2001:5a0:10::1.53: 14440% [1au] A? ns3.isc-sns.info. (57)
10:21:36.949709 IP6 2001:700:xx:xx::ca53.53 > 2001:700:x:0:xx:xx:xx:xx.51269: 60387 ServFail 0/0/1 (45)
10:21:36.949747 IP 158.38.0.168.53 > a.b.c.d.54158: 60387 ServFail 0/0/1 (45)

It also seems that the queries to 2001:5a0:10::1 for ns2.isc-sns.com
and ns3.isc-sns.info went unanswered -- there's no further trace of
those query-IDs in the dnscap log.

The SERVFAIL doesn't seem to be caused by a DNSSEC validation
failure, at least. The next candidate on the list is probably
(temporarily) unresponsive authoritative name servers(?)

Best regards,

- Håvard

As I already said: this looks like IP fragmentation errors on IPv6.
Many IPv6 networks seem to drop fragmented IP packets (entirely wrong imho,
but reality).
IPv6 mandates a minimum MTU of 1280 octets.
If DNS servers are configured with an EDNS buffer size of 4096 and a response
is smaller than 4096 bytes but larger than 1280, that response can be
fragmented and subsequently dropped.
(Responses larger than the EDNS buffer size are truncated and retried over TCP,
where no IP fragmentation takes place.)
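
To make the size arithmetic concrete: a UDP response also carries a 40-byte
IPv6 header and an 8-byte UDP header, so on a path with the minimum 1280-octet
MTU at most 1232 bytes of DNS message fit into one unfragmented packet. The
sketch below classifies a response size under these assumptions. The function
name classify_response and its defaults are invented for illustration and are
not part of Unbound or dig.

```shell
#!/bin/sh
# Classify what happens to a UDP DNS response of a given size over IPv6.
# $1 = DNS message size in bytes
# $2 = path MTU in octets (default 1280, the IPv6 minimum)
# $3 = EDNS buffer size advertised by the querier (default 4096)
classify_response() {
    size=$1; mtu=${2:-1280}; bufsize=${3:-4096}
    # Subtract the fixed IPv6 header (40 bytes) and the UDP header (8 bytes).
    max_payload=$((mtu - 40 - 8))
    if [ "$size" -gt "$bufsize" ]; then
        echo "truncated, retried over TCP"
    elif [ "$size" -gt "$max_payload" ]; then
        echo "fragmented"
    else
        echo "single packet"
    fi
}
```

For example, "classify_response 1348" prints "fragmented", while
"classify_response 1348 1500" prints "single packet", because a 1500-octet
path leaves room for 1452 bytes of DNS message.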

You can test this on your resolver machine with the dig command-line utility,
using some DNS test entries I configured on my DNS server:
# dig +multiline -t TXT test765.molitor-dietzel.de @2a01:4f8:190:13cf::1:2
--> (should yield ";; MSG SIZE rcvd: 925", thus being below 1280 bytes)
# dig +multiline -t TXT test765.molitor-dietzel.de @5.9.139.219
--> (should yield the same response)

If that works, try the following DNS entries using IPv6 and IPv4:
test1020 (;; MSG SIZE rcvd: 1182) --> no IPv6 fragmentation
test1275 (;; MSG SIZE rcvd: 1348) --> no IPv6 fragmentation or two fragments
test1530 (;; MSG SIZE rcvd: 1604) --> most likely two IPv6 fragments
test2550 (;; MSG SIZE rcvd: 2628) --> two or three IPv6 fragments

Check if IPv4 and IPv6 responses are the same. If my guess is correct, your
IPv6 queries should start to time out once responses get fragmented, while
your IPv4 responses probably still come through.
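
The comparison above can be scripted roughly as follows. This is a sketch, not
a finished tool: the function names msg_size and probe_sizes are made up, and
it assumes the remaining test entries follow the same naming pattern as
test765 (that is, test1020.molitor-dietzel.de and so on).

```shell
#!/bin/sh
# Extract N from dig's ";; MSG SIZE rcvd: N" line on stdin.
# Prints nothing when dig received no response.
msg_size() {
    sed -n 's/^;; MSG SIZE *rcvd: *\([0-9][0-9]*\).*/\1/p'
}

# Query every test entry over IPv4 and IPv6 and print the received size.
probe_sizes() {
    for name in test765 test1020 test1275 test1530 test2550; do
        for server in 5.9.139.219 2a01:4f8:190:13cf::1:2; do
            # +bufsize=4096 advertises a large EDNS buffer so the server
            # answers over UDP; +ignore prevents a silent fallback to TCP.
            size=$(dig +bufsize=4096 +ignore +tries=1 +time=5 -t TXT \
                "$name.molitor-dietzel.de" "@$server" | msg_size)
            echo "$name $server ${size:-no-response}"
        done
    done
}
```

Calling probe_sizes prints one line per entry and server. If the IPv6 lines
switch to "no-response" at the larger sizes while the IPv4 lines still show a
byte count, that points to dropped IPv6 fragments.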

- tmolitor