Resolving of some special host very slow

Hello,

today we discovered a hostname which is very slow to resolv with Unbound 1.4.21 as validating resolver. It works fine with all knid of other resolvers and oddly enough even with another Unbound instance.
The host in question is esta.cbp.dhs.gov and resolve time after it is not in the cache range from around 2 to 5 seconds. I have take a tcpdump and can only see that the first answer come much faster but Unbound keeps asking for the same A record on different nameservers again and again.

Any idea what is going wrong?

Thanks

Andreas

Noticing the same issue:

[root@lasciel system]# time dig esta.cbp.dhs.gov @::1

; <<>> DiG 9.9.2-P2 <<>> esta.cbp.dhs.gov @::1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33583
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;esta.cbp.dhs.gov. IN A

;; ANSWER SECTION:
esta.cbp.dhs.gov. 900 IN A 216.81.87.20

;; Query time: 2097 msec
;; SERVER: ::1#53(::1)
;; WHEN: Wed Mar 5 12:42:40 2014
;; MSG SIZE rcvd: 61

real 0m12.201s
user 0m0.060s
sys 0m0.010s

[root@ferrovax ~]# time dig esta.cbp.dhs.gov

; <<>> DiG 9.9.5 <<>> esta.cbp.dhs.gov
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14872
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 8, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;esta.cbp.dhs.gov. IN A

;; ANSWER SECTION:
esta.cbp.dhs.gov. 900 IN A 216.81.87.20

;; AUTHORITY SECTION:
dhs.gov. 86398 IN NS use1.akam.net.
dhs.gov. 86398 IN NS use3.akam.net.
dhs.gov. 86398 IN NS usw4.akam.net.
dhs.gov. 86398 IN NS asia2.akam.net.
dhs.gov. 86398 IN NS eur2.akam.net.
dhs.gov. 86398 IN NS usc2.akam.net.
dhs.gov. 86398 IN NS usw3.akam.net.
dhs.gov. 86398 IN NS asia3.akam.net.

;; Query time: 2406 msec
;; SERVER: ::1#53(::1)
;; WHEN: Wed Mar 05 17:43:57 UTC 2014
;; MSG SIZE rcvd: 223

real 0m2.519s
user 0m0.009s
sys 0m0.103s

First one is against an unbound instance, second is against a BIND instance.

Perhaps one of the authoritative NSes are slow to respond for whatever
reason?

Zitat von staticsafe <me@staticsafe.ca>:

Hello,

today we discovered a hostname which is very slow to resolv with Unbound
1.4.21 as validating resolver. It works fine with all knid of other
resolvers and oddly enough even with another Unbound instance.
The host in question is esta.cbp.dhs.gov and resolve time after it is
not in the cache range from around 2 to 5 seconds. I have take a tcpdump
and can only see that the first answer come much faster but Unbound
keeps asking for the same A record on different nameservers again and
again.

Any idea what is going wrong?

Thanks

Andreas

Noticing the same issue:

[root@lasciel system]# time dig esta.cbp.dhs.gov @::1

; <<>> DiG 9.9.2-P2 <<>> esta.cbp.dhs.gov @::1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33583
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;esta.cbp.dhs.gov. IN A

;; ANSWER SECTION:
esta.cbp.dhs.gov. 900 IN A 216.81.87.20

;; Query time: 2097 msec
;; SERVER: ::1#53(::1)
;; WHEN: Wed Mar 5 12:42:40 2014
;; MSG SIZE rcvd: 61

real 0m12.201s
user 0m0.060s
sys 0m0.010s

[root@ferrovax ~]# time dig esta.cbp.dhs.gov

; <<>> DiG 9.9.5 <<>> esta.cbp.dhs.gov
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14872
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 8, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;esta.cbp.dhs.gov. IN A

;; ANSWER SECTION:
esta.cbp.dhs.gov. 900 IN A 216.81.87.20

;; AUTHORITY SECTION:
dhs.gov. 86398 IN NS use1.akam.net.
dhs.gov. 86398 IN NS use3.akam.net.
dhs.gov. 86398 IN NS usw4.akam.net.
dhs.gov. 86398 IN NS asia2.akam.net.
dhs.gov. 86398 IN NS eur2.akam.net.
dhs.gov. 86398 IN NS usc2.akam.net.
dhs.gov. 86398 IN NS usw3.akam.net.
dhs.gov. 86398 IN NS asia3.akam.net.

;; Query time: 2406 msec
;; SERVER: ::1#53(::1)
;; WHEN: Wed Mar 05 17:43:57 UTC 2014
;; MSG SIZE rcvd: 223

real 0m2.519s
user 0m0.009s
sys 0m0.103s

First one is against an unbound instance, second is against a BIND instance.

Perhaps one of the authoritative NSes are slow to respond for whatever
reason?

These are the values we got on the second (faster) Unbound instance. On the primary we have > 4 seconds which lead to timeout on the downstream resolver, so the host is not accessible from time to time :frowning:
It would be nice to know if we have any chance to get this fixed by excluding some NS or pin something in the cache, because i doubt they will fix whatever might be on theri side.

Regards

Andreas

Hi,

Zitat von staticsafe <me@staticsafe.ca>:

Hello,

today we discovered a hostname which is very slow to resolv
with Unbound 1.4.21 as validating resolver. It works fine with
all knid of other resolvers and oddly enough even with another
Unbound instance. The host in question is esta.cbp.dhs.gov and
resolve time after it is not in the cache range from around 2
to 5 seconds. I have take a tcpdump and can only see that the
first answer come much faster but Unbound keeps asking for the
same A record on different nameservers again and again.

Any idea what is going wrong?

The zone is not signed, but it is hosted on the same servers that also
host its parent zone, which is signed. Unbound is searching for
dnssec information. Then it does not find it. Then it tries to build
a chain of trust and finds the nsec3optout and then you get the answer.

Apart from a lot of traffic to those servers, as it is trying all of
them for every query, it should actually work fairly fast. Are these
servers somehow blocking access to you (with timeouts) ?

Since the servers are all responsive (for me, from an IETF address),
and in total the resolution is very fast (not near 5-10 sec), I think
something else is going one. This could have been triggered by the
extra traffic that unbound sends towards those servers because it is
trying to find out the co-hosted-parent problem as well as an optout
that happens while it did not see the optout-referral.

Looking for workarounds, try domain-insecure for cbp.dhs.gov.

Best regards,
   Wouter

Zitat von Wouter Wijngaards <wouter@nlnetlabs.nl>:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Zitat von staticsafe <me@staticsafe.ca>:

Hello,

today we discovered a hostname which is very slow to resolv
with Unbound 1.4.21 as validating resolver. It works fine with
all knid of other resolvers and oddly enough even with another
Unbound instance. The host in question is esta.cbp.dhs.gov and
resolve time after it is not in the cache range from around 2
to 5 seconds. I have take a tcpdump and can only see that the
first answer come much faster but Unbound keeps asking for the
same A record on different nameservers again and again.

Any idea what is going wrong?

The zone is not signed, but it is hosted on the same servers that also
host its parent zone, which is signed. Unbound is searching for
dnssec information. Then it does not find it. Then it tries to build
a chain of trust and finds the nsec3optout and then you get the answer.

Apart from a lot of traffic to those servers, as it is trying all of
them for every query, it should actually work fairly fast. Are these
servers somehow blocking access to you (with timeouts) ?

Since the servers are all responsive (for me, from an IETF address),
and in total the resolution is very fast (not near 5-10 sec), I think
something else is going one. This could have been triggered by the
extra traffic that unbound sends towards those servers because it is
trying to find out the co-hosted-parent problem as well as an optout
that happens while it did not see the optout-referral.

Looking for workarounds, try domain-insecure for cbp.dhs.gov.

Best regards,
   Wouter

As of now they are much faster but still slow:

Main site:
; <<>> DiG 9.8.1-P1 <<>> esta.cbp.dhs.gov A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22953
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;esta.cbp.dhs.gov. IN A

;; ANSWER SECTION:
esta.cbp.dhs.gov. 900 IN A 216.81.87.20

;; Query time: 1503 msec
;; SERVER: 10.5.0.3#53(10.5.0.3)
;; WHEN: Thu Mar 6 21:18:24 2014
;; MSG SIZE rcvd: 50

was more that 4000ms at report time

Hosted VPS:
; <<>> DiG 9.8.1-P1 <<>> esta.cbp.dhs.gov A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46389
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;esta.cbp.dhs.gov. IN A

;; ANSWER SECTION:
esta.cbp.dhs.gov. 900 IN A 216.81.87.20

;; Query time: 387 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Mar 6 21:19:04 2014
;; MSG SIZE rcvd: 50

was around 1500ms at report time

Pfoblem is that we have a cascade with unbound1 asking unbound2 at the gateway and with resolve-time of around 4 seconds unbound1 will report timeout and access to this site will block. I could capture a tcpdump and send you in private if you like to have a look at it. I'm a little out of ideas because as said it looks like the answer is flowing in fast but unbound is searching over and over again...

Thanks

Andreas