Dig fails intermittently, but unbound-host does not

Hello,

We run Unbound 1.3.4 on a server and have an odd, intermittent problem
looking up one particular record.

We have other unbound and bind servers that don't have this problem.

For example:

[root@a log]# unbound-control flush farnell.com
ok
[root@a log]# dig uk.farnell.com @localhost

; <<>> DiG 9.3.6-P1-RedHat-9.3.6-4.P1.el5 <<>> uk.farnell.com @localhost
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 60335
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;uk.farnell.com. IN A

;; Query time: 73 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Mar 29 10:20:56 2011
;; MSG SIZE rcvd: 32

[root@a log]# unbound-host uk.farnell.com -v
uk.farnell.com has address 83.100.177.198 (insecure)
uk.farnell.com has no IPv6 address (insecure)
uk.farnell.com mail is handled by 3 mailcore.theplanet.net. (insecure)
uk.farnell.com mail is handled by 2 mail.uk.farnell.com. (insecure)

Sometimes dig uk.farnell.com @localhost works; other times it does not.

It seems to only be this domain that we have problems with.

I suspect there are better ways to diagnose this, or to log more
verbosely. Do you have any suggestions as to what steps I should be
taking, please?

The server, if it helps, is 90.155.53.31. I'd happily upgrade, but would
rather find out why this is happening first.

Thanks.

Andrew.
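
On the question of more verbose logging, one possible sequence is sketched
below. It is only a sketch: debug_lookup and the UNBOUND_CONTROL/DIG
override variables are hypothetical names, restoring verbosity 1 assumes
that was the configured level, and whether flush_zone and lookup are
available in 1.3.4 is an assumption to check against unbound-control(8).
Note that "flush" only removes records for the exact name given, so
flushing farnell.com does not by itself drop uk.farnell.com.

```shell
#!/bin/sh
# Commands can be overridden for a dry run, e.g. UNBOUND_CONTROL=echo DIG=echo.
UNBOUND_CONTROL=${UNBOUND_CONTROL:-unbound-control}
DIG=${DIG:-dig}

# debug_lookup ZONE NAME  (hypothetical helper)
# Raises log verbosity, clears cached data for ZONE, retries NAME, then
# asks Unbound which name servers it would use for NAME.
debug_lookup() {
    zone=$1
    name=$2
    "$UNBOUND_CONTROL" verbosity 4          # log per-query detail
    "$UNBOUND_CONTROL" flush_zone "$zone"   # drop cached data at or below ZONE
    "$DIG" "$name" @localhost               # retry, then read the Unbound log
    "$UNBOUND_CONTROL" lookup "$name"       # name servers Unbound would use
    "$UNBOUND_CONTROL" verbosity 1          # assumed original level
}
```

Called as "debug_lookup farnell.com uk.farnell.com", this leaves the
detailed resolution steps for the failing query in the Unbound log.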

That domain seems broken, at least from the "world view":

[paul@bofh ~]$ dnscheck uk.farnell.com.
   0.000: uk.farnell.com. INFO Begin testing zone uk.farnell.com. with version 1.2.1.
   0.000: uk.farnell.com. INFO Begin testing delegation for uk.farnell.com..
   6.008: uk.farnell.com. INFO Name servers listed at parent: dns1.cscdns.net,dns2.cscdns.net
   6.168: uk.farnell.com. ERROR Failed to find name servers of uk.farnell.com./IN.
   6.168: uk.farnell.com. ERROR No name servers found at child.
   6.168: uk.farnell.com. INFO Done testing delegation for uk.farnell.com..
   6.168: uk.farnell.com. CRITICAL Fatal error in delegation for zone uk.farnell.com..
   6.168: uk.farnell.com. INFO Test completed for zone uk.farnell.com..

If it works internally, perhaps the issue is that one of your servers uses
the external view instead of the internal one?

Paul

Thanks for the info, but I'm not sure this explains it, as:
  unbound-host uk.farnell.com -v
always works, and gives answers, but
  dig uk.farnell.com @localhost
is intermittent

Also, http://www.squish.net/dnscheck works each time we try

Hi Andrew, Paul,

I think Paul is correct.

Thanks for the info, but I'm not sure this explains it, as:
  unbound-host uk.farnell.com -v
always works, and gives answers, but
  dig uk.farnell.com @localhost
is intermittent

Also, http://www.squish.net/dnscheck works each time we try

That is because the first lookup has to use the parent-side delegation
information. A daemon with a cache, on a second lookup, uses the
child-side delegation information instead. unbound-host is a
command-line tool with no cache, so it always does a first lookup.
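
To see whether the parent-side and child-side NS sets actually differ,
something like the following could be used. It is a rough sketch, not a
full delegation checker: normalise_ns and compare_ns are hypothetical
helper names, and the inputs are simply lists of name server names, one
per line.

```shell
#!/bin/sh
# Normalise a list of name server names read from stdin:
# lowercase, drop the trailing dot, remove empty lines, sort, de-duplicate.
normalise_ns() {
    tr 'A-Z' 'a-z' | sed -e 's/\.$//' -e '/^$/d' | sort -u
}

# compare_ns PARENT_LIST CHILD_LIST  (hypothetical helper)
# Prints "match" when both sets are equal; otherwise prints one line per
# name that appears on only one side and returns 1.
compare_ns() {
    parent=$(printf '%s\n' "$1" | normalise_ns)
    child=$(printf '%s\n' "$2" | normalise_ns)
    if [ "$parent" = "$child" ]; then
        echo "match"
        return 0
    fi
    for ns in $parent; do
        printf '%s\n' "$child" | grep -qxF "$ns" || echo "parent-only: $ns"
    done
    for ns in $child; do
        printf '%s\n' "$parent" | grep -qxF "$ns" || echo "child-only: $ns"
    done
    return 1
}

# Example use (needs network access, so left commented out):
# parent=$(dig +norecurse +noall +authority NS farnell.com @c.gtld-servers.net | awk '{print $5}')
# child=$(dig +norecurse +short NS farnell.com @dns1.cscdns.net)
# compare_ns "$parent" "$child"
```

The parent (here a gTLD server) returns the delegation in the authority
section, which is why the example reads that section rather than the
answer.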

In unbound 1.4.5 the approach to deal with such broken domains was
changed significantly, making it more robust. It may work with this
broken domain.

Or, you could unbreak the domain, fix it :-)

Best regards,
   Wouter

Works fine here:

; <<>> DiG 9.4.2-P2.1 <<>> @127.0.0.1 uk.farnell.com A
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53139
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;uk.farnell.com. IN A

;; ANSWER SECTION:
uk.farnell.com. 289 IN A 83.100.177.198

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Mar 29 14:18:35 2011
;; MSG SIZE rcvd: 48

And I see nothing obviously wrong:

; <<>> DiG 9.4.2-P2.1 <<>> @c.gtld-servers.net farnell.com ns
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63869
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;farnell.com. IN NS

;; AUTHORITY SECTION:
farnell.com. 172800 IN NS dns1.cscdns.net.
farnell.com. 172800 IN NS dns2.cscdns.net.

;; ADDITIONAL SECTION:
dns1.cscdns.net. 172800 IN A 165.160.12.20
dns2.cscdns.net. 172800 IN A 165.160.14.20

;; Query time: 104 msec
;; SERVER: 192.26.92.30#53(192.26.92.30)
;; WHEN: Tue Mar 29 14:17:12 2011
;; MSG SIZE rcvd: 109

; <<>> DiG 9.4.2-P2.1 <<>> @165.160.14.20 uk.farnell.com A +norecurse
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32645
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;uk.farnell.com. IN A

;; ANSWER SECTION:
uk.farnell.com. 300 IN A 83.100.177.198

;; Query time: 210 msec
;; SERVER: 165.160.14.20#53(165.160.14.20)
;; WHEN: Tue Mar 29 14:19:54 2011
;; MSG SIZE rcvd: 48

; <<>> DiG 9.4.2-P2.1 <<>> @165.160.12.20 uk.farnell.com A +norecurse
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7073
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;uk.farnell.com. IN A

;; ANSWER SECTION:
uk.farnell.com. 300 IN A 83.100.177.198

;; Query time: 98 msec
;; SERVER: 165.160.12.20#53(165.160.12.20)
;; WHEN: Tue Mar 29 14:19:58 2011
;; MSG SIZE rcvd: 48

Regards

Andreas

Thanks for the info Wouter.

The domain is outside our control, but I'll upgrade our Unbound.

Thanks again

Oops. There is no zone cut for uk.farnell.com. The tool I used didn't properly
fail. You're right, it does work consistently when querying the farnell.com
name servers.
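
A quick way to check for a zone cut by hand is to ask one of the parent
zone's name servers for NS records at the name and see whether any come
back owned by that name. The sketch below only parses dig output passed
on stdin; is_zone_cut is a hypothetical helper name, and it assumes dig's
usual "owner TTL class type rdata" record layout.

```shell
#!/bin/sh
# is_zone_cut NAME  (hypothetical helper; reads dig output on stdin)
# Succeeds (exit 0) when the response contains an NS record owned by NAME,
# fails (exit 1) otherwise.
is_zone_cut() {
    name=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/\.$//')
    awk -v want="$name." '
        /^;/ || NF < 5 { next }     # skip comment lines and short lines
        tolower($1) == want && $3 == "IN" && $4 == "NS" { found = 1 }
        END { exit !found }
    '
}

# Example use (needs network access, so left commented out):
# dig +norecurse NS uk.farnell.com @dns1.cscdns.net | is_zone_cut uk.farnell.com \
#     && echo "zone cut" || echo "no zone cut"
```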

Paul