Unbound periodically stops providing valid lookups

Synopsis: having issues where unbound stops responding properly to lookups (doesn’t report error, just gives bad info) until I restart it.

Background:

I recently upgraded pfSense to 2.1 and switched to Unbound for the DNS resolver because I needed to do resolving directly instead of forwarding due to mail RBL service query overloading. I had no problem getting Unbound to work initially, but after a day I started getting a lot of malformed MX record lookups on my mail server. When I queried the records I saw a lot of null MX records, while a lookup on an external DNS service showed normal MX records. I disabled DNSSEC thinking it was related to that, and the problem seemed to go away. However, today the same problem started happening again, and restarting the Unbound service resolved it. When the problem happens, Unbound reports bad info for the lookup… below is a lookup for the navyfederal.org MX; notice it returns a null MX:


>> dig @192.168.100.1 -t mx navyfederal.org.

; <<>> DiG 9.9.5-3-Ubuntu <<>> @192.168.100.1 -t mx navyfederal.org.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17827
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;navyfederal.org.               IN      MX

;; ANSWER SECTION:
navyfederal.org.        261     IN      MX      0 .

;; AUTHORITY SECTION:
org.                    22284   IN      NS      ns.buydomains.com.
org.                    22284   IN      NS      this-domain-for-sale.com.

;; Query time: 0 msec
;; SERVER: 192.168.100.1#53(192.168.100.1)
;; WHEN: Wed Sep 24 12:29:47 EDT 2014
;; MSG SIZE  rcvd: 125

Restarting Unbound and repeating now gives:


>> dig @192.168.100.1 -t mx navyfederal.org.

; <<>> DiG 9.9.5-3-Ubuntu <<>> @192.168.100.1 -t mx navyfederal.org.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14040
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;navyfederal.org.               IN      MX

;; ANSWER SECTION:
navyfederal.org.        300     IN      MX      10 navyfederal-org.mail.protection.outlook.com.

;; AUTHORITY SECTION:
navyfederal.org.        500     IN      NS      ns1.navyfedcu.org.
navyfederal.org.        500     IN      NS      ns.navyfedcu.org.
navyfederal.org.        500     IN      NS      ns1.navyfederal.org.

;; ADDITIONAL SECTION:
ns1.navyfederal.org.    500     IN      A       4.31.59.245

;; Query time: 41 msec
;; SERVER: 192.168.100.1#53(192.168.100.1)
;; WHEN: Wed Sep 24 12:35:48 EDT 2014
;; MSG SIZE  rcvd: 182

I’m not seeing anything obvious in the Unbound logs, so any help how to troubleshoot this is greatly appreciated.

Hi Derrick,

> Synopsis: having issues where unbound stops responding properly to
> lookups (doesn't report error, just gives bad info) until I restart
> it.

Can you give your configuration, especially any stub and forward parts
that you may have? If you have several, the ones for "." and "org."
matter most.

> Background:
>
> I recently upgraded pfsense to 2.1 and switched to Unbound for the
> DNS resolver because I needed to do resolving directly instead of
> forwarding due to mail RBL service query overloading. Had no
> problem getting Unbound to work initially, but after a day I
> started getting a lot of malformed MX record lookups on my mail
> server and when I queried the records I was seeing a lot of null MX
> records, but doing a lookup on an external DNS service showed
> normal MX records. I disabled DNSSEC thinking it was related to
> that and the problem *seemed* to go away. However today the same
> problem started happening again and restarting the Unbound service
> has resolved it. When the problem happens, Unbound reports bad info
> for the lookup... below is a lookup for navyfederal.org MX and
> notice it returns a null MX

The difference is the org NS records. When it goes wrong, the org NS
records have been replaced by the domain-for-sale ones
(ns.buydomains.com and this-domain-for-sale.com). Unbound then queries
the fake .org servers run by this outfit and gets their (wildcarded)
response for every org domain.
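
To make that comparison repeatable, the AUTHORITY section can be pulled
out of saved dig output and grouped by owner name. This is only a
minimal sketch in Python: the function name ns_by_owner and the BAD and
GOOD sample strings are illustrative (the samples are trimmed copies of
the two dig outputs in this thread), and none of it is part of Unbound
or dig.

```python
def ns_by_owner(dig_output):
    """Group NS records from the AUTHORITY section of dig text by owner name."""
    result = {}
    in_authority = False
    for raw in dig_output.splitlines():
        line = raw.strip()
        if line.startswith(";; AUTHORITY SECTION:"):
            in_authority = True
            continue
        if in_authority:
            # A blank line or the next ";" header ends the section.
            if not line or line.startswith(";"):
                break
            fields = line.split()
            # Expected layout: owner TTL class type rdata
            if len(fields) >= 5 and fields[3] == "NS":
                owner, target = fields[0].lower(), fields[4].lower()
                result.setdefault(owner, set()).add(target)
    return result


# Trimmed copies of the two responses shown earlier in the thread.
BAD = """\
;; ANSWER SECTION:
navyfederal.org.        261     IN      MX      0 .

;; AUTHORITY SECTION:
org.                    22284   IN      NS      ns.buydomains.com.
org.                    22284   IN      NS      this-domain-for-sale.com.

;; Query time: 0 msec
"""

GOOD = """\
;; AUTHORITY SECTION:
navyfederal.org.        500     IN      NS      ns1.navyfedcu.org.
navyfederal.org.        500     IN      NS      ns.navyfedcu.org.
navyfederal.org.        500     IN      NS      ns1.navyfederal.org.

;; ADDITIONAL SECTION:
ns1.navyfederal.org.    500     IN      A       4.31.59.245
"""

if __name__ == "__main__":
    for label, text in (("bad", BAD), ("good", GOOD)):
        for owner, targets in sorted(ns_by_owner(text).items()):
            print(label, owner, sorted(targets))
```

In the bad case the only owner is org. itself, pointing at the
domain-for-sale servers; in the good case the owner is navyfederal.org.
with its own nameservers.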

The fix is that Unbound should not pick up those NS records. Normally
bailiwick filters and other scrubbing prevent this. However, that has
not happened here.
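
For readers unfamiliar with the term, the idea behind a bailiwick check
can be sketched as follows. This is a simplified illustration in
Python, not Unbound's actual scrubbing code (which is written in C and
does considerably more); the function names labels, in_bailiwick and
scrub are made up for the example, and which zone the resolver believed
it was querying in this incident is not known from the thread.

```python
def labels(name):
    """Split a domain name into lowercase labels, ignoring the trailing dot."""
    name = name.rstrip(".").lower()
    return name.split(".") if name else []


def in_bailiwick(owner, zone):
    """True if owner equals zone or is a subdomain of it (label-wise)."""
    o, z = labels(owner), labels(zone)
    return len(o) >= len(z) and o[len(o) - len(z):] == z


def scrub(records, zone):
    """Keep only (owner, rtype, rdata) records whose owner is inside zone."""
    return [r for r in records if in_bailiwick(r[0], zone)]


if __name__ == "__main__":
    # Records as they might arrive from a server asked about navyfederal.org.
    received = [
        ("navyfederal.org.", "MX", "0 ."),
        ("org.", "NS", "ns.buydomains.com."),
        ("org.", "NS", "this-domain-for-sale.com."),
    ]
    # The org. NS records are outside the navyfederal.org. bailiwick
    # and are dropped; only the MX record is kept.
    print(scrub(received, "navyfederal.org."))
```

Comparing whole labels rather than raw string suffixes matters: a name
such as evilnavyfederal.org. ends with the same characters as
navyfederal.org. but is not inside that zone.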

Please tell me about your configuration in more detail. If you have
private servers involved (I mean other servers on your network, not
Unbound), do they have an unusual configuration (e.g. do they host the
.org or root zone)?

Best regards,
   Wouter