'SERVFAIL' reply from forwarder leads to query storm?

Hi,

We're running Unbound 1.4.17 on FreeBSD 9. Unbound is set up as a simple 'forwarder' to our BIND 9 recursive servers, i.e.

"
forward-zone:
        name: "."
        forward-addr: 1.1.1.1
        forward-addr: 2.2.2.2
        forward-first: yes
"

This works, but appears to have issues for 'malformed' / invalid domains.

For example - if a client queries "MX for hayoo.com" (probably a typo of 'yahoo.com') - we see Unbound forward it to the first forwarder, which is running BIND. That server logs:

"
26-Jul-2012 14:16:34.864 resolver: notice: DNS format error from 206.188.198.53#53 resolving hayoo.com/MX for client 3.3.3.3#7582: invalid response
"

This results in BIND returning 'SERVFAIL'. Fair enough.

Unbound then tries the second forwarder - which again logs 'invalid response' - and sends back SERVFAIL.

At this point - the original client request returns with 'SERVFAIL' (as you'd kind of hope / expect).

However - in the background, Unbound keeps trying each forwarder in turn, at great length, to get the query resolved. For example, running the single lookup above and looking at the Unbound log output gives:

"
Jul 26 14:38:46 host unbound: [85689:0] info: response for hayoo.com. MX IN
Jul 26 14:38:46 host unbound: [85689:0] info: reply from <.> 1.1.1.1#53
Jul 26 14:38:46 host unbound: [85689:0] info: query response was THROWAWAY
Jul 26 14:38:46 host unbound: [85689:0] info: response for hayoo.com. MX IN
Jul 26 14:38:46 host unbound: [85689:0] info: reply from <.> 2.2.2.2#53
Jul 26 14:38:46 host unbound: [85689:0] info: query response was THROWAWAY
Jul 26 14:38:46 host unbound: [85689:0] info: response for hayoo.com. MX IN
Jul 26 14:38:46 host unbound: [85689:0] info: reply from <.> 1.1.1.1#53
Jul 26 14:38:46 host unbound: [85689:0] info: query response was THROWAWAY
...
"

This goes on for many hundreds of lines. In the example above, a single 'dig MX hayoo.com' on the host resulted in nearly 1000 logged lines, nearly all of them logged after the initial dig had already returned 'SERVFAIL'.
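To put a number on the storm, the log can be tallied per forwarder. The following is only a rough sketch: it assumes the log lines have exactly the shape shown in the excerpt above (a "reply from" line followed by a "THROWAWAY" line), and the function name `count_throwaways` and the embedded `SAMPLE` text are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jul 26 14:38:46 host unbound: [85689:0] info: reply from <.> 1.1.1.1#53
# Assumption: the format is exactly as in the excerpt above.
REPLY_RE = re.compile(r"info: reply from <[^>]*> (\S+)#\d+")
THROWAWAY = "query response was THROWAWAY"


def count_throwaways(lines):
    """Count THROWAWAY responses per forwarder address."""
    counts = Counter()
    last_addr = None  # address seen on the most recent "reply from" line
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            last_addr = match.group(1)
        elif THROWAWAY in line and last_addr is not None:
            counts[last_addr] += 1
            last_addr = None  # each reply is counted at most once
    return counts


# The excerpt from the message above, used as sample input.
SAMPLE = """\
Jul 26 14:38:46 host unbound: [85689:0] info: response for hayoo.com. MX IN
Jul 26 14:38:46 host unbound: [85689:0] info: reply from <.> 1.1.1.1#53
Jul 26 14:38:46 host unbound: [85689:0] info: query response was THROWAWAY
Jul 26 14:38:46 host unbound: [85689:0] info: response for hayoo.com. MX IN
Jul 26 14:38:46 host unbound: [85689:0] info: reply from <.> 2.2.2.2#53
Jul 26 14:38:46 host unbound: [85689:0] info: query response was THROWAWAY
Jul 26 14:38:46 host unbound: [85689:0] info: response for hayoo.com. MX IN
Jul 26 14:38:46 host unbound: [85689:0] info: reply from <.> 1.1.1.1#53
Jul 26 14:38:46 host unbound: [85689:0] info: query response was THROWAWAY
"""

if __name__ == "__main__":
    for addr, n in sorted(count_throwaways(SAMPLE.splitlines()).items()):
        print(addr, n)
```

For a real log, pass the lines of the log file instead of `SAMPLE`; where Unbound's output ends up depends on the local syslog setup.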

Is there any way to stop Unbound from "rattling" around the forwarders over and over at great speed in this situation?

Thanks,

-Karl

Hi Karl,

I'm fairly new to this list but I have one observation that may be helpful.

The proper response for malformed/invalid domains isn't SERVFAIL but "NXDOMAIN" -- which Unbound will happily pass along if encountered. Unbound is just doing its job, trying to resolve the query and get a proper yes/no response from your servers.

Hi Karl,

Format error? The 206.188.198.53 server is a public server and it
generates format errors: plain dig works, but dig +dnssec produces
some sort of fake upward delegation, so that it appears lame.

The BIND server responds with SERVFAIL. With forward-first, Unbound
tries the forwarders before going to the authorities, and it exhausts
all of those options first. Thus it elicits SERVFAIL (several times)
from all of the available servers before it moves to the fallback
authority option. This is likely what happens here?
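As a rough illustration of the order of events described above, here is a toy model in Python. It is not Unbound's code or its actual retry logic: the function name, the `rounds` retry budget and the callback arguments are all hypothetical, chosen only to show "exhaust the forwarders, then fall back to the authorities".

```python
from typing import Callable, List, Tuple


def resolve_forward_first(
    forwarders: List[str],
    ask: Callable[[str], str],
    ask_authorities: Callable[[], str],
    rounds: int = 3,  # hypothetical retry budget, not an Unbound setting
) -> Tuple[str, List[Tuple[str, str]]]:
    """Toy model: try every forwarder in turn, then fall back to authorities.

    Returns the final rcode and the list of (forwarder, rcode) attempts.
    """
    attempts = []
    for _ in range(rounds):
        for addr in forwarders:
            rcode = ask(addr)
            attempts.append((addr, rcode))
            if rcode != "SERVFAIL":
                # Any usable answer ends the search immediately.
                return rcode, attempts
    # Every forwarder attempt failed: fall back to the authority servers.
    return ask_authorities(), attempts


if __name__ == "__main__":
    # Both forwarders always answer SERVFAIL, as in the hayoo.com case.
    rcode, attempts = resolve_forward_first(
        ["1.1.1.1", "2.2.2.2"],
        ask=lambda addr: "SERVFAIL",
        ask_authorities=lambda: "SERVFAIL",
    )
    print(rcode, len(attempts))
```

In this model each forwarder is asked `rounds` times before the fallback, which gives the alternating 1.1.1.1 / 2.2.2.2 pattern seen in the log; how many attempts Unbound really makes is not something this sketch tries to reproduce.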

I did find a bug in how forward-first sets the RD flag, though (and
fixed it).

Best regards,
   Wouter