We're running Unbound 1.4.18 on a number of FreeBSD machines now, and it generally seems to be running well.
Initially we had an issue with our forwarders being 'overrun' with queries when domains were invalid; this was fixed by setting 'forward-first: no' in our "forward only" unbound.conf.
However, our BIND-based forwarders (which Unbound forwards to) still see a large percentage of queries for domains that they cannot resolve properly, and therefore log an "invalid response" error, e.g.:
  15-Sep-2012 06:02:08.484 resolver: notice: DNS format error from 195.189.226.227#53 resolving iumdoctors.com/NS for client 192.168.0.2#5828: invalid response
Unbound running on 192.168.0.2 will keep asking for data about "iumdoctors.com" quite often, for quite a while. This may well be in response to software on that host asking a lot for NS records for 'iumdoctors.com'.
Is there any setting in 1.4.18 that we can use to tell Unbound to cache the fact that this query failed / gave an invalid response, so that it can answer clients from cache for, say, the next 5 or 10 minutes, without bothering the main forwarders?
This would dramatically cut the number of these queries being issued against our forwarders.
We did look at this before, but were more concerned with other issues (which, as I said, were resolved by setting 'forward-first: no'). Now that the system has been running a while, we can see that the query load on BIND has been reduced, but caching this kind of lookup would drop it even further.
> We're running Unbound 1.4.18 on a number of FreeBSD machines now,
> and it generally seems to be running well.
> Initially we had an issue with our forwarders being 'overrun' with
> queries when domains were invalid; this was fixed by setting
> 'forward-first: no' in our "forward only" unbound.conf.
Glad to hear that it works well now.
> However, our BIND-based forwarders (which Unbound forwards to)
> still see a large percentage of queries for domains that they
> cannot resolve properly, and therefore log an "invalid response"
> error, e.g.:
>   15-Sep-2012 06:02:08.484 resolver: notice: DNS format error from
>   195.189.226.227#53 resolving iumdoctors.com/NS for client
>   192.168.0.2#5828: invalid response
Yes, if Unbound were to resolve this domain itself, it would also
get a failure (from a quick look).
> Unbound running on 192.168.0.2 will keep asking for data about
> "iumdoctors.com" quite often, for quite a while. This may well be
> in response to software on that host asking a lot for NS records
> for 'iumdoctors.com'.
> Is there any setting in 1.4.18 that we can use to tell Unbound to
> cache the fact that this query failed / gave an invalid response, so
> that it can answer clients from cache for, say, the next 5 or 10
> minutes, without bothering the main forwarders?
There is no setting for this in the config file, but there is a
constant in the source code, NORR_TTL in util/data/msgparse.h:78.
You can change it to a higher value and recompile if you want to
store failed queries for a longer time.
> This would dramatically cut the number of these queries being
> issued against our forwarders.
But the problem with a large value here, and the reason for the
fairly short but nonzero value it has now, is that for many
queries a retry may solve the situation. A large value would turn
a temporary failure, which would otherwise go unnoticed because it
works again a minute later, into a long-term failure.
> We did look at this before, but were more concerned with other
> issues (which, as I said, were resolved by setting
> 'forward-first: no'). Now that the system has been running a
> while, we can see that the query load on BIND has been reduced,
> but caching this kind of lookup would drop it even further.
OK, that is obviously a valid point, which we'll bear in mind. Looking at our query load, I think we could get away with setting it to either 30s or 1 minute. We tend to find that these queries for invalid domains arrive in 'blocks': 30s or 1m would be long enough to ensure they all 'fail' from cache, but should be short enough not to cause problems for sites that genuinely return an error for a 'short period'. I do take your point on board, though.
tbh, most of the sites we see returning this kind of error look like typos, abandoned domains, or other 'nasties'.
I'll have a look at re-compiling with that adjustment, and see how we get on.