Hi Alexander,
Alexander Gall wrote:
Wouter,
Here's an update on my testing of 0.7.2 (I had the flu last week, back
to work now).
Thank you very much for the test. I'll examine your logs below.
Total run time so far is about 62 hours with a bit over 10 million
answered queries. I was confident enough to let unbound run for
almost two days just now, which should have been enough to test most
of the TTL and DNSSEC validation expiration logic I guess.
Good to know that validation was tested too, and that it works.
Everything else is left at the default values. Please let me know if
you want me to test anything non-default in particular.
I did not notice any problems at all and didn't get any reports from
our users either. I count this as a success.
Great!
10 million real user queries is not something that I can try myself.
unbound also correctly dealt with the situation that it had a trust
anchor defined for example.net (from the set of trust anchors
distributed by RIPE NCC
<https://www.ripe.net/projects/disi/keys/ripe-ncc-dnssec-keys-new.txt>),
but the corresponding DNSKEY is missing from the zone:
[1201660228] unbound[16620:0] info: failed to prime trust anchor --
could not fetch DNSKEY rrset <example.net. DNSKEY IN>
Ah, yes, in such a situation you can wait for example.net to fix their
zone and after 900 seconds (the default bogus TTL) unbound will pick that up, or
instead change your config and kill -HUP unbound (this also clears the
cache).
Operational experience: I was able to integrate unbound into our
anycast caching system without problems. This allows me to run BIND
and unbound in parallel on different anycast instances just as I had
planned to do. All of this is looking very good.
Oh, this is really nice. It would be interesting to know of any noticeable
differences between BIND and unbound. Apart from version.bind CH TXT, of
course.
Your log file. Thank you for sharing the statistics.
* There are many TCP connect errors. I assume that someone
configured a zone as example.com NS 10.0.0.100, with a nameserver on
their local subnet. Unbound cannot contact that nameserver, and tries
(finally) to use TCP on it; which gives this error.
I think the log file should not be cluttered with zone administration
mistakes by others. I can demote this particular error to a higher
verbosity level (2), or I can print the address that failed and then you
(the operator) or a script can pick those up and block them
(do-not-query-address: 10.0.0.0/8 in the config file).
I think I'll demote the error message, as it does not bother the
resolver operator. What do you think? Would you like to have the
addresses printed to the log anyway?
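
As a rough illustration of the script idea above: assuming the failing
addresses have already been collected from the log (how they would be
logged is not settled, so nothing is parsed here), a minimal sketch
could turn them into do-not-query-address lines. The function name
block_lines and the choice to block only private-range addresses are
assumptions made for this example, not something unbound provides.

```python
import ipaddress


def block_lines(addresses):
    """Return do-not-query-address config lines for private addresses.

    `addresses` is an iterable of IP address strings that the operator
    has already collected; how they are collected is left open.
    """
    seen = set()
    lines = []
    for text in addresses:
        addr = ipaddress.ip_address(text)
        # Assumption: only private addresses count as a zone
        # misconfiguration. A public server that is merely down
        # should not end up blocked.
        if addr.is_private and addr not in seen:
            seen.add(addr)
            lines.append(f"do-not-query-address: {addr}")
    return lines


if __name__ == "__main__":
    # Hypothetical input: one private address seen twice, one public one.
    for line in block_lines(["10.0.0.100", "193.0.14.129", "10.0.0.100"]):
        print(line)
```

The output lines would still need to be reviewed and placed in the
config file by hand; blocking a whole netblock such as 10.0.0.0/8, as
in the example above, is the simpler option if no internal servers
need to be reached.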
* You have 93% cache hits. With the default 4+4 MB cache (4 MB for rrset
data, 4 MB for message data), unbound caps memory usage at about
20-30 MB total for the process. For 10 million queries. This is
impressive. You could try to increase cache size to improve cache hits;
but it doesn't seem worth the effort.
* The requestlist (this is the to-do list of pending recursive queries)
stays nice and small as well. If the computer were unable to bear the
load, this number would shoot up as requests come in faster than they
could be handled.
* For the histogram, onlookers please note that the average reply time
printed a) does not include the cache responses (which are better
measured in qps than in seconds per query) and b) is skewed by really
large upper numbers caused by unbound retrying very hard for a couple of
records (remote server down). A newer unbound prints the median as
well, which is a nicer way to summarize recursion speed.
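
To see why a few very slow lookups skew the average but not the
median, here is a minimal sketch that estimates both from histogram
buckets. The bucket layout (lower bound, upper bound, count) and the
numbers are made up for illustration, and this is not necessarily how
unbound itself computes its statistics.

```python
def mean_and_median(buckets):
    """Estimate (mean, median) reply time from histogram buckets.

    `buckets` is a list of (lower_sec, upper_sec, count) tuples,
    sorted by lower bound.
    """
    total = sum(count for _, _, count in buckets)
    if total == 0:
        raise ValueError("empty histogram")
    # Mean: treat every query as sitting at the midpoint of its bucket.
    mean = sum((lo + hi) / 2 * count for lo, hi, count in buckets) / total
    # Median: find the bucket holding the middle query, then
    # interpolate linearly inside that bucket.
    half = total / 2
    running = 0
    for lo, hi, count in buckets:
        if count and running + count >= half:
            median = lo + (hi - lo) * (half - running) / count
            return mean, median
        running += count


if __name__ == "__main__":
    # Hypothetical data: most lookups are fast, a handful take very long.
    example = [(0.0, 0.1, 900), (0.1, 1.0, 90), (1.0, 100.0, 10)]
    mean, median = mean_and_median(example)
    print(f"mean={mean:.4f}s median={median:.4f}s")
```

With these made-up numbers, only 1% of the lookups fall in the slowest
bucket, yet the mean comes out roughly ten times larger than the
median.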
* There is a significant bump on the lower end of your histogram, at 32
microsec. I assume this is because a lot of recursion requests are due
to a CNAME, for example where a CNAME is used to load-balance with DNS.
Consequently, I need to pay attention to CNAME-processing when I do
optimization, good to know.
Thank you,
Best regards,
~ Wouter