Hi,
I'm considering switching from BIND to Unbound, and have been testing it
on one of our recursive DNS servers. Our servers are KVM virtual machines
running RHEL 6.5 with 12 GB of memory and 8 CPU cores, using
unbound-1.4.21-1.el6.x86_64 from EPEL.
We typically have peak periods of 12-14k qps on our DNS servers, and
that has been working fairly well on BIND, but with Unbound we seem to get
into trouble when the query rate exceeds 7k qps. We then see a clear drop in
the request rate, and clients move over to the secondary DNS server.
The only problem we've noticed on the Unbound server is that the rate of
context switches is very high. pidstat for the unbound process doesn't
report high cswch/s, but the system as a whole does.
Here's "sar -w" from yesterday evening running unbound:
kl. 19.00 +0200    proc/s     cswch/s
kl. 19.10 +0200      3,70   116480,30
kl. 19.20 +0200      3,47   123118,48
kl. 19.30 +0200      3,67   128948,60
kl. 19.40 +0200      3,45   125471,32
kl. 19.50 +0200      3,69   132641,76
kl. 20.00 +0200      3,48   140126,75
while the day before, running BIND:
kl. 19.00 +0200      1,90    64801,51
kl. 19.10 +0200      2,15    64550,78
kl. 19.20 +0200      1,94    64389,23
kl. 19.30 +0200      2,19    64369,56
kl. 19.40 +0200      1,92    64211,15
kl. 19.50 +0200      2,09    64087,84
kl. 20.00 +0200      1,91    63691,33
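To put a number on the difference, the two sar samples above can be averaged. This is only a rough sketch in Python: the helper name mean_cswch is made up for illustration, the input is the sar text pasted from above, and it assumes the last column is cswch/s with decimal commas, as our locale prints it.

```python
# Rough comparison of system-wide cswch/s from the two "sar -w" samples above.
# Assumptions: last whitespace-separated field is cswch/s, decimal commas,
# and header lines (non-numeric last field) are skipped.

UNBOUND = """\
kl. 19.00 +0200 proc/s cswch/s
kl. 19.10 +0200 3,70 116480,30
kl. 19.20 +0200 3,47 123118,48
kl. 19.30 +0200 3,67 128948,60
kl. 19.40 +0200 3,45 125471,32
kl. 19.50 +0200 3,69 132641,76
kl. 20.00 +0200 3,48 140126,75
"""

BIND = """\
kl. 19.00 +0200 1,90 64801,51
kl. 19.10 +0200 2,15 64550,78
kl. 19.20 +0200 1,94 64389,23
kl. 19.30 +0200 2,19 64369,56
kl. 19.40 +0200 1,92 64211,15
kl. 19.50 +0200 2,09 64087,84
kl. 20.00 +0200 1,91 63691,33
"""


def mean_cswch(sar_text):
    """Return the mean of the last column (cswch/s), skipping header lines."""
    values = []
    for line in sar_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            # Convert the decimal comma to a decimal point before parsing.
            values.append(float(fields[-1].replace(",", ".")))
        except ValueError:
            continue  # header line such as "... proc/s cswch/s"
    return sum(values) / len(values)


unbound_avg = mean_cswch(UNBOUND)
bind_avg = mean_cswch(BIND)
print("unbound: %.0f cswch/s" % unbound_avg)
print("bind:    %.0f cswch/s" % bind_avg)
print("ratio:   %.2f" % (unbound_avg / bind_avg))
```

For these samples the system-wide context switch rate with Unbound comes out at roughly twice the rate with BIND.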
Any ideas on what we should try in order to improve this?
Full unbound.conf, stripped of comments: