Does anybody have some additional performance tuning tips for Unbound, specifically on Solaris 11?
I have followed the recommended settings in the "HowTo Optimise", but still seem to hit a ceiling of roughly 3600 queries per second. On top of that, the platform/OS becomes a bit sluggish when logging in via SSH.
Which part of the "HowTo Optimise" do you use, and how?
What is the load on your system in terms of CPU usage?
Have you checked the upstream connection or the forwarders used?
Do you use DNSSEC? What kind of data do you use for testing?
Without this information, no one will be able to tell why your system maxes out at 3600 qps.
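To make the last question concrete, a minimal load-test sketch. dnsperf and the query file queryfile.txt are assumptions here, not something the original post mentions:

```shell
# Sketch of a qps measurement, assuming dnsperf is installed and that
# queryfile.txt (one "name type" pair per line) exists -- both are
# assumptions for illustration. -l 60 runs the test for 60 seconds.
dnsperf -s 127.0.0.1 -p 53 -d queryfile.txt -l 60
```

The summary dnsperf prints at the end includes the achieved queries-per-second figure, which is the number being discussed in this thread.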
The SPARC T3 has special hardware threads. Unbound has an option to
use the Solaris thread library: configure --with-solaris-threads,
perhaps together with the Sun compiler (CC=/opt/.../cc). If the
hardware threads really work, then with num-threads set up to 64 (or
however many your hardware supports), you could potentially (this has
not been tried) get up to 64x your previous performance.
Best regards,
Wouter
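The rebuild Wouter describes could look roughly like this. The compiler path is elided as /opt/.../cc in the original message, so the path below is a placeholder, not a known location:

```shell
# Rebuild Unbound against the Solaris thread library (a sketch).
# SUNCC is a placeholder for the Sun/Solaris Studio cc path; the
# original message elides it as /opt/.../cc -- adjust to your system.
SUNCC=/opt/solarisstudio/bin/cc
CC="$SUNCC" ./configure --with-solaris-threads
make && make install
```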
Are the "software" (POSIX?) threads really that slow on the T3 processor?
The portion of unbound.conf pertaining to the "HowTo Optimise":
# Optimise settings
num-threads: 112
#
msg-cache-slabs: 64
rrset-cache-slabs: 64
infra-cache-slabs: 64
key-cache-slabs: 64
rrset-cache-size: 1024m
msg-cache-size: 512m
infra-cache-numhosts: 100000
#
# Larger socket buffer
# For Solaris 11 set the following UDP parameter 1st:
# 'ipadm set-prop -p max_buf=8388608 udp'
so-rcvbuf: 8m
so-sndbuf: 8m
#
outgoing-range: 8192
num-queries-per-thread: 4096
Hmm, I would lower "num-queries-per-thread" (or leave it at the default) and raise "outgoing-range" if the OS permits. To my knowledge, outgoing-range limits the maximum number of open sockets for upstream queries and should be roughly num-threads * num-queries-per-thread, no?
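A quick sanity check of that rule of thumb against the config above, in plain shell arithmetic (the sizing relation is the poster's suggestion, not a statement from the Unbound manual):

```shell
# Sanity-check the rule of thumb from the post against the config above:
# total potential in-flight queries vs. the configured outgoing-range.
num_threads=112
num_queries_per_thread=4096
outgoing_range=8192

in_flight=$(( num_threads * num_queries_per_thread ))
echo "potential in-flight queries: $in_flight"
echo "configured outgoing-range:   $outgoing_range"
# 112 * 4096 = 458752, far above 8192 -- by this rule of thumb the
# socket range is the bottleneck, not the thread count.
```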
You could also try lowering num-threads; maybe you have already passed the sweet spot that balances concurrency against maxing out the hardware.
This is likely to work against your goal, as thread management / query
distribution will start using more CPU than necessary. I'd rather tune
it down to a reasonable number that can still comfortably serve all
requests.
I did the same sort of optimization as you, and set threads to 16.
While that worked fine, I later switched to 4 threads after learning a
bit more about threads and how things work in the OS. It may not apply
directly to your Solaris setup, but it's worth investigating.
My resolvers consistently serve 30k-40k+ requests/sec (some threads
reach around 12k requests/sec each, due to uneven balance), with a
load average around 0.6 and CPU use around 2.0-10.0% system and ~2.0%
user, depending on the hardware (Intel Xeon L5520 or L5640,
dual-socket, quad/hex cores with hyperthreading). I run them on Linux 3.2.
With the info and ideas from Andreas, I have amended the parameters in unbound.conf as follows:
# Optimise settings
num-threads: 64
#
msg-cache-slabs: 64
rrset-cache-slabs: 64
infra-cache-slabs: 64
key-cache-slabs: 64
rrset-cache-size: 1024m
msg-cache-size: 512m
infra-cache-numhosts: 100000
#
# Larger socket buffer
# For Solaris 11 set the following UDP parameter 1st:
# 'ipadm set-prop -p max_buf=8388608 udp'
so-rcvbuf: 8m
so-sndbuf: 8m
#
outgoing-range: 32768
num-queries-per-thread: 1024
This improved performance significantly, with a jump from 3600 qps to 6200 qps, and the system was not sluggish at all under this workload. The magic seems to have been the combination of "outgoing-range" and "num-queries-per-thread"; the system load dropped from 28 to around 21 directly after this change.
I also recompiled with Wouter's suggestion of using only Solaris threads, but the gain was very little, if any. Still, this is something to keep in mind for the future. I also stayed with 64 threads for now, but at a later stage I will test again with the full 128 threads, when I feel brave again.
Hello,
As said, I doubt you will gain much with a high (>>4) number of threads. Threads are used to prevent resolver stalls when all slots (num-queries-per-thread) are busy (data in flight). With 64 threads you have the potential of 64k not-yet-answered queries hammering your upstream. I would suggest starting with some really low value like 4 threads and seeing whether raising it raises the measured qps.
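That suggestion could be scripted as a sweep over thread counts. Everything here is an assumption for illustration: dnsperf and queryfile.txt, an "include: /etc/unbound/threads.conf" line in the main unbound.conf, and dns/unbound as the SMF service name on Solaris 11:

```shell
# Hypothetical thread-count sweep. Assumes dnsperf and queryfile.txt
# exist, that unbound.conf contains "include: /etc/unbound/threads.conf",
# and that the SMF service is named dns/unbound (all assumptions).
for t in 4 8 16 32 64; do
    printf 'server:\n    num-threads: %s\n' "$t" > /etc/unbound/threads.conf
    svcadm restart dns/unbound   # num-threads needs a restart, not a reload
    sleep 5                      # give the daemon time to come back up
    printf '%s threads: ' "$t"
    dnsperf -s 127.0.0.1 -d queryfile.txt -l 30 | grep 'Queries per second'
done
```

If qps stops improving (or the load average climbs) as the count rises, the sweet spot mentioned earlier in the thread has been passed.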
Can you clarify: is the workload a "production" resolver (i.e. a real
user base that is perhaps really only offering ~6k qps of load), or is
this a "benchmark" where the goal is to determine the maximum number of
qps the server can handle?
Either way, a load average of 21-28 sounds way too high. Perhaps
Solaris has some sort of tool (DTrace?) comparable to the Linux "perf"
tool that would help pinpoint the cause of the poor performance?