Unbound performance tuning

Hi.

I've been using unbound for a couple of months as the default recursor in
a mid-sized installation and I like it so far :)
Now I plan to replace BIND with unbound in a large ISP network
(average recursion load is 50K requests/sec).
I have a question about how to tune unbound so it can handle that
volume. Which parameters should I adjust to get maximum performance?
We have a dual-socket quad-core Xeon server with 12GB of RAM running
FreeBSD 7 STABLE. Unbound will be compiled without libevent and
without threading, and we will start 8 unbound processes (num-threads:
8) using 1024MB cache size for each.

I have a couple of ideas about what should be tuned, but I would like
to ask: am I right, or am I missing something? This post is a request for
comments.

AFAIK unbound handles incoming requests this way:

                                             | rrset-cache-size: |
(Client) ==> | num-queries-per-thread: | ==> | msg-cache-size: | ==> | outgoing-range: | ==> (Resolving)

So, I should pay attention to these potential bottlenecks:
1) num-queries-per-thread: - "The number of queries that every thread
will service simultaneously". If one process has more than this
number of simultaneous requests, will it drop the excess requests?

2) msg-cache-size: - After a query has arrived at unbound, is it
saved in the msg-cache, where it waits until the resolving process is
free to handle it? If the msg-cache has no free space, will newly
arrived queries be dropped?

3) outgoing-range: - the number of random ports that the resolving
process can use simultaneously for sending requests out of unbound. If
this number is low, will the msg-cache grow in size, and can that cause
queries to be dropped?

So, if that is right, I should set num-queries-per-thread: to 10240,
msg-cache-size: to 512MB and outgoing-range: to something close to 1024
(1024 is the limit of unbound's builtin mini-event event handler, because
I don't use libevent). I will tune FreeBSD sysctl limits for this
(default FreeBSD 7 ulimit -n is 12328 and ulimit -d is 512M).

I assume this unbound.conf will be ok?

server:
        verbosity: 0
        num-threads: 8
        interface: 0.0.0.0
        port: 53
        outgoing-range: 980
        msg-cache-size: 512m
        msg-cache-slabs: 8
        num-queries-per-thread: 10240
        rrset-cache-size: 1024m
        rrset-cache-slabs: 8
        cache-max-ttl: 86400
        infra-host-ttl: 900
        infra-lame-ttl: 900
        infra-cache-slabs: 4
        infra-cache-numhosts: 10000
        infra-cache-lame-size: 10k
        do-ip4: yes
        do-ip6: no
        do-udp: yes
        do-tcp: yes
        do-daemonize: yes

        access-control: 0.0.0.0/0 allow
        access-control: ::0/0 allow

        chroot: "/usr/local/etc/unbound"
        username: "unbound"
        directory: "/usr/local/etc/unbound"
        logfile: ""
        use-syslog: no
        pidfile: "/usr/local/etc/unbound/unbound.pid"
        root-hints: "/usr/local/etc/unbound/named.cache"

        hide-identity: yes
        hide-version: yes
        harden-glue: yes
        module-config: "iterator"

Thanks in advance for any comment.

Hi,

When not specifying an interface line, or when specifically specifying
127.0.0.1 and ::1, unbound should only bind to the localhost ip addresses.

This works fine, unless interface-automatic: yes is set. Then it suddenly
starts to bind on ANY.

This is somewhat problematic when shipping good defaults for distributions,
where you'd want to default to listening only on localhost, but you also
want to support multiple interfaces (once configured by the user) and use
the proper source IP on replies.
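
For illustration, the localhost-only default described above could look like the fragment below. This is only a sketch: the explicit interface-automatic: no line is an assumption about how a distribution might spell it out, since leaving the option unset should have the same effect.

```conf
server:
        # listen on the loopback addresses only
        interface: 127.0.0.1
        interface: ::1
        # assumption: stated explicitly here; turning this on makes
        # unbound bind to ANY instead of the addresses above
        interface-automatic: no
```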

Paul

Hi Beastie,

Beastie wrote:

Hi.

I've been using unbound for a couple of months as the default recursor in
a mid-sized installation and I like it so far :)

Thanks

Now I plan to replace BIND with unbound in a large ISP network
(average recursion load is 50K requests/sec).
I have a question about how to tune unbound so it can handle that
volume. Which parameters should I adjust to get maximum performance?
We have a dual-socket quad-core Xeon server with 12GB of RAM running
FreeBSD 7 STABLE. Unbound will be compiled without libevent and
without threading, and we will start 8 unbound processes (num-threads:
8) using 1024MB cache size for each.

I have a couple of ideas about what should be tuned, but I would like
to ask: am I right, or am I missing something? This post is a request for
comments.

I'll try to tell you what I think I know.

AFAIK unbound handles incoming requests this way:

                                             | rrset-cache-size: |
(Client) ==> | num-queries-per-thread: | ==> | msg-cache-size: | ==> | outgoing-range: | ==> (Resolving)

No, client queries are not stored in the msg-cache. The msg-cache
contains server replies.

So, I should pay attention to these potential bottlenecks:
1) num-queries-per-thread: - "The number of queries that every thread
will service simultaneously". If one process has more than this
number of simultaneous requests, will it drop the excess requests?

It will still try to answer from cache, but if that fails the message
can be dropped.

A high num-queries-per-thread is useful to soak up peak load. On
average, I think that usefully resolving about 'outgoing-range' messages
at the same time gives good performance.

2) msg-cache-size: - After a query has arrived at unbound, is it
saved in the msg-cache, where it waits until the resolving process is
free to handle it? If the msg-cache has no free space, will newly
arrived queries be dropped?

No. It does not work like that. The cache is used to store DNS
information - both resource records and message formats.

The msg-cache (and rrset-cache) store DNS info using an LRU algorithm:
when a new entry arrives, the least recently used elements are deleted
until there is sufficient space for it.

Thus, the caches are not part of your flow line.

3) outgoing-range: - the number of random ports that the resolving
process can use simultaneously for sending requests out of unbound. If
this number is low, will the msg-cache grow in size, and can that cause
queries to be dropped?

Right and wrong. Right in that it is the number of ports that the
resolving process can simultaneously use for sending requests out of
unbound. And if it is low, it produces problems with queuing.

Wrong about the msg-cache growing in size. What grows in size is probably
the requestlist. The queries that wait for resolving (i.e. they were
not answered from cache) are in the requestlist. This is an abstraction
(it is not a simple list) but it is easy to understand. The
requestlist will grow in size if the outgoing range is too small.

The num-queries-per-thread is the maximum value for the requestlist.

The statistics printout includes the requestlist size: the average and
the maximum. It also includes the number of times the requestlist was
exceeded. This can give you a good indication of what the size of your
requestlist is in practice, and whether it is too small.
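
To get that printout written to the log at regular intervals, something like the fragment below should do. It is a sketch that assumes your unbound version supports the statistics-interval option; the value 300 is an arbitrary example, not a recommendation.

```conf
server:
        # seconds between statistics lines in the log (example value);
        # the lines include the requestlist average, maximum and
        # exceeded counts discussed above
        statistics-interval: 300
```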

For good performance, I would think outgoing-range and
num-queries-per-thread should be about equal and large enough ...

So, if that is right, I should set num-queries-per-thread: to 10240,
msg-cache-size: to 512MB and outgoing-range: to something close to 1024
(1024 is the limit of unbound's builtin mini-event event handler, because
I don't use libevent). I will tune FreeBSD sysctl limits for this
(default FreeBSD 7 ulimit -n is 12328 and ulimit -d is 512M).

It would be good to compile with libevent, as it removes the 1024
restriction on the outgoing-range, and you need lots. AFAIK there are
no problems on FreeBSD 7 with libevent (kqueue backend).

Then I would put 6400 as outgoing-range. At peak that allows 8*6k,
about 48000 queries in flight at the same time (queries that miss the
cache), for your 50k qps server.
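
As a sketch, those numbers would go into unbound.conf roughly as follows. The outgoing-range value is the one suggested above and assumes a libevent build. The num-queries-per-thread value is an assumption that follows the "about equal" remark; the original post used 10240.

```conf
server:
        num-threads: 8
        # needs libevent; the builtin mini-event handler stops near 1024
        outgoing-range: 6400
        # assumption: kept about equal to outgoing-range
        num-queries-per-thread: 6400
```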

I assume this unbound.conf will be ok?

Small problem: you allocate 512m msg and 1024m rrset cache per thread,
and with 8 of them that adds up to 12G. Exactly your max. I would leave
some spare room for other processes and other data, so: 256m and 512m.
(Note for the reader: because he compiled without threading, unbound is
forking and uses the cache memory per process, instead of shared memory.)
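
Spelled out as a config fragment, with the arithmetic in the comments (a sketch; the slab settings are simply carried over from the original post):

```conf
server:
        num-threads: 8
        # per process: 256 MB + 512 MB = 768 MB
        # 8 processes: 8 * 768 MB = 6 GB, about half of the 12 GB of RAM
        msg-cache-size: 256m
        msg-cache-slabs: 8
        rrset-cache-size: 512m
        rrset-cache-slabs: 8
```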

Even though you put verbosity: 0, it will still print (fatal) errors.
But since you set logfile: "" (stderr) and use-syslog: no, the stderr
stream goes nowhere. Set use-syslog: yes to get errors printed in your
/var/log/messages (depending on your log config).

If verbosity: 0 is printing too many log lines, let me know.
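
A minimal sketch of the logging change; where the messages end up depends on your syslog configuration:

```conf
server:
        verbosity: 0
        # send errors to syslog instead of the discarded stderr stream
        use-syslog: yes
```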

Best regards,
   Wouter

server:
        verbosity: 0
        num-threads: 8
        interface: 0.0.0.0
        port: 53
        outgoing-range: 980
        msg-cache-size: 512m
        msg-cache-slabs: 8
        num-queries-per-thread: 10240
        rrset-cache-size: 1024m
        rrset-cache-slabs: 8
        cache-max-ttl: 86400
        infra-host-ttl: 900
        infra-lame-ttl: 900
        infra-cache-slabs: 4
        infra-cache-numhosts: 10000
        infra-cache-lame-size: 10k
        do-ip4: yes
        do-ip6: no
        do-udp: yes
        do-tcp: yes
        do-daemonize: yes

        access-control: 0.0.0.0/0 allow

        access-control: ::0/0 allow

This line does nothing since do-ip6: is "no".

Hi Paul,

The option overrides the interfaces specified by the user. The socket
options that it is using need to use the ANY interface. It even
detects newly added network interfaces real-time and services them.

Also, the socket options may require IPv6. I am not sure to what
extent. Because of that, it may not be such a nice default-option...

Really, it is mostly useful for anycast (load-balancing solutions).
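
A minimal sketch of that kind of setup. Whether it works depends on OS support for the socket options involved, and the comments only paraphrase the behaviour described above.

```conf
server:
        # binds to ANY and services all current and newly added
        # interfaces; overrides any interface: lines given by the user
        interface-automatic: yes
```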

Is what you need a new option, which I would call
set-interface-automatic-if-ANY-is-specified-as-interface ?

Best regards,
   Wouter

Paul Wouters wrote:

The option overrides the interfaces specified by the user. The socket
options that it is using need to use the ANY interface. It even
detects newly added network interfaces real-time and services them.

I thought so...

Really, it is mostly useful for anycast (load-balancing solutions).

I have to check again why I thought I needed it. I think I ran into issues
when I was just using loopback and one public IP on a dual-homed machine.

Is what you need a new option, which I would call
set-interface-automatic-if-ANY-is-specified-as-interface ?

Exactly.

Paul