flush_zone service impact

Hi,

We're running several unbound 1.6.0 (Debian 9 stable) instances for a public service and we are experiencing service impact while executing the flush_zone action. As the instances have 32 GB of memory and 4 Skylake vCPUs, we beefed up the cache size parameters.
Current config, which leads to an overall memory usage of about 22 GB by the unbound process:
so-reuseport: yes
num-threads: 4
msg-cache-slabs: 4
rrset-cache-slabs: 4
infra-cache-slabs: 4
key-cache-slabs: 4
rrset-cache-size: 8000m
msg-cache-size: 4000m
neg-cache-size: 200m
key-cache-size: 500m
outgoing-range: 8192
num-queries-per-thread: 4096
so-rcvbuf: 8m
so-sndbuf: 8m
unwanted-reply-threshold: 10000000
infra-cache-numhosts: 500000

The 'unbound-control flush_zone some.domain' execution takes 12 to 15 seconds on a busier instance. During the flush, unbound doesn't send out any DNS reply packets and there is silence on the wire until the flush completes. The attached I/O graphs demonstrate how the source udp/53 packet rate drops from 20 Kpps to 0 for the duration of the flush. A flush during prime time on a busy node easily results in the loss of approximately 500K requests.
The unbound process consumes all CPU resources (4 vCPUs flat out), while in normal operation the CPU usage averages 20 to 30%.

I gathered the unbound-control status and stats outputs into a separate file, attached.
There is a massive burst of reply packets after the flush operation has completed (see the attached Wireshark I/O graphs). It could be related to the kernel receive queue, which buffers queries while the flush is in progress.

Our main concern is the service's unresponsiveness during the flush. To reduce the impact time, we could decrease the cache sizes, for example to the following values, but that hardly looks like a solution to the service impact:
rrset-cache-size: 640m
msg-cache-size: 320m
neg-cache-size: 32m
key-cache-size: 32m

Are there any enhancements regarding cache handling and flush_zone in the current 1.9 release compared to 1.6? In the changelog I couldn't find anything related to flush performance or impact.

Has anyone else had the same problem and could share some wisdom on how to get around it?
With a larger cache size, there's also all the more need for the flush to be usable.

kind regards,

(attachments)



unbound_data_while_flush.txt (13.6 KB)

Hi,

Rauno Tuul via Unbound-users <unbound-users@nlnetlabs.nl>:

msg-cache-slabs: 4
rrset-cache-slabs: 4
infra-cache-slabs: 4
key-cache-slabs: 4

Try a larger *-slabs value (e.g. 1024).
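
A minimal sketch of the suggested change in unbound.conf, keeping your existing cache sizes; note that unbound.conf(5) requires the slab counts to be a power of 2, which 1024 is:

```
msg-cache-slabs: 1024
rrset-cache-slabs: 1024
infra-cache-slabs: 1024
key-cache-slabs: 1024
```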

My guess is that the long outage comes from lock contention between the cache-flushing thread (thread 0) and the worker threads (all threads other than thread 0). If a slab is very big due to a very large cache, thread 0 holds the spin lock of that slab for a long time during the cache flush.
A large number of (small) slabs reduces the time thread 0 holds each slab's lock and also reduces the probability of lock contention. It may introduce some memory overhead, but that should not be problematic since you have plenty of memory.
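
To illustrate the mechanism, here is a toy Python model (not unbound's actual code) of a slabbed cache: the key hash selects a slab, each slab has its own lock, and a zone flush sweeps one slab at a time. With more slabs, each lock guards a smaller fraction of the cache, so the worst-case time a lookup can be blocked behind the flush shrinks accordingly:

```python
import threading

class SlabbedCache:
    """Toy model of a slabbed cache: the key hash picks a slab, and each
    slab has its own lock, so a flush only blocks lookups that happen to
    land on the slab currently being swept."""

    def __init__(self, n_slabs=4):
        self.n_slabs = n_slabs
        self.slabs = [dict() for _ in range(n_slabs)]
        self.locks = [threading.Lock() for _ in range(n_slabs)]

    def _slab(self, name):
        # Hash of the name selects which slab holds the entry.
        return hash(name) % self.n_slabs

    def put(self, name, rrset):
        i = self._slab(name)
        with self.locks[i]:
            self.slabs[i][name] = rrset

    def get(self, name):
        i = self._slab(name)
        with self.locks[i]:
            return self.slabs[i].get(name)

    def flush_zone(self, zone_suffix):
        # Sweep one slab at a time; only that slab's lock is held, so the
        # other n_slabs - 1 slabs stay available to worker threads.
        for i in range(self.n_slabs):
            with self.locks[i]:
                for name in [n for n in self.slabs[i]
                             if n.endswith(zone_suffix)]:
                    del self.slabs[i][name]
```

With 4 slabs and a 22 GB cache, each lock covers roughly a quarter of all entries; with 1024 slabs it covers about 0.1%, so the per-lock sweep is correspondingly shorter.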

Regards,