How to measure cache hit resolution time in unbound 1.24.1

Hi

I have installed unbound 1.24.1 on FreeBSD 14.3 OS. My cache hit rate is 76% with over 20% coming through recursive replies.

The median time for recursive replies is 440ms while the avg is 520ms.

This setup has been running for over 72hrs. I expect stats to improve but that is not happening.

Just wanted to find out if there is a way to measure the cache hit resolution time in a dashboard?

Can I do anything to improve cache hit ratio?

Can I also improve the recursive reply time?

I am using unbound_exporter to monitor stats in grafana

My configs have been adjusted as follows:
rrset-cache-size: 20G
msg-cache-size: 10G
cache-min-ttl: 1800

I am using the root hint files directly on the server for recursive lookup and not forwarding to any public resolver

Thank you

Regards,
Isaac

How many cores/slabs are you using?

Hi Seth

num-threads: 64
msg-cache-slabs: 32
rrset-cache-slabs: 32
infra-cache-slabs: 32
key-cache-slabs: 32
ratelimit-slabs: 32
ip-ratelimit-slabs: 32

The physical server is a dell 640 with specs below

hw.ncpu: 104
hw.model: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz

Thank you
Isaac

Your thread count should be equal to or lower than the number of slabs.

The thread count seems extremely high; you should not need so many. You should also set num-queries-per-thread. Try 16384.

Can you also paste your memory settings and Cache settings?

Hi Seth

The server is dedicated for this purpose hence the high number of threads

below configs are in place:

num-queries-per-thread: 4096
msg-cache-size: 10G
rrset-cache-size: 20G
key-cache-size: 1G

Thank you

I think you missed the point. See: https://nlnetlabs.nl/documentation/unbound/howto-optimise/

Set *-slabs to a power of 2 close to the num-threads value. Do this for msg-cache-slabs, rrset-cache-slabs, infra-cache-slabs and key-cache-slabs. This reduces lock contention.
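As a rough illustration of that rule, here is a small Python sketch that picks a power of 2 near a given num-threads value and prints the matching slab lines. The helper names are made up for this example, and rounding ties upward is an assumption, since the guide only says "close to".

```python
# Hypothetical helpers, not part of unbound: derive *-slabs values from num-threads.
SLAB_OPTIONS = (
    "msg-cache-slabs",
    "rrset-cache-slabs",
    "infra-cache-slabs",
    "key-cache-slabs",
)


def nearest_power_of_two(n: int) -> int:
    """Return the power of 2 closest to n (ties round up, by assumption)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    lower = 1 << (n.bit_length() - 1)  # largest power of 2 <= n
    upper = lower << 1                 # next power of 2 above that
    return lower if (n - lower) < (upper - n) else upper


def slab_config(num_threads: int) -> str:
    """Render num-threads plus the four slab options as unbound.conf lines."""
    slabs = nearest_power_of_two(num_threads)
    lines = [f"num-threads: {num_threads}"]
    lines += [f"{option}: {slabs}" for option in SLAB_OPTIONS]
    return "\n".join(lines)
```

For example, print(slab_config(12)) would emit num-threads: 12 followed by the four slab options set to 16.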

I service several hundred thousand simultaneous clients at tens of thousands of queries per second on only 12 threads. Cache response time is less than 1 ms, and average response time is under 10 ms. My hosts (I have 3 of them) have 16 threads/cores each, and I leave 4 threads for server busy work such as stats and log collection. More threads don't always mean better performance, and in your case, since your slab count is low relative to your thread count, you're going to have a lot of lock contention.

Cheers

Thanks Seth.

I’ll make the adjustments and monitor performance.

Hi Seth

Do you have a tool that helps you measure the cache hit resolution time?

sir izake via Unbound-users <unbound-users@lists.nlnetlabs.nl>:

Just wanted to find out if there is a way to measure the cache hit resolution time in a dashboard?

Unbound has no facility to measure cache hit resolution time.

One way to measure cache hit time is to query a name that is always answered from cache, e.g. "dig -t NS .". Note that if the resolver is too busy and queries sit in its receive queue, its responses (even for cache hits) are delayed by the queue dwell time, and queries may even be dropped.
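The same probe could be scripted so the numbers can be fed to a dashboard. The following is only a sketch using the Python standard library: it sends a ". NS" query over UDP and times the reply. All function names are hypothetical, and the result includes network round trip and any queue dwell time, so it is an upper bound on cache hit time rather than an exact figure.

```python
import socket
import statistics
import struct
import time


def build_query(name: str, qtype: int = 2, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query (qtype 2 = NS, class IN, RD flag set)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = [part for part in name.rstrip(".").split(".") if part]
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def time_query(server: str, name: str = ".", port: int = 53,
               timeout: float = 2.0) -> float:
    """Send one UDP query and return the round-trip time in milliseconds."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, port))
        data, _ = sock.recvfrom(4096)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    if data[:2] != packet[:2]:
        raise ValueError("reply does not match the query ID")
    return elapsed_ms


def summarize(samples: list) -> dict:
    """Median, approximate 95th percentile and max of a non-empty sample list."""
    ordered = sorted(samples)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


def measure(server: str, count: int = 20, name: str = ".") -> dict:
    """Probe the resolver count times; timeouts are counted as dropped."""
    samples = []
    for _ in range(count):
        try:
            samples.append(time_query(server, name))
        except (socket.timeout, ValueError):
            pass  # no reply in time, or a mismatched reply
    result = summarize(samples) if samples else {}
    result["dropped"] = count - len(samples)
    return result
```

Calling measure("127.0.0.1") on the resolver host itself (the address is an assumption) would keep network latency out of the figures as far as possible.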

Yes as Daisuke said, it’s a very unscientific approach to measure this. We use data from our load test rig plus some baseline network latency to arrive at estimates.

Our average also includes timeouts from some exotic domains and records that do not exist, which probably originate from malware and all sorts of crap on our clients' devices. It's amazing the junk that people try to access.
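Because timeouts pull the average up, it can help to look at the median next to it. Below is a sketch that parses the key=value text printed by "unbound-control stats_noreset" and reports the cache hit ratio together with the average and median recursion time. The function names are hypothetical; the sketch assumes the total.num.cachehits, total.num.cachemiss, total.recursion.time.avg and total.recursion.time.median counters are present and that the times are in seconds.

```python
def parse_stats(text: str) -> dict:
    """Parse 'key=value' lines into a dict, skipping non-numeric values."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # ignore values that are not numbers
    return stats


def summarize_stats(stats: dict) -> dict:
    """Compute hit ratio and recursion times (converted to milliseconds)."""
    hits = stats.get("total.num.cachehits", 0.0)
    misses = stats.get("total.num.cachemiss", 0.0)
    total = hits + misses
    return {
        "cache_hit_ratio": hits / total if total else 0.0,
        "recursion_avg_ms": stats.get("total.recursion.time.avg", 0.0) * 1000.0,
        "recursion_median_ms": stats.get("total.recursion.time.median", 0.0) * 1000.0,
    }
```

The input text could come from subprocess.run(["unbound-control", "stats_noreset"], capture_output=True, text=True).stdout. Note that the recursion times cover only queries that needed recursion, so this still says nothing about cache hit time.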

How did your test go with the tuning already suggested, did you see any improvements?