Spikes in average recursive resolution time in Unbound

Hi

I use Unbound version 1.12 on OPNsense in resolver mode. My unbound.conf is at the bottom.

I have Grafana/InfluxDB collecting and displaying Unbound statistics.

I live in Brazil, so I know that the root DNS servers should be far from here. The average recursion time is high when the service starts and goes down as the days go by, settling close to 130 ms. Then, suddenly, latency rises and the average goes to 260 ms or more.
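You can also read these numbers directly from Unbound instead of going through the Grafana/InfluxDB collector. The Python sketch below assumes that `unbound-control` is installed and that remote control is enabled. `unbound-control stats_noreset` prints one `name=value` line per counter and, unlike plain `stats`, does not reset the counters. The script extracts `total.recursion.time.avg` and `total.recursion.time.median`, both reported in seconds. The sample values in it are made up for illustration.

```python
"""Read Unbound's recursion-time statistics (sketch).

Assumes unbound-control is installed and remote-control is enabled.
"""
import subprocess


def parse_stats(text: str) -> dict[str, float]:
    """Turn 'name=value' lines into {name: float}; other lines are skipped."""
    stats: dict[str, float] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue
        try:
            stats[name] = float(value)
        except ValueError:
            pass  # ignore non-numeric values in this sketch
    return stats


def recursion_summary(stats: dict[str, float]) -> tuple[float, float]:
    """Return (average, median) recursion time in milliseconds.

    Unbound reports both values in seconds; missing keys count as 0.
    """
    avg_s = stats.get("total.recursion.time.avg", 0.0)
    median_s = stats.get("total.recursion.time.median", 0.0)
    return avg_s * 1000.0, median_s * 1000.0


def read_live_stats() -> dict[str, float]:
    """Query the running server; stats_noreset leaves the counters untouched."""
    out = subprocess.run(
        ["unbound-control", "stats_noreset"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)


if __name__ == "__main__":
    # Made-up sample in the same name=value format, for illustration only.
    sample = (
        "total.num.queries=1200\n"
        "total.num.cachemiss=200\n"
        "total.recursion.time.avg=0.130000\n"
        "total.recursion.time.median=0.045000\n"
    )
    avg_ms, median_ms = recursion_summary(parse_stats(sample))
    print(f"avg={avg_ms:.0f} ms median={median_ms:.0f} ms")  # avg=130 ms median=45 ms
```

Compare the average with the median. A handful of very slow lookups (such as a 9-second one) can pull the average up while the median stays put. That would suggest a few problem names rather than a general slowdown.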

The latency to 8.8.8.8, which I use as the monitor IP in the gateway setup, is stable at about 6 ms with 0% loss, which makes me think it's not a WAN/network problem.

What can cause this behavior in Unbound? Is it avoidable? How? Is there a way to see which query caused the spike in response time? Last week I saw times of 9 seconds, without any perceived WAN outage.

Thanks for your help

Below is my conf file:

> I use Unbound version 1.12 at OPNSense in resolve mode. At the
> bottom you can see my unbound.conf
>
> I have a grafana/influxdb collecting and showing Unbound statistics.
>
> I live in Brazil so I know that the root DNS servers should be
> far from here.

Really? I believe there are RIPE Atlas measurements which
indicate that there are several root DNS name server instances
present in Brazil.

Please note that while the root name servers correspond to only
13 IPv4 addresses (and, I believe, also 13 IPv6 addresses), the
DNS root name service is actually heavily anycast, with a
multitude of geographically distributed instances per letter /
IP address.

> I can see the recursive average time is high at the start of
> the service and it goes down as days go by. It goes down to
> close to 130ms. Suddenly, there is a high latency and it goes
> to 260ms or more on average.
>
> The latency for 8.8.8.8 that I use as a test ip on the gateway
> setup is pretty much stable at 6ms with 0 loss. Which makes me
> understand that it's not a WAN/Net problem.

I suspect what you are comparing is the difference between a
"cold cache" and a "hot cache".

With a "cold cache", to resolve a given query, your recursor will
in all probability have to do queries to multiple far-away name
servers to resolve the original query. This will show as a
higher time to resolve the query.

On the other hand, if the answer is (still) present in your
cache, it can be served directly without incurring the cost of
the cache-filling queries to multiple far-away publishing name
servers.
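The cold/hot difference can be illustrated with a toy model. This is not Unbound's actual algorithm, just the general idea:

- A cold lookup pays one round trip per zone it has to ask, from the root down.
- An answer that is still cached costs nothing upstream.
- Once a zone's delegation is cached, later names in that zone need only the last hop.

The zone chain, the RTTs and the single 3600 s TTL below are hypothetical round numbers.

```python
"""Toy cold-vs-hot cache model; all RTTs and TTLs are hypothetical."""
from dataclasses import dataclass, field


@dataclass
class ToyResolver:
    ttl_s: float = 3600.0  # one made-up TTL for every cached entry
    cache: dict[str, float] = field(default_factory=dict)  # key -> expiry time (s)

    def _cached(self, key: str, now: float) -> bool:
        return self.cache.get(key, -1.0) > now

    def resolve(self, qname: str, chain: list[tuple[str, float]], now: float) -> float:
        """Simulated milliseconds to answer qname.

        chain lists the zones to ask, root first, each with an RTT in ms.
        """
        if self._cached("answer:" + qname, now):
            return 0.0  # hot cache: answered locally, no upstream queries
        # Start at the deepest zone whose servers are still cached;
        # the root is always known from the built-in hints.
        start = 0
        for i, (zone, _) in enumerate(chain):
            if i == 0 or self._cached("ns:" + zone, now):
                start = i
        elapsed = 0.0
        for zone, rtt in chain[start:]:
            elapsed += rtt  # one round trip to this zone's servers
            self.cache["ns:" + zone] = now + self.ttl_s
        self.cache["answer:" + qname] = now + self.ttl_s
        return elapsed


if __name__ == "__main__":
    chain = [(".", 20.0), ("com.", 150.0), ("example.com.", 160.0)]
    r = ToyResolver()
    print(r.resolve("www.example.com.", chain, now=0))     # 330.0 (cold)
    print(r.resolve("www.example.com.", chain, now=10))    # 0.0 (hot)
    print(r.resolve("mail.example.com.", chain, now=20))   # 160.0 (delegation cached)
    print(r.resolve("www.example.com.", chain, now=4000))  # 330.0 (all expired)
```

In this model the cost of a miss depends on how much of the delegation chain is still cached. So the average recursion time can jump again when many cached entries expire around the same time.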

> What can cause this behavior in Unbound? Is it avoidable? How?
> Is there a way to see which query caused the spike in the time
> response? I had times of 9 seconds during last week, without a
> perceived outage in the WAN.

I agree, 9 seconds is excessive, but without further data it is
difficult to tell what is causing it. Sometimes domain owners are
careless about keeping all their publishing name servers in
working order (forcing recursors to retry the query against
another publishing name server), and/or they are sloppy about
keeping their delegation records in the parent zone up to date.
Problems of this sort can cause prolonged lookup times for
recursors.
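To see which queries are behind a spike, Unbound can log every reply. Set `log-replies: yes` in the `server:` section. According to the unbound.conf manual, each reply line then records:

- the client IP address
- the name, type and class
- the return code
- the time to resolve
- a from-cache flag
- the response size

The manual also warns that printing these lines makes the server noticeably slower, so it is best enabled only temporarily. The Python sketch below scans such a log for slow, non-cached replies. The regex assumes the fields appear in that order at the end of each line, and the log path in the usage comment is hypothetical.

```python
"""List slow, non-cached replies from an Unbound log (sketch)."""
import re
import sys
from typing import Iterable, NamedTuple

# Assumed field order at the end of a log-replies line:
# client IP, name, type, class, rcode, time to resolve (s), from-cache flag, size.
REPLY_RE = re.compile(
    r"(?P<client>\S+) (?P<qname>\S+) (?P<qtype>\S+) (?P<qclass>\S+) "
    r"(?P<rcode>\S+) (?P<secs>\d+\.\d+) (?P<cached>[01]) (?P<size>\d+)\s*$"
)


class Reply(NamedTuple):
    qname: str
    qtype: str
    rcode: str
    seconds: float
    cached: bool


def slow_replies(lines: Iterable[str], threshold_s: float = 1.0) -> list[Reply]:
    """Return non-cached replies taking at least threshold_s, slowest first."""
    found = []
    for line in lines:
        m = REPLY_RE.search(line)
        if not m:
            continue  # not a reply line (startup messages, warnings, ...)
        r = Reply(m["qname"], m["qtype"], m["rcode"], float(m["secs"]), m["cached"] == "1")
        if not r.cached and r.seconds >= threshold_s:
            found.append(r)
    return sorted(found, key=lambda r: r.seconds, reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_replies.py /path/to/unbound.log  (path is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for r in slow_replies(f)[:20]:
            print(f"{r.seconds:8.3f}s {r.rcode:9} {r.qtype:6} {r.qname}")
```

Names that keep appearing at the top of this list are good candidates for the kind of broken name server or stale delegation problems described above.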

Regards,

- Håvard

>> I live in Brazil so I know that the root DNS servers should be
>> far from here.
>
> Really? I believe there are RIPE Atlas measurements which
> indicate that there are several root DNS name server instances
> present in Brazil.

The zoomable map at https://root-servers.org/ indicates that there
are 31 instances of root DNS servers in Brazil, with 5 instances
in Rio de Janeiro alone.

Steinar Haug, AS2116

Guys,

I appreciate the lesson. I didn't know we had root servers locally in Brazil, so thanks again for the information.

But with this in mind, why would a cold lookup (not in the cache) take over 300 ms on average? Is there anything wrong in my conf file?

Are my results similar to what you see on your end?

I'm new to this and trying to learn.

Thanks

That is not the worst case: cold lookups from EMEA for sites in the USA often take 1500-5000 ms. This is OK. On my side, I've just increased the TTL value for cached data.
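Raising the TTL of cached data, presumably through Unbound's `cache-min-ttl` option, reduces the number of cold lookups for names with short TTLs. `cache-min-ttl` defaults to 0 and `cache-max-ttl` to 86400. The Python sketch below shows the arithmetic under two simplifying assumptions:

- A name that is queried continuously is refetched once each time its cached copy expires.
- The record's TTL is first raised to `cache-min-ttl` and then capped at `cache-max-ttl`.

The unbound.conf manual cautions that high minimums, especially more than an hour or so, can leave the cache out of step with the real data.

```python
"""Effect of a minimum cache TTL on cold lookups (simplified sketch)."""
import math

DAY_S = 86400


def effective_ttl(record_ttl: int, cache_min_ttl: int = 0, cache_max_ttl: int = 86400) -> int:
    """Seconds a record stays cached: raised to the minimum, then capped at the maximum."""
    return min(max(record_ttl, cache_min_ttl), cache_max_ttl)


def cold_lookups_per_day(record_ttl: int, cache_min_ttl: int = 0) -> int:
    """Approximate upstream refetches per day for a continuously queried name."""
    # Treat a TTL of 0 as 1 s so the sketch never divides by zero.
    ttl = max(effective_ttl(record_ttl, cache_min_ttl), 1)
    return math.ceil(DAY_S / ttl)


if __name__ == "__main__":
    for min_ttl in (0, 300, 3600):
        # Prints 1440, 288 and 24 cold lookups/day respectively.
        print(f"record TTL 60 s, cache-min-ttl {min_ttl:>4}: "
              f"{cold_lookups_per_day(60, min_ttl)} cold lookups/day")
```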