"peering" unbound servers together

Graham_Beneke · April 19, 2013, 3:52pm

I've been trying to figure out for a while what potential optimizations
may be possible on DNS resolvers that have high latency (50+ ms) to the
typical locations of authoritative servers in the USA and EU.

I could cluster them together in a parent-child arrangement in order to
get the maximum sharing of cached answers. This does however introduce
an undesirable upstream point of failure.

I was then thinking that the following process may yield an improvement:

A query is received from a stub resolver for which an answer is not
immediately available from the local cache. The resolver first forwards
this query to a neighbor resolver (hoping for a cache hit) and then
directly after that (or delayed by ~10 ms) begins its own full recursion.

We end up with 2 (or more) resolvers all racing to get to the answer
first. Whichever answer (neighbor or authoritative) is returned to the
original server first is then cached and returned to the stub.
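The race described above can be sketched in a few lines of Python with asyncio. This is only an illustration of the control flow, not something unbound can do today: `ask_neighbor` and `full_recursion` are hypothetical stand-ins that sleep instead of sending packets, and the delays and the 192.0.2.1 address are made-up values.

```python
import asyncio

async def ask_neighbor(name, answer=None, delay=0.005):
    """Hypothetical stand-in for querying a neighbor resolver's cache."""
    await asyncio.sleep(delay)
    return answer  # None models a cache miss on the neighbor

async def full_recursion(name, delay=0.050):
    """Hypothetical stand-in for a full recursion to the authoritative servers."""
    await asyncio.sleep(delay)
    return f"{name} -> 192.0.2.1 (authoritative)"

async def resolve(name, cache, neighbor_answer=None, head_start=0.010):
    """Race a neighbor lookup against our own (slightly delayed) recursion."""
    if name in cache:
        return cache[name]

    async def delayed_recursion():
        # Give the neighbor a ~10 ms head start before recursing ourselves.
        await asyncio.sleep(head_start)
        return await full_recursion(name)

    pending = {
        asyncio.create_task(ask_neighbor(name, answer=neighbor_answer)),
        asyncio.create_task(delayed_recursion()),
    }
    result = None
    while pending and result is None:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            if task.result() is not None:  # ignore a neighbor cache miss
                result = task.result()
                break

    # The loser of the race is no longer needed.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    if result is not None:
        cache[name] = result
    return result
```

Calling `asyncio.run(resolve("example.org", {}))` with no neighbor answer falls through to the simulated recursion; passing `neighbor_answer=...` lets the neighbor win. In this sketch the losing recursion is cancelled, whereas the proposal above would be content to let it finish and fill the cache.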

This does mean that neighbor resolvers are potentially both doing the
same recursion at the same time but I'm not too worried about this. It
has the side effect of filling both caches with a valid answer which I
consider a good thing. The primary objective is the fastest possible
responses to the stub resolvers.

I don't see any immediately obvious way to build a configuration that
will do this - have I missed something?

How difficult is it likely to be to build this capability into unbound?

Leen_Besselink · April 19, 2013, 6:52pm

> I've been trying to figure out for a while what potential optimizations
> may be possible on DNS resolvers that have high latency (50+ ms) to the
> typical locations of authoritative servers in the USA and EU.
>
> I could cluster them together in a parent-child arrangement in order to
> get the maximum sharing of cached answers. This does however introduce
> an undesirable upstream point of failure.
>
> I was then thinking that the following process may yield an improvement:
>
> A query is received from a stub resolver for which an answer is not
> immediately available from the local cache. The resolver first forwards
> this query to a neighbor resolver (hoping for a cache hit) and then
> directly after that (or delayed by ~10 ms) begins its own full recursion.
>
> We end up with 2 (or more) resolvers all racing to get to the answer
> first. Whichever answer (neighbor or authoritative) is returned to the
> original server first is then cached and returned to the stub.
>
> This does mean that neighbor resolvers are potentially both doing the
> same recursion at the same time but I'm not too worried about this. It
> has the side effect of filling both caches with a valid answer which I
> consider a good thing. The primary objective is the fastest possible
> responses to the stub resolvers.
>
> I don't see any immediately obvious way to build a configuration that
> will do this - have I missed something?

I've also been thinking about how certain things could be improved.

My reason was different: to reduce the number of queries being sent to DNSBL
nameservers.

If you'd want to build something similar you'll probably have to do what, I
think, OpenDNS and Google Public DNS are doing.

Which is to create a shared cache.

I guess it is a bit like how large websites use memcached.

This avoids sending the same query several times from different recursors,
and in the ideal case you can replicate that cache information to different
regions as well.

What they also do to lower the latency of a request from a stub is to send a new
query for cached records whose DNS TTL is about to run out, so the entry is
refreshed before it would be removed from the cache.
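A minimal sketch of that refresh-before-expiry idea, in Python. Everything here is an assumption made for illustration: the `PrefetchingCache` class, the `fetch` callback, and the 10% `refresh_fraction` threshold are hypothetical, and a real resolver would do the refresh in the background rather than inline.

```python
import time
from dataclasses import dataclass

@dataclass
class Entry:
    answer: str
    stored_at: float
    ttl: float

class PrefetchingCache:
    """Serve from cache, and refresh entries that are close to expiring."""

    def __init__(self, fetch, refresh_fraction=0.1, clock=time.monotonic):
        self.fetch = fetch                      # callable(name) -> (answer, ttl)
        self.refresh_fraction = refresh_fraction
        self.clock = clock
        self.entries = {}

    def _store(self, name):
        answer, ttl = self.fetch(name)
        self.entries[name] = Entry(answer, self.clock(), ttl)
        return answer

    def lookup(self, name):
        entry = self.entries.get(name)
        now = self.clock()
        if entry is None or now - entry.stored_at >= entry.ttl:
            # Miss or expired: the stub has to wait for a full lookup.
            return self._store(name)
        remaining = entry.ttl - (now - entry.stored_at)
        answer = entry.answer                   # answer the stub from cache
        if remaining <= entry.ttl * self.refresh_fraction:
            # Nearly expired: refresh now so the next stub still gets a hit.
            # Done inline here for simplicity.
            self._store(name)
        return answer
```

With a 100-second TTL and the assumed 10% threshold, a lookup at 95 seconds is still answered from cache but triggers a refresh, so a lookup at 120 seconds is again a cache hit instead of a full recursion.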

If you do this for stubs in different regions you should probably also have an
implementation which supports EDNS client subnet.

> How difficult is it likely to be to build this capability into unbound?

I do know that Bert from PowerDNS years ago added some memcache code to the
PowerDNS-recursor SVN repository but that code was never completed.

It uses UDP to talk to memcached, which should be familiar to programmers working
on recursors.

memcached has supported a UDP protocol for a long time. If you had to build it
right now, you might want to look at Redis, which now also has a UDP protocol,
because Redis supports persistence and replication.

Jan-Piet_Mens1 · April 19, 2013, 7:59pm

Leen,

> you might want to look at Redis it also now has an UDP-protocol.

There was (in 1.3 IIRC) talk of adding UDP to Redis, but the current
protocol specification [1] mentions TCP only, so I think that was
dropped.

-JP

[1] http://redis.io/topics/protocol

Leen_Besselink · April 19, 2013, 8:15pm

> Leen,
>
> > you might want to look at Redis it also now has an UDP-protocol.
>
> There was (in 1.3 IIRC) talk of adding UDP to Redis, but the current
> protocol specification [1] mentions TCP only, so I think that was
> dropped.
>
> -JP

Oops, my mistake, I remembered something going into beta for Redis, but
it was PSYNC. I don't know why I confused the two in my head.

Florian_Lohoff · April 19, 2013, 9:00pm

Hi,

> A query is received from a stub resolver for which an answer is not
> immediately available from the local cache. The resolver first forwards
> this query to a neighbor resolver (hoping for a cache hit) and then
> directly after that (or delayed by ~10 ms) begins its own full recursion.
>
> We end up with 2 (or more) resolvers all racing to get to the answer
> first. Whichever answer (neighbor or authoritative) is returned to the
> original server first is then cached and returned to the stub.
>
> This does mean that neighbor resolvers are potentially both doing the
> same recursion at the same time but I'm not too worried about this. It
> has the side effect of filling both caches with a valid answer which I
> consider a good thing. The primary objective is the fastest possible
> responses to the stub resolvers.
>
> I don't see any immediately obvious way to build a configuration that
> will do this - have I missed something?
>
> How difficult is it likely to be to build this capability into unbound?

Squid used to have something called a sibling, which would only
answer from its cache and would not fetch the object if it was
unavailable. For this they implemented a UDP protocol, eliminating
the need for TCP handshakes as with HTTP.
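In DNS terms, the closest analogue to such a sibling lookup is probably a query with the RD (recursion desired) bit cleared, which asks the other server not to recurse on our behalf. The Python sketch below only builds such a packet per the RFC 1035 wire format; the `build_query` helper and its defaults are made up for illustration, and whether a given resolver answers non-recursive queries from its cache depends on its access policy, which is an assumption to verify.

```python
import struct

def build_query(name, qtype=1, recursion_desired=True, query_id=0x1234):
    """Build a minimal DNS query packet (one question, class IN)."""
    # RD is bit 0x0100 of the 16-bit flags word; clearing it means
    # "answer from what you already have, do not recurse for me".
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question

# A sibling-style, cache-only lookup for the A record of example.org:
sibling_query = build_query("example.org", recursion_desired=False)
```

The resulting bytes could be sent over UDP to a neighbor with a short timeout, treating anything other than a positive answer as a miss.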

For DNS-like setups, using something like multicast to query
siblings would be the optimal solution.

In the end you trade cache lookups which cost CPU against hit ratio.

My guess is that adding memory to your caches is a much easier and
cheaper way to increase hit rate.
If increasing memory does not help increase the hit rate, asking siblings
wouldn't help either. You simply duplicate your content but increase the
CPU cycles used.

Flo