I am using Unbound on two different servers (also handed out via DHCP as two different name servers) and would like to make sure that if one server has already answered a query and cached it, the other does not need to send the same query to the Internet again. I have not found a direct solution, but there seems to be a way of exporting/importing the cache via "unbound-control". My question is whether there is a standard way of doing this, or any suggestions about the "best" solution. Maybe somebody already has something like this working?
this question has come up every year or so. one thing to know is that if this
is a good idea, then it would be a good multi-vendor idea, not just for
unbound, though unbound has a track record of doing things first that turn out
to be good ideas and end up standardized in DNS itself in some form.
some open questions that relate to discard policy:
if you had hundreds of cache misses per second which ones would you share with
your peer recursive nameservers? (maybe only share it after its first reuse? i
think the opendns anycast network uses a DHT for this, to inform peers of
availability of data, so it can be fetched from a peer if it's needed.)
if your peer is sharing hundreds of cache misses per second with you, would
you ever discard something from your own cache to make room for something from
theirs? (generally this isn't the right thing, so you'd give your cache two
LRU quotas, one for your own cache misses, one for those shared to you.)
when running at quota, and needing to discard something because a peer just
told you some new thing and you don't have room for N+1, would you choose
least recently learned (LRL) rather than least recently used (LRU), because
when things are used they move from your peer-cache to your own-cache?
other open questions:
when using ECS, how do you know which cache additions to share, if your peer
or your peer's stubs don't have the same topology as you/yours do?
would you rate limit the feed to a peer so as not to flood their capacity?
this is a fascinating topic, as i hope you'll agree.
I am not that deep into DNS logic, so most likely not a very good communication partner when the topic becomes that complex. I am using Unbound for my home network only, so I think theoretical numbers like "hundreds of cache misses per second" are not that realistic there. But I totally agree that if such a feature is made generic, this is something that needs to be taken care of.
Maybe a solution could be to integrate a sublayer in between the local cache and the external resolvers: a shared cache. This shared cache is updated by all peers when a query gets resolved, and every peer can ask the shared cache for entries when the local cache does not deliver any results. Shared cache instances are then automatically synchronized.
Obviously this topic is not an easy one, and it seems that there is nothing in place I can reuse.
Thank you, Paul, for your answer. Paul is correct that it depends very much on your cache replacement algorithm and on how to inform other resolvers that answers are already in cache.
To answer your question, Talkabout, Unbound has a module for a shared cache with a Redis backend. It works as a secondary cache, 1) first local cache lookup, 2) shared cache lookup, 3) resolve/iterate. For configuration and use, see the unbound.conf(5) manpages, section "Cache DB Module Options". (You may have to compile Unbound yourself with the --with-libhiredis option.)
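A minimal sketch of what that configuration might look like (the host and port values are placeholders; see unbound.conf(5) for the authoritative option names):

```
server:
    # cachedb must sit between the validator and the iterator
    module-config: "validator cachedb iterator"

cachedb:
    backend: "redis"
    redis-server-host: 127.0.0.1
    redis-server-port: 6379
```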
Your suggestion to export/import the cache with unbound-control is useful when you are running Unbound clusters and want to start a new Unbound instance with a hot cache.
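For reference, that export/import would look roughly like this (assuming unbound-control is already set up on both hosts):

```
# on the instance with the hot cache:
unbound-control dump_cache > unbound.cache

# on the freshly started instance:
unbound-control load_cache < unbound.cache
```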
You can do something a bit like your "sublayer" with dnsdist. You can
configure it with multiple back-end servers using a whashed or chashed
selection policy so that you aren't asking both back-ends to resolve all
questions.
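A rough sketch of that dnsdist setup (the addresses are placeholders; whashed hashes on the query name so the same name consistently goes to the same back-end):

```
-- dnsdist configuration (Lua)
newServer({address = "192.0.2.1"})  -- back-end resolver 1 (placeholder)
newServer({address = "192.0.2.2"})  -- back-end resolver 2 (placeholder)
setServerPolicy(whashed)            -- or chashed for consistent hashing
```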
The caveat is that "infrastructure" records (i.e. the delegation chain: NS
records, glue records, DS and DNSKEY records) have a big effect on DNS
performance and (unlike Unbound's shared cache) dnsdist won't help the
back-ends to avoid duplicating work resolving infrastructure records.
This discussion reminds me of Geoff Huston's investigation of zombie DNS
queries a few years ago (http://www.potaroo.net/ispcol/2016-03/zombies.html),
which made me wonder if some resolvers were using ill-advised hacks to
pre-heat cache B by sniffing the network for cache A's query traffic,
which is a great way to get a Sorcerer's Apprentice effect!
I have set up Unbound with the Redis cache now and will check how well this works. I have one question left: the documentation states that Unbound does NOT invalidate keys in the Redis cache even when they expire. My question is: why is Unbound not simply using the "EXPIRE" command of Redis to set the TTL to the same time that Unbound receives from an authoritative DNS server? That way no other maintenance needs to be done. If there still is a valid reason (which I am sure there is), what is the recommended way to clean up Redis?
If you want two truly independent recursive servers to share cache results, then you could configure one as "primary" and the other as "secondary". The primary is configured normally to recurse. The secondary is configured to forward to the primary (clause forward-zone:), but not exclusively (option forward-first: yes). Neither has a dependency on a central database, so there is no single point of failure. Queries to the secondary will need to bounce through the primary. On a whole-network cold boot, the cache fill will show some delay; once it has been rolling for an hour, no user will notice the difference. If the primary fails, the secondary can just switch over to recursing itself (forward-first: falls through).
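A minimal sketch of the secondary's forwarding configuration (the primary's address is a placeholder):

```
# unbound.conf on the secondary
forward-zone:
    name: "."
    forward-addr: 192.0.2.1   # the primary resolver (placeholder)
    forward-first: yes        # fall back to own recursion if the primary fails
```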
If you wanted to experiment with a more corporate-like configuration, then you would place those two Unbound instances in a firewall DMZ. These two only listen to specific Unbound instances from within the firewalled intranet (clause server:, option access-control:). The intranet Unbound instances (or dnsmasq, for small-office setups) may only forward to the DMZ. You prevent all clients from using DNS (port 53/853) externally. Their DHCP configuration only offers the "nearest" two or three intranet DNS servers, never the DMZ, and never external resolvers.
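A sketch of the access-control part on a DMZ instance (the networks are placeholders for your intranet ranges):

```
# unbound.conf on a DMZ resolver
server:
    interface: 0.0.0.0
    access-control: 10.0.0.0/8 allow    # intranet resolvers (placeholder range)
    access-control: 0.0.0.0/0 refuse    # everyone else
```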
Anyway, just a quick sketch of a multi-server configuration, which is by no means complete.
- Eric
The reason is that you could serve expired records from that cache (if
you configure unbound to do so) so they shouldn't expire after the TTL.
As for the recommended way to cleanup redis (from the man page):
"
It should be noted that Unbound never removes data stored in the Redis
server, even if some data have expired in terms of DNS TTL or the Redis
server has cached too much data; if necessary the Redis server must be
configured to limit the cache size, preferably with some kind of
least-recently-used eviction policy.
"
I would recommend going through the cachedb section in the unbound.conf
man page as it also documents the behavior and some caveats such as the
"synchronous communication" between unbound and redis.
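Following that recommendation, the Redis-side eviction setup could look like this (the memory limit is a placeholder to tune for your environment):

```
# redis.conf
maxmemory 256mb
maxmemory-policy allkeys-lru   # evict least-recently-used keys when full
```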
Maybe it's stupid, but it is still not completely clear to me. As Unbound knows when a particular entry needs to be invalidated (based on the configuration it received upon load), setting the TTL via EXPIRE would also work for the case you mentioned (serving outdated entries based on Unbound configuration). Maybe I am missing something?
I have now created the following setup:

Server 1:
- Unbound (connected to KeyDB as backend)
- KeyDB (Redis drop-in replacement with active replication, bound to Server 2)

Server 2:
- Unbound (connected to KeyDB as backend)
- KeyDB (Redis drop-in replacement with active replication, bound to Server 1)

That way every entry added by one of the servers is automatically available to the other one as well (active replication of KeyDB) => a shared cache. Entries are evicted after 4 hours of idle time. I will keep it that way for now, and if it works well over the next days this will become my productive setup.
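For reference, the KeyDB active-replication part of that setup can be sketched like this (addresses are placeholders; each server points replicaof at the other):

```
# keydb.conf on Server 1
active-replica yes
replicaof 192.0.2.2 6379   # Server 2 (placeholder address)

# keydb.conf on Server 2
active-replica yes
replicaof 192.0.2.1 6379   # Server 1 (placeholder address)
```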
Maybe it's stupid, but it is still not completely clear to me. As Unbound
knows when a particular entry needs to be invalidated (based on the
configuration it received upon load), setting the TTL via EXPIRE would
also work for the case you mentioned (serving outdated entries based on
Unbound configuration). Maybe I am missing something?
You are right that this could work but it may also be the case that you
want to turn on (or reconfigure) serve-expired on the fly through
unbound-control.
In that case you would like to have the expired records still lying around.
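For context, the serve-expired behaviour being referred to is configured like this (the TTL value is illustrative):

```
# unbound.conf
server:
    serve-expired: yes         # allow answering with expired records
    serve-expired-ttl: 86400   # serve them for at most one day past expiry
```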
Would it be an option to add a setting to the backend configuration saying whether the TTL should be applied or not? It would simplify the cache handling.
Any chance that the EXPIRE logic finds its way into the Unbound code? Currently I have an LRU eviction in place, but this is not an optimal solution in my opinion.