DNS-0x20 encoding reduces cache hit count

The author of dnsmasq has introduced DNS-0x20 encoding in the latest
release candidate (dnsmasq-2.91rc4). The latest Pi-hole v6 embeds this
dnsmasq version, with the feature enabled.

I have configured unbound as the upstream for dnsmasq (on the same
system). The resulting queries (example for a single domain):

Feb 21 06:01:24 unbound[915:3] info: resolving docs.pi-hole.net. AAAA IN
Feb 21 06:01:24 unbound[915:2] info: resolving docs.pi-hole.net. A IN
Feb 21 06:38:09 unbound[915:0] info: resolving dOCs.Pi-hole.NEt. AAAA IN
Feb 21 06:38:09 unbound[915:1] info: resolving docs.PI-HOlE.NET. A IN
Feb 21 06:43:11 unbound[915:0] info: resolving DoCs.pI-hOLE.nET. AAAA IN
Feb 21 06:43:11 unbound[915:2] info: resolving docS.pi-hoLE.NEt. A IN

Unfortunately, the cache (using redis) saves a separate key for each of
these queries. When the next query (from dnsmasq) uses yet another case
variation, there is thus no cache hit.

You can see some screenshots of the keys here
(https://discourse.pi-hole.net/t/embedded-dnsmasq2-91rc4-causes-upstream-problems-due-to-dns-0x20-encoding/76114/3)
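To illustrate the effect, here is a minimal Python sketch. The key layout (`name/type/class`) is hypothetical and only stands in for however the Redis cachedb derives its keys:

```python
# A byte-exact key-value store (like Redis) treats 0x20 case variants
# of the same DNS name as distinct keys.
queries = ["docs.pi-hole.net.", "dOCs.Pi-hole.NEt.", "DoCs.pI-hOLE.nET."]

cache = {}  # stands in for Redis: keys are compared byte-for-byte
for qname in queries:
    cache.setdefault(f"{qname}/AAAA/IN", "answer")  # hypothetical key layout

print(len(cache))  # 3 entries, so no cache hit across case variants

# Lowercasing the name before building the key collapses them to one entry:
normalized = {f"{q.lower()}/AAAA/IN" for q in queries}
print(len(normalized))  # 1
```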

Currently, I have disabled the dnsmasq feature by adding the setting
“no-0x20-encode”. As a result, all dnsmasq queries are lowercase, so
only 2 redis keys are required for caching.

I believe only the DNS resolver that sends queries to the outside world
(the recursive unbound in my environment) should activate this feature.
This implies a feature request to implement DNS-0x20 encoding in
unbound, since DNS-0x20 encoding is considered a security feature.

I’ve been told by a developer that dnsmasq stores only lowercase entries
in its cache; it seems logical to apply the same logic to unbound.

Thanks for your time and effort.

I'd say this is a bug to fix. Cache should always be lower case.

Unbound already has 0x20 support. You will need to set `use-caps-for-id: yes` in your Unbound configuration. Here are the 0x20 options from the example conf:

```
# Use 0x20-encoded random bits in the query to foil spoof attempts.
# This feature is an experimental implementation of draft dns-0x20.
# use-caps-for-id: no

# Domains (and domains in them) without support for dns-0x20 and
# the fallback fails because they keep sending different answers.
# caps-exempt: "licdn.com"
# caps-exempt: "senderbase.org"
```

Hope that helps,
Otto

I would agree with this statement, and I’ll add some of my own hints into this thread.

I thought initially that this would be an error related to the redis storage of the cache, but a cursory look at one of our standalone caches shows that storage does indeed seem to be 0x20-friendly:

[root@res310 tmp]# unbound-control lookup youtube.com
The following name servers are used for lookup of youtube.com.
;rrset 24388 4 0 8 3
yoUtube.COM.	24388	IN	NS	ns3.google.com.
yoUtube.COM.	24388	IN	NS	ns2.google.com.
yoUtube.COM.	24388	IN	NS	ns1.google.com.
yoUtube.COM.	24388	IN	NS	ns4.google.com.
[snip]

I have not yet looked to see whether different 0x20 results (different capitalization patterns) cause new entries to be stored in the memory cache, to duplicate Peter’s test - I don’t quite have time for that right now. However, a quick sift through a cache dump (“unbound-control dump_cache > dns-cache.txt”) shows only one instance that matches another host (www.youtube.com - with “-i” meaning “any capitalization”), but the results are a bit unexpected, as it seems to me that storing as all lowercase would be the default behavior:

[root@res310 tmp]# grep -i www.youtube.com dns-cache.txt |grep -v msg
WWw.yOuTUBE.cOm.	292	IN	CNAME	youtube-ui.l.google.com.
WWw.yOuTUBE.cOm. IN CNAME 0
[root@res310 tmp]#

Running v1.21.1 for samples above.

I have not yet looked to see if new insertions are happening on new 0x20 variants, but based on Peter’s comments and the evidence above it looks like that may be happening. We see many queries from downstream unbound systems with 0x20 turned on, and if Pi-hole is going to enable that by default, it seems like it would cause quite a bit of unnecessary cache churn if new entries are indeed saved for each 0x20 variation. For our installations, this probably isn’t catastrophic, since we have dnsdist in front of unbound and I suspect the dnsdist packet cache (which ignores 0x20 uniqueness when answering from an existing in-memory item) minimizes the leakage of 0x20 issues through to the back-end recursive resolvers. However, if we have N front-end dnsdist instances pointing at a single unbound instance (simplifying for clarity) with clients sending 0x20-unique requests, then up to N requests with different 0x20 values would cause lookup events even within the cache timeout period in unbound. That would be sub-optimal.

JT

As far as I can tell from using dig against a quiescent Unbound
instance and watching the msg.cache.count and rrset.cache.count metrics,
Unbound's RRset and message caches are case insensitive but appear to
be case retentive.

In any case, Unbound's hash functions internally coerce the name key
to lowercase before hashing, rather than requiring domain names to be
lowercased before calling the hash function:

https://github.com/NLnetLabs/unbound/blob/1894c0a1505c6791d6c9f6e77b7ff47cfc1f1545/util/data/dname.c#L300

https://github.com/NLnetLabs/unbound/blob/1894c0a1505c6791d6c9f6e77b7ff47cfc1f1545/util/data/dname.c#L336

So it would be physically impossible for Unbound's in-memory caches to
store multiple cache entries whose keys differ only by case, because
they would hash to the same value.
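The same principle can be sketched in a few lines of Python. This is not Unbound's C code (which hashes wire-format labels); it only demonstrates the lowercase-before-hash idea:

```python
import hashlib

def dname_hash(name: str) -> str:
    # Coerce ASCII letters to lowercase before hashing, so every 0x20
    # case variant of a name lands in the same cache bucket.
    return hashlib.sha256(name.lower().encode("ascii")).hexdigest()

# All perturbations hash identically, so an in-memory cache keyed on this
# hash cannot hold two entries whose names differ only by case.
print(dname_hash("docs.pi-hole.net.") == dname_hash("dOCs.Pi-hole.NEt."))  # True
```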

I have only skimmed the pi-hole discussion thread but I noticed "TTL:
does not expire" in the redis screenshots. I would guess the pi-hole
users are getting confused by the redis cache entries simply not
expiring, as documented in the unbound.conf manpage:

https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html#cache-db-module-options

  Note: Unbound never removes data stored in the Redis server, even
  if some data have expired in terms of DNS TTL or the Redis server
  has cached too much data; if necessary the Redis server must be
  configured to limit the cache size, preferably with some kind of
  least-recently-used eviction policy.

  Additionally, the redis-expire-records: option can be used in order to
  set the relative DNS TTL of the message as timeout to the Redis records;
  keep in mind that some additional memory is used per key and that
  the expire information is stored as absolute Unix timestamps in Redis
  (computer time must be stable).

So if a "docs.pi-hole.net" entry of whatever caps perturbation ever
expires from Unbound's in-memory caches (which of course would happen
frequently due to DNS TTL-based expiration), or Unbound is restarted,
additional cache entries for the same name but likely of different caps
perturbations would be sent to the redis cache when Unbound resolves the
name again, without overwriting the previous entries in the redis cache.
(Redis cache keys are case-sensitive/case-retentive.) And then when
you go and snoop in the redis cache, you'd find all the different caps
perturbations of the same name that had ever been stored in the redis
cache.

Probably it makes sense for Unbound's cachedb module to coerce the cache
key to lowercase when communicating with the redis server, though.
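The gist of such a fix, sketched in Python with a hypothetical key derivation (this does not reproduce Unbound's actual cachedb key format):

```python
import hashlib

def cachedb_key(qname: str, qtype: str, qclass: str) -> str:
    # Lowercase the owner name before deriving the backend key, so 0x20
    # case variants of the same question map to one Redis key.
    material = f"{qname.lower()}|{qtype}|{qclass}".encode("ascii")
    return hashlib.sha256(material).hexdigest()

print(cachedb_key("docs.PI-HOlE.NET.", "A", "IN")
      == cachedb_key("docs.pi-hole.net.", "A", "IN"))  # True
```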

Robert's analysis is spot on.

There is a bug in the cachedb code: it keys cache entries on the capitalization of the running query. The result is that cachedb entries are only useful if the backend is later asked with exactly the same capitalization.

As Robert mentioned, this does not affect Unbound and its internal cache, rather the records in Redis and their inability to prove useful to other (or restarted) Unbound instances.

There is a fix for that in
https://github.com/NLnetLabs/unbound/commit/c5c54862617c5ea2389736156fb84ad7efb73df8

Best regards,
-- Yorgos