Is a very large local-data list a problem?

Spike · November 27, 2016, 6:08pm

Dear all,

We’ve been using one of those ad blocklists that is basically a long text file of local-data statements pointing every listed name at 127.0.0.1. This kind of thing:

https://deadc0de.re/articles/unbound-blocking-ads.html

It’s worked so well for us that we’d like to grow that list by a lot (~5M entries). I’m wondering whether, and to what extent, this will hurt performance, or whether the process will “simply” grow in memory usage. Also, how efficient will those lookups be? I’m not sure this is a common use case that unbound has been optimized for.

thanks,

Spike

Ralph_Dolmans · November 28, 2016, 10:23am

Hi Spike,

The local-zones are stored in a red-black tree, so searching will be
O(log n).
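
To put the O(log n) claim in numbers, here is a minimal sketch that uses the textbook height bound for a red-black tree, 2 * log2(n + 1). It only illustrates the growth rate; the function name is made up for this example and nothing here is taken from the unbound source.

```python
import math


def rb_tree_max_height(n: int) -> int:
    """Upper bound on the height of a red-black tree holding n keys.

    Uses the standard bound height <= 2 * log2(n + 1). This is generic
    red-black tree theory, not a measurement of unbound itself.
    """
    return math.floor(2 * math.log2(n + 1))


if __name__ == "__main__":
    # Compare a ~12k-entry list with the ~5M-entry list from the question.
    for entries in (12_000, 5_000_000):
        print(entries, "entries -> at most", rb_tree_max_height(entries), "levels")
```

Going from thousands to millions of entries adds only a handful of levels to the worst case, which is why lookup time is not the main concern here.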

Your memory usage will increase *a lot*. Unbound will create an 8k
memory region for every local-zone containing local-data, so you will
need ~40GB of memory. Do you really need the local data for each zone?
Maybe using a local-zone type that prevents the resolution also works
for you. New versions of unbound (available from our repository, not
released yet) will not allocate the 8K for local-zones without
local-data (see
https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=839). In that case
your local-zone tree will need ~2GB of memory.
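
The ~40GB figure follows directly from the numbers above. A rough sketch of the arithmetic, assuming "8k" means 8192 bytes and counting only the per-zone region (the helper names are hypothetical):

```python
ZONES = 5_000_000        # entries in the planned blocklist
REGION_BYTES = 8 * 1024  # assumption: the "8k" region is 8192 bytes


def region_overhead_bytes(zones: int, region_bytes: int = REGION_BYTES) -> int:
    """Memory taken by the per-zone regions alone, in bytes."""
    return zones * region_bytes


def implied_bytes_per_zone(total_bytes: float, zones: int) -> float:
    """Per-zone cost implied by a quoted total, e.g. the ~2GB estimate."""
    return total_bytes / zones


if __name__ == "__main__":
    total = region_overhead_bytes(ZONES)
    print(f"regions only: {total / 1e9:.1f} GB")
    print(f"~2GB total implies about {implied_bytes_per_zone(2e9, ZONES):.0f} bytes per zone")
```

These are back-of-the-envelope numbers; the real footprint also depends on name lengths and allocator overhead.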

Regards,
-- Ralph

Maciej_Soltysiak · November 28, 2016, 10:49am

I suggest not redirecting to 127.0.0.1 and refusing instead. What if your customer is running a web server on localhost, or any other app that listens on port 80 (like TeamViewer at some companies)? That means needless connections and processing.

I suggest sending REFUSED instead of an IN A answer, like so:
local-zone: "cdn.sh-jxzx.com" refuse

That’s also beneficial because the requesting application will not even try to open a TCP connection to 127.0.0.1; gethostbyname() and friends will return an error.
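
An existing blocklist of local-data lines could be converted to this form with a small script. The sketch below assumes every entry looks like `local-data: "name A 127.0.0.1"` (optionally with `IN`); other layouts, such as entries with a TTL, are skipped. The function name is made up for this example.

```python
import re
from typing import Optional

# Matches lines such as:  local-data: "ads.com A 127.0.0.1"
LOCAL_DATA_RE = re.compile(
    r'^\s*local-data:\s*"(\S+)\s+(?:IN\s+)?A\s+127\.0\.0\.1"\s*$'
)


def to_refuse_zone(line: str) -> Optional[str]:
    """Turn one local-data blocklist line into a local-zone refuse line.

    Returns None for lines that do not match the assumed layout
    (comments, blank lines, other record types).
    """
    match = LOCAL_DATA_RE.match(line)
    if match is None:
        return None
    name = match.group(1).rstrip(".")  # drop a trailing root dot, if any
    return f'local-zone: "{name}" refuse'


if __name__ == "__main__":
    sample = [
        'local-data: "ads.com A 127.0.0.1"',
        "# a comment line",
        'local-data: "tracker.example. IN A 127.0.0.1"',
    ]
    for entry in sample:
        converted = to_refuse_zone(entry)
        if converted is not None:
            print(converted)
```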

Ralph, in, say, 1.4.22, does refuse consume less, more, or the same memory as adding additional local-data for IN A?

Best regards,
Maciej

Simon_Deziel · November 28, 2016, 4:37pm

Memory-wise, I found that just using local-data with the implied
transparent local-zone was best. With a ~12k hosts list:

# local-data: "ads.com A 127.0.0.1"
$ ps aux| grep unbound
unbound 32557 1.5 0.2 58316 15964 ? Ss 11:27 0:00 /usr/sbin/unbound -d

# local-zone: "ads.com" static
$ ps aux| grep unbound
unbound 32139 0.5 0.7 152840 63352 ? Ss 11:21 0:00 /usr/sbin/unbound -d

# local-zone: "ads.com" refuse
$ ps aux| grep unbound
unbound 32247 2.3 0.7 152840 63432 ? Ss 11:22 0:00 /usr/sbin/unbound -d

Setting local-data with only an A record will return an empty answer for AAAA queries.
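
For a rough per-entry comparison, the RSS column of the `ps aux` output above (the sixth field, in KiB) can be divided by the list size. This sketch assumes exactly 12,000 hosts, although the list is only described as "~12k", so the result is approximate.

```python
HOSTS = 12_000  # assumption: "~12k hosts" taken as exactly 12,000

# RSS values (KiB) copied from the ps output above.
RSS_KIB = {
    "local-data": 15964,
    "static": 63352,
    "refuse": 63432,
}


def extra_kib_per_host(variant: str, baseline: str = "local-data",
                       hosts: int = HOSTS) -> float:
    """Extra resident memory per list entry relative to the baseline."""
    return (RSS_KIB[variant] - RSS_KIB[baseline]) / hosts


if __name__ == "__main__":
    for variant in ("static", "refuse"):
        print(variant, f"{extra_kib_per_host(variant):.2f} KiB more per host")
```

In this one measurement, both local-zone variants cost roughly 4 KiB more per entry than plain local-data.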

HTH,
Simon

Spike · December 15, 2016, 4:22am

Thanks to all of you for the detailed answers and examples. I agree that sending a refuse is the better option, and as a matter of fact it may also have answered something else entirely that I was using a Python script for (I mentioned this in another thread on caching). The docs say:

refuse
Send an error message reply, with rcode REFUSED. If there is
a match from local data, the query is answered.

So if I get that correctly I could have

local-zone: "example.com" refuse
local-data: "ok.example.com A x.x.x.x"

and all queries to all subdomains of example.com will be rejected except for ok.example.com. Is that correct?
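
As a way to reason about the quoted wording, here is a toy model of that behaviour. It is not unbound code. It assumes a "match" means an exact name and type match in local-data, and it uses 192.0.2.10 as a stand-in address; both are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[str, str]  # (query name, query type)


def in_zone(qname: str, zone: str) -> bool:
    """True if qname is the zone apex or any name below it."""
    return qname == zone or qname.endswith("." + zone)


def answer(qname: str, qtype: str, refuse_zones: Iterable[str],
           local_data: Dict[Key, str]) -> Tuple[str, Optional[str]]:
    """Toy model of a 'refuse' local-zone as described in the quoted docs."""
    qname = qname.lower().rstrip(".")
    # 1. A match in local data is answered.
    if (qname, qtype) in local_data:
        return ("NOERROR", local_data[(qname, qtype)])
    # 2. Anything else inside a refuse zone gets rcode REFUSED.
    if any(in_zone(qname, zone) for zone in refuse_zones):
        return ("REFUSED", None)
    # 3. Names outside every local zone go to normal resolution.
    return ("RESOLVE", None)


if __name__ == "__main__":
    zones = ["example.com"]
    data = {("ok.example.com", "A"): "192.0.2.10"}  # placeholder address
    for name in ("ok.example.com", "bad.example.com", "other.org"):
        print(name, answer(name, "A", zones, data))
```

Under these assumptions the model agrees with the reading above: only ok.example.com is answered, and every other name under example.com is refused.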

In which case, I have a different question: is it possible to set it up so that, instead of setting an IP for the A record, the query is passed to the iterator? In other words, create a sort of whitelist: deny everything except these subdomains?

thanks,

Spike