We’ve been using one of those ad blocklists that is basically a long text file of local-data statements pointing everything to 127.0.0.1. This kind of thing:
It’s worked so well for us that we’d like to grow that list… by a lot (~5M entries). I’m wondering whether, and to what extent, this will hurt performance, or whether the process will “simply” use more memory. Also, how efficient are those lookups going to be? I’m not sure this is even a common use case that unbound has been optimized for.
The local-zones are stored in a red-black tree, so searching will be
O(log n).
Your memory usage will increase *a lot*. Unbound will create an 8k
memory region for every local-zone containing local-data, so you will
need ~40GB of memory. Do you really need the local data for each zone?
Maybe using a local-zone type that prevents the resolution also works
for you. New versions of unbound (available from our repository, not
released yet) will not allocate the 8K for local-zones without
local-data (see https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=839). In that case
your local-zone tree will need ~2GB of memory.
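As a rough check of the figures above, here is a minimal Python sketch of the arithmetic. It assumes exactly 5 million entries and an 8 KiB region per zone; the function names are made up for illustration, and the tree-depth bound is the generic red-black tree bound, not something measured on unbound.

```python
import math

ENTRIES = 5_000_000          # size of the planned blocklist (from the question)
REGION_BYTES = 8 * 1024      # assumed: "8k" means 8 KiB per local-zone with local-data


def region_overhead_bytes(entries, region_bytes=REGION_BYTES):
    """Memory taken by the per-zone regions alone."""
    return entries * region_bytes


def balanced_depth(entries):
    """Comparisons per lookup if the tree were perfectly balanced."""
    return math.ceil(math.log2(entries + 1))


def worst_case_depth(entries):
    """Generic red-black tree height bound: 2 * log2(n + 1)."""
    return math.ceil(2 * math.log2(entries + 1))


if __name__ == "__main__":
    total = region_overhead_bytes(ENTRIES)
    print(f"region overhead: {total / 1e9:.1f} GB ({total / 2**30:.1f} GiB)")
    print(f"lookup depth: about {balanced_depth(ENTRIES)}, "
          f"at most {worst_case_depth(ENTRIES)} comparisons")
    # The ~2GB estimate for zones without local-data works out to this per zone:
    print(f"implied per-zone cost at 2 GB: {2e9 / ENTRIES:.0f} bytes")
```

So the lookups stay cheap (a few dozen name comparisons at most); memory is the real constraint.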
I have a suggestion: don’t redirect to 127.0.0.1, refuse instead. What if your customer is running a webserver on localhost, or any other app that listens on port 80 (like TeamViewer at some companies)? That means needless connections and processing.
I suggest sending REFUSED instead of an IN A record, like so:
local-zone: "cdn.sh-jxzx.com" refuse
That’s also beneficial because the requesting application will not even try to open a TCP connection to 127.0.0.1; gethostbyname() and friends will return an error.
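For an existing list, the conversion could be scripted. The following is only a sketch: it assumes the blocklist lines look like `local-data: "name A 127.0.0.1"` (optionally with a TTL and `IN`), which is a guess since the original file isn't shown, and `to_refuse_zones` is a hypothetical helper name.

```python
import re

# Assumed input format (not shown in the thread):
#   local-data: "example.com A 127.0.0.1"
#   local-data: "example.com. 3600 IN A 127.0.0.1"
LOCAL_DATA_RE = re.compile(
    r'^\s*local-data:\s*"(\S+)\s+(?:\d+\s+)?(?:IN\s+)?A\s+127\.0\.0\.1"\s*$'
)


def to_refuse_zones(lines):
    """Turn local-data A 127.0.0.1 lines into local-zone refuse lines.

    Lines that do not match the assumed format are skipped, and each
    name is emitted once (case-insensitive, trailing dot ignored).
    """
    seen = set()
    out = []
    for line in lines:
        match = LOCAL_DATA_RE.match(line)
        if not match:
            continue
        name = match.group(1).rstrip(".").lower()
        if name in seen:
            continue
        seen.add(name)
        out.append(f'local-zone: "{name}" refuse')
    return out


if __name__ == "__main__":
    sample = [
        'local-zone: "cdn.sh-jxzx.com" redirect',
        'local-data: "cdn.sh-jxzx.com A 127.0.0.1"',
    ]
    for zone_line in to_refuse_zones(sample):
        print(zone_line)
```

The output can be written to a separate file and pulled into unbound.conf with an include: statement, so the generated list stays apart from hand-maintained settings.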
Ralph, in, say, 1.4.22, does refuse consume less, more, or the same amount of memory as adding additional local-data for IN A?
Thanks to all of you for the detailed answers and examples. I agree that sending a refuse is the better option, and as a matter of fact it may also have answered something else entirely, which I had been using a Python script for (I mentioned this in another thread on caching). The docs say:
refuse
Send an error message reply, with rcode REFUSED. If there is
a match from local data, the query is answered.
and all queries to all subdomains of example.com will be rejected except for ok.example.com. Is that correct?
In which case, I have a different question: is it possible to set it up so that, instead of setting an IP for the A record, the query is passed to the iterator? In other words, to create a sort of whitelist: deny everything except these subdomains?