I’m running Unbound as my recursive resolver and have been seeing repeated "dnsmasq: nameserver 127.0.0.1 refused to do a recursive query" warnings. After some debugging with the help of an LLM, it seems that the RA (Recursion Available) flag is missing from synthesized NODATA or NXDOMAIN responses served from the NSEC negative cache. I’m not sure whether this is a bug that warrants opening an issue or whether it is intended behavior. Do you need further information to make an assessment? If so, what kind of input would help?
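For reference, RA is a single bit (mask 0x0080) in the 16-bit flags word of the DNS header (RFC 1035, section 4.1.1), and RD (0x0100) is copied back from the query. Here is a minimal Python sketch that decodes a raw reply header and detects the "RD set, RA clear" combination. It assumes the dnsmasq warning corresponds to a reply whose RA bit is clear. The function names are illustrative and not taken from any existing tool:

```python
import struct

# Bit masks for the 16-bit flags word of the DNS header (RFC 1035, section 4.1.1).
QR = 0x8000  # set in responses
RD = 0x0100  # recursion desired (copied from the query)
RA = 0x0080  # recursion available
RCODE_MASK = 0x000F

RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def decode_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", message[:12])
    rcode = flags & RCODE_MASK
    return {
        "id": msg_id,
        "qr": bool(flags & QR),
        "rd": bool(flags & RD),
        "ra": bool(flags & RA),
        "rcode": RCODE_NAMES.get(rcode, str(rcode)),
        "ancount": ancount,
        "nscount": nscount,
    }


def is_ra_missing(message: bytes) -> bool:
    """True for a response to a recursive query (RD=1) that has RA cleared."""
    header = decode_header(message)
    return header["qr"] and header["rd"] and not header["ra"]


# A NODATA-shaped reply header: NOERROR, no answers, one authority record, RA clear.
nodata_without_ra = struct.pack("!6H", 0x1234, QR | RD, 1, 0, 1, 0)
print(decode_header(nodata_without_ra)["rcode"], is_ra_missing(nodata_without_ra))
# prints: NOERROR True
```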
Do you have a concrete case we can look at?
The test cases in Unbound do return the RA flag when 'aggressive-nsec: yes' is used (which is the default).
Maybe you are using RPZ data and have set 'rpz-signal-nxdomain-ra: yes' [1], which deliberately clears the RA flag on NXDOMAIN answers produced by RPZ?
Btw, this option was explicitly requested to play nice with dnsmasq, IIRC.
Actually, I mainly observe the error with various *.cdn.cloudflare.net domains. The behavior appears to be specific to the cachedb module and occurs only when the cached answer's TTL has expired while the NSEC records remain cached. In that case, the synthesized NODATA/NXDOMAIN response is returned directly by the cachedb module, without passing through the iterator module, and no RA flag is added to it.
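The timing condition described here can be illustrated with a toy model. This is not Unbound's code or the actual patch: the `CachedName` type, the `serve` function, and the three-path lookup order are simplifications assumed purely for illustration. The model shows why the warning would only show up in a narrow window, between the expiry of the cached answer and the expiry of the NSEC records:

```python
from dataclasses import dataclass

QR, RD, RA = 0x8000, 0x0100, 0x0080  # DNS header flag masks


@dataclass
class CachedName:
    """Toy cache entry: absolute expiry times (seconds) for the cached answer
    and for the NSEC records that can prove a negative answer."""
    answer_expires: float
    nsec_expires: float


def serve(entry: CachedName, now: float, query_flags: int, rcode: int,
          patched: bool) -> tuple[str, int]:
    """Return (path taken, reply flags) for a query arriving at time `now`.

    Hypothetical model of the reported behavior: only the NSEC-synthesis
    path skips setting RA, unless `patched` is True.
    """
    base = QR | (query_flags & RD) | (rcode & 0x000F)
    if now < entry.answer_expires:
        return "cached answer", base | RA
    if now < entry.nsec_expires:
        return "synthesized from NSEC", (base | RA) if patched else base
    return "iterator", base | RA


entry = CachedName(answer_expires=300.0, nsec_expires=3600.0)
for now in (100.0, 1000.0, 5000.0):
    path, flags = serve(entry, now, RD, 0, patched=False)
    print(f"t={now}: {path}, RA={'yes' if flags & RA else 'no'}")
```

With `patched=False`, only the query at t=1000 (answer expired, NSEC records still cached) comes back without RA.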
I have trouble reproducing the issue manually with dig, but my dnsmasq instance runs into it continuously.
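Since dig is hard to time by hand, one option is a small poller that sends the same recursive query repeatedly and reports any reply without RA. This sketch uses only the Python standard library. It assumes plain UDP on port 53 and checks only A and AAAA. The 10-second interval is an arbitrary choice, so the poller may or may not hit the window in which the cached answer has expired but the NSEC records are still valid. The helper names are hypothetical:

```python
import random
import socket
import struct
import time

RD = 0x0100  # recursion desired
RA = 0x0080  # recursion available
QTYPE_A, QTYPE_AAAA = 1, 28


def build_query(name: str, qtype: int, msg_id: int) -> bytes:
    """Encode a one-question DNS query (class IN) with the RD bit set."""
    header = struct.pack("!6H", msg_id, RD, 1, 0, 0, 0)
    labels = [part for part in name.rstrip(".").split(".") if part]
    qname = b"".join(bytes([len(part)]) + part.encode("ascii") for part in labels)
    return header + qname + b"\x00" + struct.pack("!2H", qtype, 1)


def parse_reply(reply: bytes, msg_id: int) -> tuple[bool, int]:
    """Return (RA flag, RCODE) from a reply header after checking the ID."""
    if len(reply) < 12:
        raise ValueError("reply shorter than the 12-byte DNS header")
    reply_id, flags = struct.unpack("!2H", reply[:4])
    if reply_id != msg_id:
        raise ValueError("reply ID does not match the query ID")
    return bool(flags & RA), flags & 0x000F


def probe(server: str, name: str, qtype: int, timeout: float = 2.0) -> tuple[bool, int]:
    """Send one recursive query over UDP port 53 and return (RA, RCODE)."""
    msg_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype, msg_id), (server, 53))
        reply, _ = sock.recvfrom(4096)
    return parse_reply(reply, msg_id)


def watch(server: str, names: list[str], interval: float = 10.0) -> None:
    """Poll each name for A and AAAA forever, printing replies that lack RA."""
    while True:
        for name in names:
            for qtype in (QTYPE_A, QTYPE_AAAA):
                try:
                    ra, rcode = probe(server, name, qtype)
                except (OSError, ValueError) as exc:
                    print(f"{name} type {qtype}: {exc}")
                    continue
                if not ra:
                    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                    print(f"{stamp} {name} type {qtype}: RA missing, RCODE={rcode}")
        time.sleep(interval)
```

For example, `watch("127.0.0.1", ["<name that triggered the warning>"])` would poll the local instance until interrupted. The placeholder name is yours to fill in.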
Unbound version: 1.24.2
My module-config: validator cachedb iterator
In the logs below, you can see two distinct Unbound instances (127.0.0.1 and 192.168.1.160). One of them (127.0.0.1) was patched, and the dnsmasq warning no longer appears for it:
A note on the logs I shared earlier, and apologies for the confusion: the host at 127.0.0.1 had already been patched on the afternoon of Feb 15, and I had to trim the logs because of message size limits. Once my patch was applied, the warnings stopped.
I've now recompiled Unbound with your patch, and so far everything looks good. Keep in mind, though, that I haven't been able to trigger the issue manually in a controlled test; the behavior seems to depend on specific cache timing conditions that are hard to reproduce on demand. That said, based on what I'm seeing in my network, the patched instance has been clean, while the unpatched one has already produced another warning.
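To compare the patched and unpatched instances over a longer period, here is a rough sketch that counts the dnsmasq warning per upstream address from log lines. The regular expression assumes the warning text quoted at the start of this thread and assumes your log lines contain it verbatim:

```python
import re
from collections import Counter
from typing import Iterable

# Matches the dnsmasq warning text quoted at the start of this thread.
WARNING_RE = re.compile(r"nameserver (\S+) refused to do a recursive query")


def count_ra_warnings(lines: Iterable[str]) -> Counter:
    """Count 'refused to do a recursive query' warnings per upstream address."""
    counts: Counter = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For instance, you could open your dnsmasq log file (its location depends on your setup) and pass the file object to `count_ra_warnings`. The result is a `Counter` keyed by nameserver address, so a patched instance that stays clean simply does not appear in it.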