RA flag missing on NSEC negative cache responses

Hello all,

I'm running Unbound as my recursive resolver and keep seeing "dnsmasq: nameserver 127.0.0.1 refused to do a recursive query" error messages. After some debugging with the help of an LLM, it seems that the RA flag is missing from synthesized NODATA or NXDOMAIN responses generated from the NSEC negative cache. I'm not sure whether this is a bug that warrants an issue or whether it is intended behavior. Do you need further information to make an assessment? If so, what kind of input would help?

Thanks and kind regards
Jürgen

Hi Jürgen,

Do you have a concrete case we can look at?
Test cases in Unbound do return the RA flag when 'aggressive-nsec: yes' is used (the default).

Maybe you are using RPZ data and you have set
'rpz-signal-nxdomain-ra: yes' [1] ?
Btw, this option was explicitly requested to play nice with dnsmasq, IIRC.

Best regards,
-- Yorgos

[1] https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html#unbound-conf-rpz-rpz-signal-nxdomain-ra

Hi Yorgos,

Actually, I observe the error mainly with various .cdn.cloudflare.net domains. The behavior seems to be specific to the cachedb module and only occurs when the cached answer's TTL has expired while the NSEC records remain cached. The synthesized NODATA/NXDOMAIN response is then issued directly from the cachedb module, without passing through the iterator module, and no RA flag is added to the response.
I have trouble reproducing the issue manually with dig, but my dnsmasq instance runs into it continuously.

Unbound version: 1.24.2
My module-config: validator cachedb iterator

In the logs below, you can observe two distinct unbound instances (127.0.0.1 and 192.168.1.160). One instance was patched (127.0.0.1) and the dnsmasq warning no longer pops up:

--- a/cachedb/cachedb.c
+++ b/cachedb/cachedb.c
@@ -724,6 +724,13 @@
 	}
 	if(!msg)
 		return 0;

+	/* fixup flags to be sensible for a reply based on the cache.
+	 * This module means that RA is available. It is an answer QR.
+	 * Not AA from cache. Not CD in cache (depends on client bit).
+	 * This is needed because val_neg_getmsg() synthesizes messages
+	 * with dns_msg_create() which only sets BIT_QR, missing BIT_RA. */
+	msg->rep->flags |= (BIT_RA | BIT_QR);
+	msg->rep->flags &= ~(BIT_AA | BIT_CD);
 	/* this is the returned msg */
 	qstate->return_rcode = LDNS_RCODE_NOERROR;
 	qstate->return_msg = msg;
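
To make the effect of the two added flag lines easier to follow, here is a minimal standalone sketch of the same bit manipulation on a bare 16-bit flags word. The helper name fixup_cached_reply_flags is hypothetical and not part of Unbound. The BIT_* names mirror the patch, and their values are written out according to the DNS header layout so that the snippet compiles on its own.

```c
#include <stdint.h>

/* Header flag bits in the 16-bit DNS flags word (RFC 1035, RFC 4035).
 * The names mirror those used in the patch; the values are spelled out
 * here so that this sketch does not depend on Unbound's headers. */
#define BIT_QR 0x8000 /* message is a response */
#define BIT_AA 0x0400 /* authoritative answer */
#define BIT_RA 0x0080 /* recursion available */
#define BIT_CD 0x0010 /* checking disabled */

/* Hypothetical helper, not an Unbound function: apply the same fixup
 * as the patch to a flags word and return the result. */
static uint16_t fixup_cached_reply_flags(uint16_t flags)
{
	/* a reply served from the cache is a response and offers recursion */
	flags |= (BIT_RA | BIT_QR);
	/* it is not authoritative, and CD is not stored in the cache */
	flags &= (uint16_t)~(BIT_AA | BIT_CD);
	return flags;
}
```

A message that only has QR set (0x8000), which is what the comment in the patch describes for synthesized messages, comes out as QR plus RA (0x8080). Bits the fixup does not mention, such as RD, are left unchanged.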

'rpz-signal-nxdomain-ra: yes' is not set on my end; afaik it defaults to no.

Here are some dnsmasq logs that show the "issue" from dnsmasq's perspective:

Feb 15 00:29:21 dnsmasq[1084]: query[A] connect.garmin.com from 192.168.1.188
Feb 15 00:29:21 dnsmasq[1084]: forwarded connect.garmin.com to 127.0.0.1#2053
Feb 15 00:29:21 dnsmasq[1084]: reply connect.garmin.com is <CNAME>
Feb 15 00:29:21 dnsmasq[1084]: reply connect.garmin.com.cdn.cloudflare.net is NODATA
Feb 15 00:29:21 dnsmasq[1084]: reply connect.garmin.com is <CNAME>
Feb 15 00:29:21 dnsmasq[1084]: reply connect.garmin.com.cdn.cloudflare.net is 104.17.167.14
Feb 15 00:29:21 dnsmasq[1084]: reply connect.garmin.com.cdn.cloudflare.net is 104.17.168.14
Feb 15 00:29:22 dnsmasq[1084]: query[HTTPS] connect.garmin.com.cdn.cloudflare.net from 192.168.1.188
Feb 15 00:29:22 dnsmasq[1084]: forwarded connect.garmin.com.cdn.cloudflare.net to 127.0.0.1#2053
Feb 15 00:29:22 dnsmasq[1084]: nameserver 127.0.0.1 refused to do a recursive query
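
For anyone who wants to check captured responses by hand, the following is a rough sketch of how the RA bit can be read from a raw DNS message. It assumes that the dnsmasq warning above is tied to a response arriving without RA, which is what my debugging suggests but which I have not confirmed in the dnsmasq source. The function name response_has_ra and the DNS_* constants are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define DNS_HEADER_LEN 12     /* fixed DNS header size in bytes */
#define DNS_FLAG_QR    0x8000 /* message is a response */
#define DNS_FLAG_RA    0x0080 /* recursion available */

/* Hypothetical helper: inspect the header of a raw DNS message.
 * Returns 1 if it is a response with RA set, 0 if it is a response
 * without RA, and -1 if the buffer is too short or holds a query. */
static int response_has_ra(const uint8_t *pkt, size_t len)
{
	uint16_t flags;

	if(pkt == NULL || len < DNS_HEADER_LEN)
		return -1;
	/* bytes 0-1 hold the ID, bytes 2-3 the flags in network order */
	flags = (uint16_t)((pkt[2] << 8) | pkt[3]);
	if(!(flags & DNS_FLAG_QR))
		return -1;
	return (flags & DNS_FLAG_RA) ? 1 : 0;
}
```

A response whose flags word is 0x8000 (only QR set) yields 0 here, while a typical recursive answer with flags 0x8180 yields 1.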

Hi Jürgen,

Not sure if the logs are correct since I see both IPs respond the same with the "refused to do a recursive query" message.

Other than that, your analysis and pinpointing of the issue seem correct.
We have chosen a different approach for when to set the RA flag, though: https://github.com/NLnetLabs/unbound/commit/014ed9c5ff393d9d10a92e85e7cac080253b968b.

If you could test and verify, that would be great.

Best regards,
-- Yorgos

Hi Yorgos,

Thanks for the quick fix.

A note on the logs I shared earlier, and apologies for the confusion: the host at 127.0.0.1 had already been patched on the afternoon of Feb 15, and I had to trim the logs due to message size constraints. After my patch was applied, the warnings stopped.

I've now recompiled Unbound with your patch and so far everything looks good. Keep in mind, though, that I haven't been able to trigger the issue manually in a controlled test; the behavior seems to depend on specific cache timing conditions that are hard to reproduce on demand. That said, based on what I'm seeing in my network, the patched instance has been clean, while the unpatched one has already produced another warning.

Thanks again,
Jürgen

Hi Jürgen,

Sounds good, thanks for letting us know!

Best regards,
-- Yorgos