unbound replaces CNAME query with A query?

Christoph · March 26, 2023, 4:29pm

Hi,

we are tracking/debugging [1][2] an issue that results in the failure of
certificate renewal (ACME DNS challenge).

If you send unbound 1.17.1 the query shown below while its cache is empty, you get an NXDOMAIN reply. If you ask again, you get the expected answer (NOERROR). PowerDNS Recursor does not have this issue.

Investigating the DNS traffic has also shown that
the stub -> unbound CNAME query results in an unbound -> authoritative A qtype query instead of a CNAME query.

Can you reproduce this issue and confirm this is unexpected?

thanks!
Christoph

dig _acme-challenge.bender-doh.applied-privacy.net CNAME

; <<>> DiG 9.18.13 <<>> _acme-challenge.bender-doh.applied-privacy.net CNAME
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 20502
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;_acme-challenge.bender-doh.applied-privacy.net. IN CNAME

;; ANSWER SECTION:
_acme-challenge.bender-doh.applied-privacy.net. 86400 IN CNAME bender-doh.acme-dns-challenge.applied-privacy.net.

;; AUTHORITY SECTION:
acme-dns-challenge.applied-privacy.net. 300 IN SOA get.desec.io. get.desec.io. 2023035286 86400 3600 2419200 3600

;; Query time: 114 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; MSG SIZE rcvd: 167

pemensik · March 30, 2023, 6:52pm

Correct me if I misunderstand. Whether you query a CNAME or an A record should not make a difference to the NXDOMAIN status. In either case the answer is not there. How does it change the ACME process when there is NXDOMAIN rather than a no-answer NOERROR response?

_acme-challenge.bender-doh.applied-privacy.net exists with a CNAME. Its CNAME target returns NXDOMAIN. So yes, it is a bit confusing what the final result is. What exactly is the stub in this case? The libresolv library? getaddrinfo() cannot query a CNAME itself; it can only reach it via an A query.

What is the point of querying just CNAME? Does it have a specific reason?

Unbound seems to proactively fetch the actually useful record instead of just the intermediate CNAME. I am not sure that is strictly wrong. The result it delivers is similar: it tells you there is a CNAME and that its target does not exist. It just seems the stub does not check the actual contents of the message, only the rcode. Can a stub resolver do anything useful with the information that there is a CNAME not leading to a final destination?

Note: it would be much easier if you could share just pcap containing the problem instead of only text description.

Christoph · March 30, 2023, 9:28pm

Hi Petr,

thanks for your reply and your questions.

Petr Menšík via Unbound-users:

> Correct me if I understand it not correctly. whether you query CNAME
> or A record should not make a difference in NXDOMAIN status. But in
> any case the answer is not there. How does it change ACME process
> when there is NXDOMAIN and not just no-answer NOERROR response?

That CNAME DNS query is used by lego - an ACME client - to find
the DNS record it has to update (the ACME DNS TXT challenge).
Lego's CNAME support used to be experimental and is now enabled by default.

The NXDOMAIN answer results in lego concluding "there is no CNAME".
The impact of that unexpected NXDOMAIN answer is that lego will attempt
to use the provided DNS API key to create a TXT record it has no
permissions for. It only has permissions for the target of the existing
CNAME.
For this reason the NOERROR and its answer is important, even if the
final record in that CNAME chain does not exist. It is lego's job to
create it.
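
The effect described above can be illustrated with a small sketch. It is not lego's actual code: the function names `challenge_name_rcode_only` and `challenge_name_from_answer` and the tuple representation of a record are assumptions made for illustration. It only shows why a client that trusts the rcode alone ends up at the original name, while a client that reads the ANSWER section finds the CNAME target.

```python
from typing import List, Tuple

# A record is (owner name, type, target): a hypothetical, simplified representation.
Record = Tuple[str, str, str]


def challenge_name_from_answer(qname: str, answer: List[Record]) -> str:
    """Pick the name for the TXT record by inspecting the ANSWER section."""
    for owner, rtype, target in answer:
        if rtype == "CNAME" and owner.lower() == qname.lower():
            return target
    return qname


def challenge_name_rcode_only(qname: str, rcode: str, answer: List[Record]) -> str:
    """Pick the name for the TXT record, looking at the answer only on NOERROR."""
    if rcode != "NOERROR":
        # NXDOMAIN is read as "there is no CNAME", so the original name is used.
        return qname
    return challenge_name_from_answer(qname, answer)


qname = "_acme-challenge.bender-doh.applied-privacy.net."
target = "bender-doh.acme-dns-challenge.applied-privacy.net."
answer = [(qname, "CNAME", target)]

# First reply from the thread: rcode NXDOMAIN, CNAME present in the ANSWER section.
print(challenge_name_rcode_only(qname, "NXDOMAIN", answer))
print(challenge_name_from_answer(qname, answer))
```

With the first reply shown in this thread, the rcode-only variant returns the original `_acme-challenge` name, which is the record the API key has no permission to create.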

> _acme-challenge.bender-doh.applied-privacy.net exists with cname. Its
> cname target returns NXDOMAIN. So yes, it is a bit confusing what is
> the final result. What exactly is the stub in this case? libresolv
> library?

It is running lego on a FreeBSD server.

I hope the text also helps with answering your other questions below, if
it is not clear please let me know and I will try to rephrase.

> What is the point of querying just CNAME? Does it have a specific
> reason?
>
> Unbound seems proactive to fetch actually useful record instead of
> just intermediate CNAME I am not sure that has to be strictly wrong.
> The result it delivers is similar. It tells there is CNAME and its
> target does not exist.

If unbound is just trying to be useful then it should still be consistent and provide the same answer if you ask it twice - which is not the case currently.

> It just seem the stub does not check actual
> contents of message except rcode. Can stub resolver do anything
> useful with information that there is CNAME not leading to final
> destination?
>
> Note: it would be much easier if you could share just pcap containing
> the problem instead of only text description.

I actually was hoping to achieve the opposite, because looking at the
text does not require people to have a pcap parser and open a file from a mailing list but you got the gist of it anyway.

thanks,
Christoph

Tuomo_Soini · March 31, 2023, 8:17am

There really seems to be an issue in unbound when querying a CNAME.

I created a test record pointing at a non-existing name in another domain.

kdig cnametest.bleve.fi. CNAME

;; ->>HEADER<<- opcode: QUERY; status: NXDOMAIN; id: 46683
;; Flags: qr rd ra ad; QUERY: 1; ANSWER: 0; AUTHORITY: 1; ADDITIONAL: 0

;; QUESTION SECTION:
;; cnametest.bleve.fi. IN CNAME

;; AUTHORITY SECTION:
bleve.fi. 3462 IN SOA foo-ns.foobar.fi. hostmaster.foobar.fi. 1679142493 28800 7200 864000 28800

;; Received 97 B
;; Time 2023-03-31 11:13:51 EEST
;; From 2001:998:2e::1@53(UDP) in 0.8 ms

If I query the authoritative server directly, I get the correct answer.

It looks like unbound erroneously tries to follow the CNAME to the
non-existing record even when the CNAME itself is queried. A CNAME should
only be followed if something != CNAME is queried.
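
The rule stated above can be sketched as a toy lookup over in-memory data. This is not unbound's implementation: `ZONE`, `EXISTING_NAMES` and `resolve` are hypothetical names, and the data is just the test record from this post. The chain is followed only when the query type is not CNAME, and the rcode is taken from the last name that was looked up.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (owner, type, rdata)

# Toy data keyed by (name, type); mirrors the test record described above.
ZONE: Dict[Tuple[str, str], str] = {
    ("cnametest.bleve.fi.", "CNAME"): "nxdomain.foobar.fi.",
}
EXISTING_NAMES = {name for name, _ in ZONE}


def resolve(qname: str, qtype: str, max_chain: int = 8) -> Tuple[str, List[Record]]:
    """Return (rcode, answer records) for a query against the toy data."""
    answer: List[Record] = []
    name = qname
    for _ in range(max_chain):
        if (name, qtype) in ZONE:
            # Exact type match, including qtype CNAME: answer and stop here.
            answer.append((name, qtype, ZONE[(name, qtype)]))
            return "NOERROR", answer
        if qtype != "CNAME" and (name, "CNAME") in ZONE:
            # Only non-CNAME queries restart the lookup at the CNAME target.
            target = ZONE[(name, "CNAME")]
            answer.append((name, "CNAME", target))
            name = target
            continue
        rcode = "NOERROR" if name in EXISTING_NAMES else "NXDOMAIN"
        return rcode, answer
    return "SERVFAIL", answer  # chain too long


print(resolve("cnametest.bleve.fi.", "CNAME"))
print(resolve("cnametest.bleve.fi.", "A"))
```

In this sketch the CNAME query ends with NOERROR and one answer record, while the A query follows the chain and ends with NXDOMAIN plus the CNAME in the answer, which is the shape of the unexpected reply reported for the CNAME query.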

pemensik · March 31, 2023, 11:01am

I am using dnssec-trigger-0.17-7.fc36.x86_64 and unbound-1.17.1-1.fc36.x86_64 on Fedora 36. But I cannot reproduce the behaviour, even if I flush cache by "unbound-control flush_zone ." It is returning consistently CNAME with NOERROR. Does it happen only when the unbound does not have forwarders and is iterating itself? I keep getting CNAME with NOERROR.

$ kdig cnametest.bleve.fi. CNAME
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 33690
;; Flags: qr rd ra ad; QUERY: 1; ANSWER: 1; AUTHORITY: 0; ADDITIONAL: 0

;; QUESTION SECTION:
;; cnametest.bleve.fi. IN CNAME

;; ANSWER SECTION:
cnametest.bleve.fi. 7200 IN CNAME nxdomain.foobar.fi.

;; Received 66 B
;; Time 2023-03-31 12:58:20 CEST
;; From 127.0.0.1@53(UDP) in 0.5 ms

Does it happen only after unbound is fresh started? Are there steps to reproduce on the running instance?

pemensik · March 31, 2023, 11:35am

I have tried to reproduce it on my own unbound-1.17.1-1.fc36.x86_64, but it does not behave as you described after flushing the cache, at least not for me. I guess something else might be required, but I am not sure what. Is there anything in the unbound logs that would hint at why it sent an A query instead? Can you try increasing verbosity with unbound-control verbosity <newvalue> and query the name afterwards?

$ unbound-control flush_zone . && dig _acme-challenge.bender-doh.applied-privacy.net CNAME
ok removed 310 rrsets, 218 messages and 16 key entries

; <<>> DiG 9.18.13 <<>> _acme-challenge.bender-doh.applied-privacy.net CNAME
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23092
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;_acme-challenge.bender-doh.applied-privacy.net. IN CNAME

;; ANSWER SECTION:
_acme-challenge.bender-doh.applied-privacy.net. 86400 IN CNAME bender-doh.acme-dns-challenge.applied-privacy.net.

;; Query time: 177 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Mar 31 13:06:33 CEST 2023
;; MSG SIZE rcvd: 119

> Hi Petr,
>
> thanks for your reply and your questions.
>
> Petr Menšík via Unbound-users:
>
>> Correct me if I understand it not correctly. whether you query CNAME
>> or A record should not make a difference in NXDOMAIN status. But in
>> any case the answer is not there. How does it change ACME process
>> when there is NXDOMAIN and not just no-answer NOERROR response?
>
> That CNAME DNS query is used by lego - an ACME client - to find
> the DNS record it has to update (the ACME DNS TXT challenge).
> Lego's CNAME support used to be experimental and is now enabled by default.
>
> The NXDOMAIN answer results in lego concluding "there is no CNAME".
> The impact of that unexpected NXDOMAIN answer is that lego will attempt
> to use the provided DNS API key to create a TXT record it has no
> permissions for. It only has permissions for the target of the existing
> CNAME.
> For this reason the NOERROR and its answer is important, even if the
> final record in that CNAME chain does not exist. It is lego's job to
> create it.

Okay, I would have expected a TXT query before the update, but okay. The problem is that I see different behaviour on the same version as you have. So the primary reason you are querying this name is to prepare an UPDATE query. But you want to update only the CNAME target if there is one, not the original name itself.

Anyway, the answer contains CNAME in ANSWER section, even if the status is NXDOMAIN. So even with this unusual reply the software should be able to decipher where the queried name leads to. The only part wrong in the answer is NXDOMAIN status. I admit it usually arrives with empty answer. Not sure the answer with CNAME present is against RFCs.
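
A minimal sketch of reading such a reply without relying on the status alone: it parses dig-style text, as posted earlier in this thread, and returns both the status and any CNAME records in the ANSWER section. `parse_dig` is a hypothetical helper written for this illustration, and it assumes the layout of the dig/kdig output shown in this thread.

```python
import re
from typing import List, Tuple

SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 20502
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; QUESTION SECTION:
;_acme-challenge.bender-doh.applied-privacy.net. IN CNAME

;; ANSWER SECTION:
_acme-challenge.bender-doh.applied-privacy.net. 86400 IN CNAME bender-doh.acme-dns-challenge.applied-privacy.net.

;; AUTHORITY SECTION:
acme-dns-challenge.applied-privacy.net. 300 IN SOA get.desec.io. get.desec.io. 2023035286 86400 3600 2419200 3600
"""


def parse_dig(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (status, [(owner, target), ...]) for CNAMEs in the ANSWER section."""
    status = ""
    cnames: List[Tuple[str, str]] = []
    in_answer = False
    for raw in text.splitlines():
        line = raw.strip()
        match = re.search(r"status:\s*([A-Z]+)", line)
        if line.startswith(";; ->>HEADER<<-") and match:
            status = match.group(1)
        elif line.startswith(";; ANSWER SECTION:"):
            in_answer = True
        elif line.startswith(";;") or not line:
            in_answer = False  # any other section header or blank line ends it
        elif in_answer:
            fields = line.split()  # owner, TTL, class, type, rdata
            if len(fields) >= 5 and fields[3] == "CNAME":
                cnames.append((fields[0], fields[4]))
    return status, cnames


status, cnames = parse_dig(SAMPLE)
print(status, cnames)
```

For the first reply in this thread, the status is NXDOMAIN and the list still contains the CNAME and its target.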

>> _acme-challenge.bender-doh.applied-privacy.net exists with cname. Its
>> cname target returns NXDOMAIN. So yes, it is a bit confusing what is
>> the final result. What exactly is the stub in this case? libresolv
>> library?
>
> It is running lego on a FreeBSD server.
>
> I hope the text also helps with answering your other questions below, if
> it is not clear please let me know and I will try to rephrase.
>
> If unbound is just trying to be useful then it should still be consistent and provide the same answer if you ask it twice - which is not the case currently.

I am running it on Fedora 36. I doubt it should have different results.

Yes, I agree it should return the same answer. It could differ only in minor differences like used opt parameters. But not different status. If you ask the first or the third time, result should differ only in TTL or something insignificant.

>> Note: it would be much easier if you could share just pcap containing
>> the problem instead of only text description.
>
> I actually was hoping to achieve the opposite, because looking at the
> text does not require people to have a pcap parser and open a file from a mailing list but you got the gist of it anyway.
>
> thanks,
> Christoph

Okay, it might be just my preference. The thing is those packet descriptions are not compact, it makes the message quite long. dig-like output would be better, but that is more difficult to get from pcap file.

Tuomo_Soini · March 31, 2023, 12:54pm

Try the query I just listed; it should work with BIND's dig too.
If you query the bleve.fi authoritative DNS servers, you get the correct answer.

cname query only fails if cname target gives NXDOMAIN.

For example following query works correctly because destination of the
cname exists.

kdig _443._tcp.bleve.fi. cname

This is obviously a bug: a very special case which a resolver needs to
handle differently from normal CNAME resolution. The Cloudflare,
Quad9, and Google resolvers seem to have the same problem. It seems to be
a special case not handled by most DNS resolvers.

dnsmasq and bind seem to be able to handle that query correctly.

pemensik · March 31, 2023, 1:57pm

>> I am using dnssec-trigger-0.17-7.fc36.x86_64 and
>> unbound-1.17.1-1.fc36.x86_64 on Fedora 36. But I cannot reproduce the
>> behaviour, even if I flush cache by "unbound-control flush_zone ." It
>> is returning consistently CNAME with NOERROR. Does it happen only
>> when the unbound does not have forwarders and is iterating itself? I
>> keep getting CNAME with NOERROR.
>>
>> $ kdig cnametest.bleve.fi. CNAME
>
> Try the query I just listed, should work with bind dig too.
> If you query bleve.fi authoritative dns servers to get correct answer.
>
> cname query only fails if cname target gives NXDOMAIN.

I have tried on my unbound and it never returns NXDOMAIN to me. The result is the same with kdig or dig, that makes no difference. I get NOERROR, not NXDOMAIN.

$ kdig cnametest.bleve.fi. CNAME | head -2
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 35718
;; Flags: qr rd ra ad; QUERY: 1; ANSWER: 1; AUTHORITY: 0; ADDITIONAL: 0

> For example following query works correctly because destination of the
> cname exists.
>
> kdig _443._tcp.bleve.fi. cname
>
> This is obviously a bug, very special case which resolver need to
> handle different way than normal cname resolution. Also cloudflare,
> quad9, and google resolvers seem to have same problem. Seem to be
> special case not handled by most dns resolver.
>
> dnsmasq and bind seem to be able to handle that query correctly.

dnsmasq does not handle CNAMEs at all. It requires an upstream recursive server to do the job and just passes the result to the client. bind, however, can do a proper iteration job from root hints.

If it is a bug, I would suggest creating issue at https://github.com/NLnetLabs/unbound/

But maybe more precise steps should be described when it returns NXDOMAIN. Just flushing the cache and doing your query does not seem to be enough for me.

Tuomo_Soini · March 31, 2023, 2:09pm

>> cname query only fails if cname target gives NXDOMAIN.
>
> I have tried on my unbound and it never returns NXDOMAIN to me. The
> result is the same with kdig or dig, that makes no difference. I get
> NOERROR, not NXDOMAIN.

All unbound instances here run without forwarders set up; is that the difference?

pemensik · March 31, 2023, 2:45pm

>> I have tried on my unbound and it never returns NXDOMAIN to me. The
>> result is the same with kdig or dig, that makes no difference. I get
>> NOERROR, not NXDOMAIN.
>
> All unbounds here without forwarders set up, is that the difference?

I have tried it inside a Rawhide container.

# unbound-control forward
off (using root hints)

# dig @localhost cnametest.bleve.fi. CNAME

; <<>> DiG 9.18.13 <<>> @localhost cnametest.bleve.fi. CNAME
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55072
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;cnametest.bleve.fi. IN CNAME

;; ANSWER SECTION:
cnametest.bleve.fi. 7118 IN CNAME nxdomain.foobar.fi.

;; Query time: 0 msec
;; SERVER: ::1#53(localhost) (UDP)
;; WHEN: Fri Mar 31 16:20:26 CEST 2023
;; MSG SIZE rcvd: 77

Just after a fresh restart it is NOERROR, as it is later. Indeed, the query unbound sends for cnametest.bleve.fi is an A query. But the response delivered to dig is the correct one. Tested with unbound-1.17.1-2.fc38.x86_64.

Frame 641: 89 bytes on wire (712 bits), 89 bytes captured (712 bits) on interface virbr0, id 0
Ethernet II, Src: 7e:85:92:43:88:71 (7e:85:92:43:88:71), Dst: RealtekU_02:bd:85 (52:54:00:02:bd:85)
Internet Protocol Version 4, Src: 192.168.122.184, Dst: 87.239.120.11
User Datagram Protocol, Src Port: 46986, Dst Port: 53
Domain Name System (query)
Transaction ID: 0x4302
Flags: 0x0010 Standard query
Questions: 1
Answer RRs: 0
Authority RRs: 0
Additional RRs: 1
Queries
cnametest.bleve.fi: type A, class IN
Additional records
[Response In: 719]

The server responds to it with the nameservers of bleve.fi. To those servers, however, unbound already sends a CNAME query, not an A query. Attaching my pcap.
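
On the wire, the difference between the two outgoing queries is one 16-bit field in the question section (A is type 1, CNAME is type 5). A minimal sketch using only the standard library: it builds an uncompressed query and reads the qtype back. `build_query` and `question_qtype` are hypothetical helper names, the transaction ID and flags values are copied from the frame above, and the message omits the additional record present in the capture.

```python
import struct

QTYPES = {1: "A", 5: "CNAME", 16: "TXT"}


def build_query(qname: str, qtype: int, txid: int = 0x4302, flags: int = 0x0010) -> bytes:
    """Build a DNS query with one question and no additional records."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    # QNAME is terminated by a zero-length label, then QTYPE and QCLASS (IN = 1).
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def question_qtype(message: bytes) -> str:
    """Return the type name of the first question, assuming an uncompressed QNAME."""
    pos = 12  # the fixed DNS header is 12 bytes long
    while message[pos] != 0:  # skip the length-prefixed labels
        pos += 1 + message[pos]
    qtype, _qclass = struct.unpack_from("!HH", message, pos + 1)
    return QTYPES.get(qtype, str(qtype))


print(question_qtype(build_query("cnametest.bleve.fi.", 1)))
print(question_qtype(build_query("cnametest.bleve.fi.", 5)))
```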

When I did dig @localhost ns bleve.fi. before cnametest, it returned SERVFAIL the first time. Only then it responded with NOERROR. So no, I do not know how to get NXDOMAIN response from unbound. I get similar results for the original query.

(attachments)

cnametest-bleve.fi-filtered.pcapng (6.61 KB)

bmcdonaldjr · March 31, 2023, 4:22pm

My understanding is this:

If a dig command is directed to a resolver with type=CNAME specified and the resolver responds with anything other than the asked-for CNAME information, this may indeed be a bug. I’m not sure of the results if the CNAME target exists in cache. Another way to see similar results would be to submit a type=ANY query with the +norecurse switch.

I’d be interested to see the results with the +norecurse switch on.

Bob

Tuomo_Soini · March 31, 2023, 6:06pm

Hmh. Now I have more info. This is some kind of issue with the unbound
cache. If I run unbound-control reload, unbound fails the query from time
to time. And even when it has failed, the next query will succeed.

Christoph · April 1, 2023, 7:51am

> Hmh. Now I have more info. This is some kind of issue with unbound
> cache. if I run unbound-control reload, that causes unbound time to
> time fail query. And even when failed, next query will succeed.

Thanks for confirming what I'm observing.

I've filed a github issue now:

github.com/NLnetLabs/unbound

NXDOMAIN instead of NOERROR rcode when asked for existing CNAME record

opened 07:48AM - 01 Apr 23 UTC

closed 08:06AM - 04 Apr 23 UTC

appliedprivacy

**Describe the bug**

Unbound answers to a CNAME query with NXDOMAIN instead of NOERROR but includes the actual existing record as well.
Actual expected rcode: NOERROR

Also: When asked for a CNAME, unbound asks the authoritative NS for an A record.
Actual expected qtype: CNAME

**To reproduce**

Steps to reproduce the behavior:

1. start unbound so it has an empty cache when the query reaches unbound (config is provided at the end of this bugreport)
2. ask unbound for this existing CNAME DNS record `dig _acme-challenge.bender-doh.applied-privacy.net CNAME` -> NXDOMAIN
3. ask unbound again without flushing the cache first, you will get a NOERROR rcode

Others on the mailing list have confirmed seeing the same issue.

While looking into the PCAP files from stub -> unbound and unbound -> authoritative, I also noticed that the CNAME query send to unbound results in unbound asking the authoritative for an A record - which does not existing. This mismatch in inbound and outbound qtype might be related to the root cause of the bug.

**Expected behavior**

unbound should ask the authoritative nameserver for a CNAME record not an A record.
unbound should answer with an NOERROR rcode for existing CNAMEs - like other resolvers do (for example PowerDNS Recursor).

**System:**

- Unbound version:
```
pkg info unbound
unbound-1.17.1_2
Name : unbound
Version : 1.17.1_2
Installed on : Sat Feb 18 22:20:01 2023 CET
Origin : dns/unbound
Architecture : FreeBSD:13:amd64
```
- OS: FreeBSD 13.1
- `unbound -V` output:
```
Version 1.17.1
Configure line: --with-libexpat=/usr/local --with-ssl=/usr --enable-dnscrypt --disable-dnstap --with-libnghttp2 --with-dynlibmodule --enable-ecdsa --disable-event-api --enable-gost --with-libevent --disable-subnet --disable-tfo-client --disable-tfo-server --with-pthreads --prefix=/usr/local --localstatedir=/var --mandir=/usr/local/man --infodir=/usr/local/share/info/ --build=amd64-portbld-freebsd13.1
Linked libs: libevent 2.1.12-stable (it uses kqueue), OpenSSL 1.1.1o-freebsd 3 May 2022
Linked modules: dns64 dynlib respip validator iterator
DNSCrypt feature available
```

**Additional information**

Mailing list discussions:

* unbound-users: https://lists.nlnetlabs.nl/pipermail/unbound-users/2023-March/008049.html
* powerdns-users: https://mailman.powerdns.com/pipermail/pdns-users/2023-March/028156.html
* lego github issue: https://github.com/go-acme/lego/issues/1739

**unbound.conf**

```
server:
    verbosity: 0
    access-control: 109.70.100.0/24 allow
    access-control: ::1/128 allow
    access-control: 127.0.0.1/24 allow
    edns-tcp-keepalive: yes
    incoming-num-tcp: 200
    # plain UDP
    interface: 127.0.0.1@53
    interface: ::1@53
    interface: 109.70.100.133@53
    num-threads: 2
    msg-cache-size: 100m
    rrset-cache-size: 200m
    key-cache-size: 10m
    neg-cache-size: 10m
    harden-below-nxdomain: yes
    minimal-responses: yes
    prefetch: yes
    prefetch-key: yes
    aggressive-nsec: yes
    use-caps-for-id: yes
    hide-identity: yes
    hide-version: yes
    hide-trustanchor: yes
    qname-minimisation: yes
    # The following line will configure unbound to perform cryptographic
    # DNSSEC validation using the root trust anchor.
    auto-trust-anchor-file: "/usr/local/etc/unbound/root.key"
    extended-statistics: yes
    statistics-cumulative: no
    statistics-interval: 0

remote-control:
    control-enable: yes

# root on loopback
auth-zone:
    name: "."
    master: "k.root-servers.net"
    fallback-enabled: yes
    for-downstream: no
    for-upstream: yes
    zonefile: "root.zone"
```

best regards,
Christoph