Bogus resolution with forwarding and DLV

Hello list,

I'm running Fedora 21 with dnssec-trigger and Unbound 1.5.1. Unbound is
configured by dnssec-trigger to forward all queries to a validating resolver
on the local network, provided by DHCP.

With this configuration, Unbound incorrectly treats the fedorapeople.org
domain as bogus. The domain uses DLV, which I guess might be causing the
problem.

% kdig @::1 jvcelak.fedorapeople.org
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 54325
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 0

;; QUESTION SECTION:
;; jvcelak.fedorapeople.org. IN A

;; Received 42 B
;; Time 2015-02-03 16:12:33 CET
;; From ::1@53(UDP) in 0.1 ms
; Warning: failed to query server ::1@53(UDP)

% sudo unbound-control list_forwards
. IN forward x.x.x.x

With +cd, the resolution works. Resolution directly via the upstream resolver
x.x.x.x works as well. The upstream resolver runs BIND 9.9.6-P1.

When I disable the forwarding, the resolution starts to work again:

% sudo unbound-control forward_remove .
ok

% kdig @::1 +short jvcelak.fedorapeople.org
152.19.134.191

Is this a bug in Unbound or is my configuration incorrect?

Best regards!

Jan

Is that an old BIND server with the wildcard validation bug? That was
supposed to be tested for in the latest dnssec-trigger using a special
record provided to us by CentralNic.

https://bugzilla.redhat.com/show_bug.cgi?id=824219

Paul

No, 9.9.6-P1 is the latest release on the long-term-support branch.

Tony.

Hello again.

I did some additional research.

% kdig @::1 jvcelak.fedorapeople.org
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 54325
% sudo unbound-control list_forwards
. IN forward x.x.x.x

With val-log-level 2, I found the following:

info: validation failure <jvcelak.fedorapeople.org. A IN>: no signatures
for <fedorapeople.org. NS IN> from x.x.x.x

I fired up a second Unbound, configured it to perform the resolution
from root, set it up in place of the x.x.x.x, flushed the cache, and the
validation started to work.

After inspecting the responses from BIND and Unbound, I believe this is
caused by BIND adding NS RRs without an RRSIG to the authority section
of the answer.

Unbound:

% kdig +dnssec @127.0.0.2 jvcelak.fedorapeople.org A
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 802
;; Flags: qr rd ra ad; QUERY: 1; ANSWER: 2; AUTHORITY: 2; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 4096 B; ext-rcode: Unused

;; QUESTION SECTION:
;; jvcelak.fedorapeople.org. IN A

;; ANSWER SECTION:
jvcelak.fedorapeople.org. 3585 IN A 152.19.134.191
jvcelak.fedorapeople.org. 3585 IN RRSIG A 5 2 3600 ...

;; AUTHORITY SECTION:
*.fedorapeople.org. 86385 IN NSEC fedorapeople.org. ...
*.fedorapeople.org. 86385 IN RRSIG NSEC 5 2 86400 ...

;; Received 461 B
;; Time 2015-02-04 01:12:51 CET
;; From 127.0.0.2@53(UDP) in 0.1 ms

BIND:

% kdig +dnssec @x.x.x.x jvcelak.fedorapeople.org A
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 59967
;; Flags: qr rd ra; QUERY: 1; ANSWER: 2; AUTHORITY: 6; ADDITIONAL: 7

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 4096 B; ext-rcode: Unused

;; QUESTION SECTION:
;; jvcelak.fedorapeople.org. IN A

;; ANSWER SECTION:
jvcelak.fedorapeople.org. 3600 IN A 152.19.134.191
jvcelak.fedorapeople.org. 3600 IN RRSIG A 5 2 3600 ...

;; AUTHORITY SECTION:
*.fedorapeople.org. 3600 IN NSEC fedorapeople.org. ...
*.fedorapeople.org. 3600 IN RRSIG NSEC 5 2 86400 ...
fedorapeople.org. 33297 IN NS ns02.fedoraproject.org.
...

;; ADDITIONAL SECTION:
ns02.fedoraproject.org. 48697 IN A 152.19.134.139
ns02.fedoraproject.org. 48697 IN AAAA ...
...

;; Received 674 B
;; Time 2015-02-04 01:11:12 CET
;; From x.x.x.x@53(UDP) in 93.0 ms

I don't know why BIND is adding the NS into the answer. But I think this
is really a problem of BIND, as per
http://tools.ietf.org/html/rfc4035#section-3.1.1:

   o When placing a signed RRset in the Authority section, the name
      server MUST also place its RRSIG RRs in the Authority section.
      The RRSIG RRs have a higher priority for inclusion than any other
      RRsets that may have to be included. If space does not permit
      inclusion of these RRSIG RRs, the name server MUST set the TC bit.
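
To make the comparison concrete, here is a minimal Python sketch that lists the
RRsets in a section that come without a covering RRSIG. The tuple layout and
the function name unsigned_rrsets are made up for illustration; the records
mirror the abbreviated authority section of the BIND response above, and only
the one NS record shown there is included.

```python
# Each record is (owner, rrtype, covered_type); covered_type is only used
# for RRSIG records. This layout is an assumption made for the sketch.
BIND_AUTHORITY = [
    ("*.fedorapeople.org.", "NSEC", None),
    ("*.fedorapeople.org.", "RRSIG", "NSEC"),
    ("fedorapeople.org.", "NS", None),
]


def unsigned_rrsets(section):
    """Return sorted (owner, type) pairs with no covering RRSIG in the section."""
    present = set()
    covered = set()
    for owner, rrtype, covered_type in section:
        if rrtype == "RRSIG":
            covered.add((owner.lower(), covered_type))
        else:
            present.add((owner.lower(), rrtype))
    return sorted(present - covered)


print(unsigned_rrsets(BIND_AUTHORITY))
```

For the records above this reports only the fedorapeople.org. NS RRset, which
matches the RRset named in the validation failure logged by Unbound.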

Please, can somebody confirm that my assumptions are right?

Thanks and regards,

Jan

I would expect unbound to just clean/ignore any additional data that comes
without RRSIG. If not, that would be a bug.

Note that my old BIND 9.7, which I have running on an old nameserver, also
returns data without the AD bit set. But I think 9.7 is known to have some
issues with wildcards and CNAMEs.

Paul

I think you are right that it is a bug in BIND. I also think Unbound should
discard the incomplete RRset rather than failing to return a response.

It looks like the bug in BIND is due to a combination of an unsigned NS
RRset that came from a referral, and validation turned off. I can't
reproduce the bug with my validating resolvers with a normal query but it
does occur if I set the CD bit.

Are you going to send this in to bind9-bugs@isc.org or would you like me
to do it?

Tony.

I don't have access to the BIND server, so I don't know exactly how the server
is configured or which patches are applied. I only know what the version.bind
TXT/CH query reports.

The server performs validation, but DLV seems to be disabled. I get SERVFAIL
for incorrectly signed domains, but the AD flag is cleared for
fedorapeople.org.

I have also noticed something else: if I explicitly ask BIND for the NS
records with +dnssec, the server starts including the missing NS RRSIG in its
responses to subsequent queries for jvcelak.fedorapeople.org.

So if the NS RRSIG is in BIND's cache, validation via Unbound works.

I can provide only partial information about the BIND server. So if you have
managed to reproduce the problem, I would appreciate it if you could send the
report. Feel free to CC me.

As for Unbound, I believe that evaluating the resolution as bogus is too
strict.

Thanks for helping me to find the problem, everyone.

Best regards.

Jan

Hello.

The BIND developers claim that the behavior of BIND is correct.

The upstream resolver (BIND) has DLV disabled and therefore uses only
a subset of the trust anchors that my local resolver (Unbound) uses, so
the zone is insecure from BIND's point of view.

Setting aside the fact that BIND adds NS records to the authority section
for no apparent reason, omitting the NS RRSIGs is probably justifiable.

Anyway, it would be great if Unbound could strip non-essential records
from the response before performing the validation.
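
As a rough illustration of this request, the Python sketch below drops
unsigned RRsets of a "non-essential" type from an authority section before
validation would run. It is not Unbound's implementation: the record layout,
the function name strip_unsigned_extras and the choice of NS as the only
non-essential type are all assumptions made for the example.

```python
# Hypothetical policy: unsigned RRsets of these types are treated as
# decoration and may be dropped, because they are not needed to prove
# the answer.
NON_ESSENTIAL = {"NS"}


def strip_unsigned_extras(authority, non_essential=NON_ESSENTIAL):
    """Drop unsigned non-essential RRsets; keep every other record untouched.

    Records are (owner, rrtype, covered_type) tuples; covered_type is only
    used for RRSIG records.
    """
    signed = {
        (owner.lower(), covered)
        for owner, rrtype, covered in authority
        if rrtype == "RRSIG"
    }
    kept = []
    for owner, rrtype, covered in authority:
        unsigned = rrtype != "RRSIG" and (owner.lower(), rrtype) not in signed
        if unsigned and rrtype in non_essential:
            continue  # e.g. the bare NS RRset added by the upstream resolver
        kept.append((owner, rrtype, covered))
    return kept


authority = [
    ("*.fedorapeople.org.", "NSEC", None),
    ("*.fedorapeople.org.", "RRSIG", "NSEC"),
    ("fedorapeople.org.", "NS", None),
]
print(strip_unsigned_extras(authority))
```

With this policy the signed NSEC and its RRSIG are kept and the bare NS RRset
is removed. Unsigned records of any other type, for example an SOA, are left
in place so that a validator would still see them and can flag them.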

Best regards,

Jan.

I think this is another good reason to stop using DLV.

If unbound is updated to drop unsigned authority RRsets, care should
be taken to not drop unsigned SOA RRs. From some nameservers I've
seen replies with signed NSEC/NSEC3 records, and an unsigned SOA.

Unbound correctly designates these as bogus.

This is not just a DLV problem: it can occur for any validator which has
trust anchors for parts of the namespace for which its upstream recursive
server does not.
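
A toy Python model of this situation, under simplifying assumptions: the zone
chain, the set of delegations that have a DS record, and the treatment of a
DLV entry as just another trust anchor are all illustrative choices, not a
description of how BIND or Unbound work internally.

```python
def is_secure(zone_chain, anchors, ds_present):
    """Decide whether the last zone in zone_chain is secure for a validator.

    zone_chain lists the zones from the root down to the target zone.
    A zone counts as secure if the validator has a trust anchor for it, or
    if its parent is secure and the parent holds a DS record for it.
    """
    secure = False
    for zone in zone_chain:
        secure = zone in anchors or (secure and zone in ds_present)
    return secure


chain = [".", "org.", "fedorapeople.org."]
ds_present = {"org."}  # assumption: no DS for fedorapeople.org. in org.

upstream_anchors = {"."}                    # upstream: root anchor only
local_anchors = {".", "fedorapeople.org."}  # local: DLV entry modeled as an anchor

print("upstream:", is_secure(chain, upstream_anchors, ds_present))
print("local:", is_secure(chain, local_anchors, ds_present))
```

In this model the upstream resolver sees the zone as insecure and has no
reason to fetch or return signatures for it, while the local validator sees
it as secure and expects them.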

Bogosity should be per-RRset, not per-answer. (Though in the case of
nonexistent RRsets you may need multiple NSEC/NSEC3 RRsets to prove
nonexistence; in that case bogosity applies to each RRset individually
and to the proof as a whole. If there is other gubbins in the answer, that
does not affect your ability to demonstrate you got a good answer to the
question you asked.)
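
The idea can be sketched in Python as follows. The status strings, the
function name answer_status and the way the "needed" RRsets are chosen are
assumptions made for this example, and the zone is assumed to sit under a
trust anchor, so a needed RRset without a valid signature counts as bogus.

```python
def answer_status(rrset_status, needed):
    """Return 'secure' only if every RRset needed for the answer is secure.

    rrset_status maps (owner, type) to 'secure', 'bogus' or 'unsigned'.
    needed lists the RRsets that answer the question or form the proof.
    RRsets outside needed are ignored entirely.
    """
    if not needed:
        return "bogus"  # nothing to base a verdict on
    ok = all(rrset_status.get(key) == "secure" for key in needed)
    return "secure" if ok else "bogus"


status = {
    ("jvcelak.fedorapeople.org.", "A"): "secure",
    ("*.fedorapeople.org.", "NSEC"): "secure",
    ("fedorapeople.org.", "NS"): "unsigned",  # extra data, not needed
}
needed = [
    ("jvcelak.fedorapeople.org.", "A"),
    ("*.fedorapeople.org.", "NSEC"),
]
print(answer_status(status, needed))
```

Here the unsigned NS RRset has no effect on the verdict because it is not
part of what proves the answer. If an unsigned RRset were listed in needed,
the result would be bogus.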

Tony.

Hi,

Unbound is now fixed to be more lenient. It is good to avoid marking
messages as DNSSEC bogus when that can be avoided.

DLV is not the motivation for this fix; 'trust anchor differences
and validation' is.

Best regards,
   Wouter

Thank you for the fix, Wouter. :slight_smile:

Jan