Unbound 1.4.8 returns sporadic SERVFAIL

Hello,

I wonder if you could help me, please. Unbound 1.4.8, compiled from
source with built-in ldns, is returning SERVFAIL for queries that I
believe should succeed :-) FWIW, I cannot reproduce this when
validating with BIND; I've tried both 9.7.2 and 9.8.0rc1.

The following queries, and their reply codes: (the order of queries
appears to be irrelevant)

        dig @127.0.0.1 +dnssec test.jpmens.org -> ANSWER
        dig @127.0.0.1 +dnssec test.jpmens.org ANY -> ANSWER

        dig @127.0.0.1 +dnssec test.jpmens.org SSHFP -> SERVFAIL

wait approx. 10 seconds:

        dig @127.0.0.1 +dnssec test.jpmens.org SSHFP -> ANSWER
        dig @127.0.0.1 +dnssec test.jpmens.org A -> SERVFAIL
        dig @127.0.0.1 +dnssec test.jpmens.org SOA -> SERVFAIL

At the time of the SERVFAIL, I see the following output:

        debug: out of query targets -- returning SERVFAIL
        debug: store error response in message cache
        debug: return error response SERVFAIL
        debug: mesh_run: iterator module exit state is module_finished
        debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
        info: validator operate: query <test.jpmens.org. SOA IN>
        debug: validator: nextmodule returned
        debug: cannot validate non-answer, rcode SERVFAIL
        debug: mesh_run: validator module exit state is module_finished
        debug: query took 1.222387 sec
        info: mesh_run: end 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 3 recursion replies sent, 0 replies dropped, 0 states jostled out
        info: average recursion processing time 1.181398 sec
        debug: cache memory msg=310695 rrset=338856 infra=26968 val=309379
        debug: svcd callbacks end
        debug: close of port 43479
        debug: close fd 7

I can reproduce this behavior on Fedora 14 with their packaged Unbound,
also 1.4.8.

Is there something wrong with the zone?

My configuration is

        server:
          verbosity: 1
          access-control: 0.0.0.0/0 allow
          use-syslog: no
          harden-glue: yes
          harden-referral-path: no
          auto-trust-anchor-file: "root.key"
          dlv-anchor-file: "dlv.isc.org.key"
          trust-anchor-file: "uno.aa"
        python:
        remote-control:

I've had to disable `harden-referral-path' because the NS RRset for
jpmens.org isn't yet signed.

Thank you & regards,

        -JP

The following queries, and their reply codes: (the order of queries
appears to be irrelevant)

       dig @127.0.0.1 +dnssec test.jpmens.org -> ANSWER
       dig @127.0.0.1 +dnssec test.jpmens.org ANY -> ANSWER

       dig @127.0.0.1 +dnssec test.jpmens.org SSHFP -> SERVFAIL

       dig @127.0.0.1 +dnssec test.jpmens.org SSHFP -> ANSWER

That worked for me on the first attempt.

;; ANSWER SECTION:
test.jpmens.org. 120 IN SSHFP 2 1 C74B4801FD01A68834FF45BACFA114FC3B0C47AA
test.jpmens.org. 120 IN RRSIG SSHFP 8 3 120 20110303000000 20110217000000 50853 jpmens.org. TBq2RoNNMkRv5bnesvjUIsIVVi/Yv0WAiB5527r2v8G5kGpJcUks/Y54 S3ZMc+Ys35EKE+5aQQ7wplioA3Mv59XZu0jeYecQI+Z4sWT4CJyIag9j vs97WjGfBshG8GvUqMjRpPwfa0ITGvHcCnVwpDudH2G2hsJz6cOecqqZ kbw=

       dig @127.0.0.1 +dnssec test.jpmens.org A -> SERVFAIL
       dig @127.0.0.1 +dnssec test.jpmens.org SOA -> SERVFAIL

Those don't exist? And neither do any NS records?

I've had to disable `harden-referral-path' because the NS RRset for
jpmens.org isn't yet signed.

That should not matter. Hardening just queries multiple name servers for
the same data to make spoofing harder. It does not mandate DNSSEC.

I think your problem is with your zone?

Paul

Hello Paul,

       dig @127.0.0.1 +dnssec test.jpmens.org A -> SERVFAIL
       dig @127.0.0.1 +dnssec test.jpmens.org SOA -> SERVFAIL

Those don't exist? And neither do any NS records?

The A exists, and BIND returns it. The SOA does not exist, and BIND
returns a NOERROR.

I've had to disable `harden-referral-path' because the NS RRset for
jpmens.org isn't yet signed.

That should not matter. Hardening just queries multiple name servers for
the same data to make spoofing harder. It does not mandate DNSSEC.

Thanks for the clarification.

I think your problem is with your zone?

I don't think there is a problem with the zone, particularly because
BIND replies correctly to these queries. If I restart Unbound, it also
starts off replying correctly. I've just restarted it and given it
these queries:

        dig @127.0.0.1 +dnssec test.jpmens.org a -> NOERROR
        dig @127.0.0.1 +dnssec test.jpmens.org sshfp -> NOERROR
        dig @127.0.0.1 +dnssec test.jpmens.org any -> SERVFAIL !

This is weird. Could it have something to do with the rather low TTL,
which is set to 120 on both the A and SSHFP records?

        -JP

Yes.
I'm surprised to see three RRSIGs for one RRset:

$ dig @a.six53.net. jpmens.org. ns +dnssec +short
a.six53.net.
b.six53.net.
c.six53.net.
d.six53.net.
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. APF6ZYf+cVySBHVBw+cA0rME4ZlG5r33bBZgtgcl/kEjDZCPqOYDIQj8 b/Zi1lFqL2X2qwI3DKL0VrN2XjDJeESMBdbcaYGygqPxH59cFDS9AX4b mHpJsjC5A5Nl6BA3xpe/Iw30UN7T0ohbEZlgfHTtm/VaMCDZvXyEFzwF JSo=
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. BaFpHw3hi4v64JDpUmm2/TVFUCz0jHHeBOtEc0JJQuo4uYJtOVp9W97e KEVFzhnW1Y93utKXK9qkfZsBmPusHvuYLpQg+4065mOEoyEuaZ95247/ KJArGuHDNwHu/Xc35qvbzcTrcwof6T9yey6SuS0BNh1vMdlcGGATuphW RLo=
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. OUShqrUPiUsTVq4A/jkIaCzyXE+8EfSubpggZsQYJD8ih6Yag9W3PlGV esNLi7XrQWxDbBghL/voFCDE0C2iHgt4K8Y0LXTpfr9lZ9n+soME+KsP w3n0TwgRw4GbE0XxgaVrUF7FZauh3FSebgp782QP6cpLjnAFWkJ1cze/ /ss=

Might this be part of the problem?
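
For anyone who wants to check for this mechanically, here is a minimal
sketch that counts RRSIG lines in `dig +short` output and reports any
(type covered, algorithm, key tag, signer) combination that appears more
than once. The function names and the SAMPLE text are illustrative
assumptions; the signatures in SAMPLE are shortened to their first chunk
for readability. A repeated combination is only a hint that the signer
deserves a look, not proof that the signatures are invalid.

```python
from collections import Counter

# Shortened copy of the output above (signatures cut to their first chunk).
SAMPLE = """\
a.six53.net.
b.six53.net.
c.six53.net.
d.six53.net.
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. APF6ZYf+cVySBHVBw+cA0rME4ZlG5r33bBZgtgcl/kEjDZCPqOYDIQj8
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. BaFpHw3hi4v64JDpUmm2/TVFUCz0jHHeBOtEc0JJQuo4uYJtOVp9W97e
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. OUShqrUPiUsTVq4A/jkIaCzyXE+8EfSubpggZsQYJD8ih6Yag9W3PlGV
"""


def count_rrsigs(short_output):
    """Count RRSIG lines per (type covered, algorithm, key tag, signer)."""
    counts = Counter()
    for line in short_output.splitlines():
        fields = line.split()
        # RRSIG rdata: covered alg labels ttl expiration inception keytag
        # signer signature..., so anything shorter is not an RRSIG line.
        if len(fields) < 9:
            continue
        covered, alg, labels, _ttl, _expire, _incept, keytag, signer = fields[:8]
        if not (alg.isdigit() and labels.isdigit() and keytag.isdigit()):
            continue
        counts[(covered, int(alg), int(keytag), signer)] += 1
    return counts


def repeated(counts):
    """Return only the combinations that occur more than once."""
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for key, n in repeated(count_rrsigs(SAMPLE)).items():
        print(n, "RRSIGs for", key)
```

To try it against a live zone, one could feed it the text produced by
the dig command shown above instead of SAMPLE.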

Hi Jan-Piet, Andreas,

Tested here: the ANY query triggers a validation attempt of the NS
record. The NS record is bogus. When it finds out the NS record is
bogus, Unbound refuses to talk to those nameservers. It is therefore
unable to fetch further data (the SSHFP request) for the zone.

Similar behaviour applies to the nameserver glue (A, AAAA): if those
records are bogus, Unbound refuses to talk to those nameservers.
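
The sequence described above can be illustrated with a toy model. This
is only a sketch of the behaviour as described in this message, not
Unbound's actual code: the class and attribute names are invented, and
caching and TTLs are not modelled at all. The rcode of the triggering
query itself varied in this thread; the sketch returns SERVFAIL for it,
as in the run after the restart.

```python
class ToyValidatingResolver:
    """Toy model: a bogus NS RRset makes the zone's servers unusable."""

    def __init__(self, ns_rrset_valid):
        # Whether the zone's NS RRset would pass signature validation.
        self.ns_rrset_valid = ns_rrset_valid
        # Set once the NS RRset has been validated and found bogus.
        self.servers_rejected = False

    def query(self, qtype):
        if self.servers_rejected:
            # No usable nameservers are left for this zone.
            return "SERVFAIL"
        if qtype in ("ANY", "NS"):
            # These queries bring in the NS RRset with its RRSIGs,
            # which triggers a validation attempt.
            if not self.ns_rrset_valid:
                self.servers_rejected = True
                return "SERVFAIL"
        return "NOERROR"


if __name__ == "__main__":
    resolver = ToyValidatingResolver(ns_rrset_valid=False)
    for qtype in ("A", "SSHFP", "ANY", "SSHFP"):
        print(qtype, resolver.query(qtype))
```

In this model the A and SSHFP queries succeed until something asks for
the NS RRset; after that every query for the zone fails, which is the
pattern seen after the restart.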

This is weird.

Yes.
I'm surprised to see three RRSIGs for one RRset:

$ dig @a.six53.net. jpmens.org. ns +dnssec +short
a.six53.net.
b.six53.net.
c.six53.net.
d.six53.net.
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. APF6ZYf+cVySBHVBw+cA0rME4ZlG5r33bBZgtgcl/kEjDZCPqOYDIQj8 b/Zi1lFqL2X2qwI3DKL0VrN2XjDJeESMBdbcaYGygqPxH59cFDS9AX4b mHpJsjC5A5Nl6BA3xpe/Iw30UN7T0ohbEZlgfHTtm/VaMCDZvXyEFzwF JSo=
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. BaFpHw3hi4v64JDpUmm2/TVFUCz0jHHeBOtEc0JJQuo4uYJtOVp9W97e KEVFzhnW1Y93utKXK9qkfZsBmPusHvuYLpQg+4065mOEoyEuaZ95247/ KJArGuHDNwHu/Xc35qvbzcTrcwof6T9yey6SuS0BNh1vMdlcGGATuphW RLo=
NS 8 2 86400 20110303000000 20110217000000 50853 jpmens.org. OUShqrUPiUsTVq4A/jkIaCzyXE+8EfSubpggZsQYJD8ih6Yag9W3PlGV esNLi7XrQWxDbBghL/voFCDE0C2iHgt4K8Y0LXTpfr9lZ9n+soME+KsP w3n0TwgRw4GbE0XxgaVrUF7FZauh3FSebgp782QP6cpLjnAFWkJ1cze/ /ss=

Might this be part of the problem?

It seems so:
Feb 21 12:47:32 unbound[22628:0] info: verify rrset <jpmens.org. NS IN>
Feb 21 12:47:32 unbound[22628:0] debug: verify sig 50853 8
Feb 21 12:47:32 unbound[22628:0] debug: verify: signature mismatch
Feb 21 12:47:32 unbound[22628:0] debug: verify sig 50853 8
Feb 21 12:47:32 unbound[22628:0] debug: verify: signature mismatch
Feb 21 12:47:32 unbound[22628:0] debug: verify sig 50853 8
Feb 21 12:47:32 unbound[22628:0] debug: verify: signature mismatch
Feb 21 12:47:32 unbound[22628:0] debug: rrset failed to verify: no valid signatures for 1 algorithms

Best regards,
   Wouter
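
When the log is long, a small script can pull out which RRsets had
mismatching signatures. The following is a minimal sketch that relies
only on the three message shapes visible in the excerpt above ("verify
rrset", "verify sig", "verify: signature mismatch"). Reading the two
numbers after "verify sig" as key tag and algorithm is an inference from
the RRSIGs shown earlier, and the function name is made up. Successful
verifications are not handled, because their log text does not appear in
this thread.

```python
import re

RRSET_RE = re.compile(r"verify rrset <(\S+) (\S+) (\S+)>")
SIG_RE = re.compile(r"verify sig (\d+) (\d+)")

SAMPLE_LOG = """\
Feb 21 12:47:32 unbound[22628:0] info: verify rrset <jpmens.org. NS IN>
Feb 21 12:47:32 unbound[22628:0] debug: verify sig 50853 8
Feb 21 12:47:32 unbound[22628:0] debug: verify: signature mismatch
Feb 21 12:47:32 unbound[22628:0] debug: verify sig 50853 8
Feb 21 12:47:32 unbound[22628:0] debug: verify: signature mismatch
Feb 21 12:47:32 unbound[22628:0] debug: verify sig 50853 8
Feb 21 12:47:32 unbound[22628:0] debug: verify: signature mismatch
Feb 21 12:47:32 unbound[22628:0] debug: rrset failed to verify: no valid signatures for 1 algorithms
"""


def signature_mismatches(log_text):
    """Map each RRset to the (key tag, algorithm) pairs that mismatched."""
    results = {}
    current = None   # RRset currently being verified
    pending = None   # signature whose outcome has not been seen yet
    for line in log_text.splitlines():
        m = RRSET_RE.search(line)
        if m:
            current = " ".join(m.groups())
            pending = None
            continue
        m = SIG_RE.search(line)
        if m and current is not None:
            pending = (int(m.group(1)), int(m.group(2)))
            continue
        if pending is not None and "verify: signature mismatch" in line:
            results.setdefault(current, []).append(pending)
            pending = None
    return results


if __name__ == "__main__":
    for rrset, sigs in signature_mismatches(SAMPLE_LOG).items():
        print(rrset, "->", len(sigs), "mismatching signature(s)", sigs)
```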

Hello Andreas,

I'm surprised to see three RRSIGs for one RRset:

Good catch! Something seems to be wrong with the signer; it does
indeed produce an inordinate number of RRSIGs if there is more than one
RR in the set. I'm talking to them as we speak.

Thank you for spotting that!

        -JP

Wouter,

The NS record is bogus. When it finds out the NS record is
bogus, unbound refuses to talk to those nameservers.

Paul Wouters was right: the zone content was bad, and Andreas spotted
the cause: multiple RRSIGs on the NS RRset. My pdns signer erroneously
created them, but that has just been fixed in r2053.

I thought the problem was specific to Unbound, because neither BIND nor
[1], [2], or [3] hinted that anything was wrong. That worries me.

Thank you all,

        -JP

[1] http://dnssec-debugger.verisignlabs.com/
[2] http://dnsviz.net/
[3] http://dnscheck.iis.se/