Signed .de zone - temporary validation errors

Hi everyone,

I have a really weird occasional DNSSEC validation error with the DENIC DNSSEC testbed.

The affected machine is my private server, running Debian testing, Unbound 1.4.3-1, libldns1 1.6.4-4, amd64 platform. The behaviour was the same on Unbound 1.4.0 with ldns 1.6.0; I haven't tested earlier versions. Configuration:

server:
  verbosity: 1
  extended-statistics: yes
  interface-automatic: yes
  dlv-anchor-file: "dlv.isc.org.key"
  trust-anchor-file: "trust-anchor.key"
  val-log-level: 1
remote-control:
  control-enable: yes
stub-zone:
  name: "de"
  stub-addr: 81.91.161.228 # auth-fra.dnssec.denic.de
  stub-addr: 2A02:568:0:1::53
  stub-addr: 87.233.175.25 # auth-ams.dnssec.denic.de
  stub-prime: no

trust-anchor.key is the one from
https://www.secure.denic.de/fileadmin/Domains/DNSSEC/de-trust-anchor.txt .

It occasionally happens, after about one to two weeks of uptime, that I cannot query any .de domain anymore. All of a sudden the log is full of validation errors:

Mar 30 16:29:40 svr01 unbound: [1315:0] info: validation failure <ecm1._domainkey.newsletter.postbank.de. TXT IN>
Mar 30 16:29:43 svr01 unbound: [1315:0] info: validation failure <postbank.de. NS IN>
Mar 30 16:29:43 svr01 unbound: [1315:0] info: validation failure <bounce.newsletter.postbank.de. MX IN>
Mar 30 16:29:43 svr01 unbound: [1315:0] info: validation failure <bounce.newsletter.postbank.de. A IN>

(for all domains in .de). Usually I just restart unbound and the problem goes away. This time I wanted to collect additional information and did not restart the daemon, but the problem went away on its own.

Mar 30 21:20:44 svr01 unbound: [1315:0] info: validation failure <svr02.teleport-iabg.de. A IN>
Mar 30 21:20:44 svr01 unbound: [1315:0] info: validation failure <svr02.teleport-iabg.de. AAAA IN>

and nothing more. Occasionally I also have messages like

Mar 30 21:06:10 svr01 unbound: [1315:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure <de. DNSKEY IN>
Mar 30 21:06:10 svr01 last message repeated 2 times
Mar 30 21:06:10 svr01 unbound: [1315:0] info: failed to prime trust anchor -- could not fetch DNSKEY rrset <de. DNSKEY IN>
Mar 30 21:06:10 svr01 last message repeated 2 times
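
To see at a glance whether the failures are confined to one zone and which kind of message dominates, log excerpts like the ones above can be tallied with a few lines of Python. This is only a sketch: the regular expression is an assumption derived from the sample lines in this message, not from Unbound's full log format, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines only (assumption); it covers
# "validation failure" and "failed to prime trust anchor" messages.
LINE_RE = re.compile(
    r"info: (?P<kind>validation failure|failed to prime trust anchor)"
    r".*<(?P<name>\S+) (?P<rrtype>\S+) IN>"
)

def summarize(lines):
    """Count matching log events per (message kind, top-level zone)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # e.g. syslog's "last message repeated 2 times"
        labels = m.group("name").rstrip(".").split(".")
        counts[(m.group("kind"), labels[-1])] += 1
    return counts

sample = [
    "Mar 30 16:29:43 svr01 unbound: [1315:0] info: validation failure <postbank.de. NS IN>",
    "Mar 30 21:06:10 svr01 unbound: [1315:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure <de. DNSKEY IN>",
    "Mar 30 21:06:10 svr01 last message repeated 2 times",
]
for key, n in summarize(sample).items():
    print(key, n)
```

Note that syslog's "last message repeated" lines are skipped, so the counts are a lower bound.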

The process has been running untouched since March 21st.

I raised this on the DENIC ml. Peter Koch told me that he sees queries from my IP address without the OPT-RR (so no EDNS and no DO) during that timeframe. Which would of course mean that Unbound would not get any DNSSEC records, so complaining is a good plan indeed.

Has anyone seen this behaviour before? Is there any particular debug command you want me to run the next time this happens? I am running multiple unbound installations, all of them with DLV, some of them with IANA ITAR, but this is the only one running the signed .de zone.

Best Regards,
Bernhard

Did you check the ntp/clock settings on the machines involved?

You might need to add a lot of verbosity to get more logs out of unbound. Or
if you still have that instance running, use unbound-control to dump the cache
to a file and we might be able to get more information out of it.

Paul

It occasionally happens after about one to two weeks of uptime that I
cannot query any .de domain anymore. All of a sudden the log is full
of validation errors

Mar 30 21:06:10 svr01 unbound: [1315:0] info: failed to prime trust
anchor -- DNSKEY rrset is not secure <de. DNSKEY IN>
Mar 30 21:06:10 svr01 last message repeated 2 times
Mar 30 21:06:10 svr01 unbound: [1315:0] info: failed to prime trust
anchor -- could not fetch DNSKEY rrset <de. DNSKEY IN>
Mar 30 21:06:10 svr01 last message repeated 2 times

The process has been running untouched since March 21st.

I raised this on the DENIC ml. Peter Koch told me that he sees queries
from my IP address without the OPT-RR (so no EDNS and no DO) during
that timeframe. Which would of course mean that Unbound would not get
any DNSSEC records, so complaining is a good plan indeed.

Did you check the ntp/clock settings on the machines involved?

Well, ntpd is running and shows no errors, and the timestamps in the logfile (see above) are continuous, without any big jumps (more than 5 minutes).
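
For context on why the clock matters: a validator only accepts an RRSIG while the local time lies between the signature's inception and expiration timestamps, so a jumping clock can produce validation failures that disappear again. The following is a simplified sketch of that check. The timestamps are hypothetical (not taken from the .de testbed), the function names are made up for illustration, and the serial-number arithmetic that real validators apply to these 32-bit fields is ignored.

```python
from datetime import datetime, timedelta, timezone

RRSIG_TIME = "%Y%m%d%H%M%S"  # presentation format of RRSIG timestamps

def parse_rrsig_time(text):
    """Parse a YYYYMMDDHHmmSS timestamp as UTC."""
    return datetime.strptime(text, RRSIG_TIME).replace(tzinfo=timezone.utc)

def signature_is_current(inception, expiration, now, skew=timedelta(0)):
    """True if `now` lies in [inception - skew, expiration + skew]."""
    start = parse_rrsig_time(inception) - skew
    end = parse_rrsig_time(expiration) + skew
    return start <= now <= end

# Hypothetical signature lifetime, chosen only for illustration.
inception, expiration = "20100325000000", "20100408000000"
good_clock = datetime(2010, 3, 30, 21, 6, 10, tzinfo=timezone.utc)
bad_clock = good_clock + timedelta(days=30)  # clock that jumped forward

print(signature_is_current(inception, expiration, good_clock))
print(signature_is_current(inception, expiration, bad_clock))
```

With a steady ntpd and continuous log timestamps, as described above, this cause looks unlikely here.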

You might need to add a lot of verbosity to get more logs out of
unbound. Or if you still have that instance running, use
unbound-control to dump the cache to a file and we might be able to
get more information out of it.

Will do it the next time it happens.

Bernhard

stub-zone:
  name: "de"
  stub-addr: 81.91.161.228 # auth-fra.dnssec.denic.de
  stub-addr: 2A02:568:0:1::53
  stub-addr: 87.233.175.25 # auth-ams.dnssec.denic.de
  stub-prime: no

That server (81.91.161.228/87.233.175.25) will tell you that the actual nameservers for .de are [cls].de.net. and [afz].nic.de. Subsequently, the resolver asks one of these servers for an answer, and gets an unsigned delegation. Hence the validation failure.

This is how it worked in the Java version of Unbound.

Roy

Isn't that why stub-prime: no is there (and the reason why this is so hard to do with
BIND, because it does not have the equivalent feature)?

        stub-prime: <yes or no>
               This option is by default off. If enabled it performs NS set
               priming, which is similar to root hints, where it starts using
               the list of nameservers currently published by the zone. Thus,
               if the hint list is slightly outdated, the resolver picks up a
               correct list online.

Paul

>>stub-zone:
>> name: "de"
>> stub-addr: 81.91.161.228 # auth-fra.dnssec.denic.de
>> stub-addr: 2A02:568:0:1::53
>> stub-addr: 87.233.175.25 # auth-ams.dnssec.denic.de
>> stub-prime: no
>
>That server (81.91.161.228/87.233.175.25) will tell you that the actual
>nameservers for .de are [cls].de.net. and [afz].nic.de. Subsequently, the
>resolver asks one of these servers for an answer, and gets an unsigned
>delegation. Hence the validation failure.

[...]

Isn't that why stub-prime: no is there (and the reason why this is so hard
to do with bind because it does not have the equivalent feature)?

Yes, indeed. Unbound works "despite" the apex NS RRSet pointing to the
standard non-DNSSEC aware servers. But occasionally the OPT RR is
missing (with the CD bit still set) and thus no RRSIGs are returned. I'm
prepared to set up some traffic dumps next week to get a more complete
picture.

-Peter
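
To make the observation about the missing OPT RR concrete, here is a minimal sketch that builds the two query shapes in question by hand: one with an EDNS OPT pseudo-record carrying the DO flag, and one without any OPT record, both with the CD bit set in the header. It illustrates the wire format only and is not how Unbound constructs its queries; `build_query`, the query ID, and the 4096-byte buffer size are assumptions chosen for the example.

```python
import struct

def encode_name(name):
    """Encode a domain name in DNS wire format (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name, qtype, edns=True, do=True, cd=True,
                qid=0x1234, udp_size=4096):
    """Build a single-question DNS query, optionally with an OPT RR."""
    flags = 0x0010 if cd else 0            # CD (checking disabled) bit
    arcount = 1 if edns else 0             # OPT lives in the additional section
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    if not edns:
        return header + question
    ttl = 0x00008000 if do else 0          # ext-rcode 0, version 0, DO flag
    # OPT RR: root owner name, type 41, class = UDP payload size, RDLEN 0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, ttl, 0)
    return header + question + opt

DNSKEY = 48
with_do = build_query("de.", DNSKEY)               # what a validator needs to send
plain = build_query("de.", DNSKEY, edns=False)     # the shape seen in the traces
print(len(with_do), with_do.hex())
print(len(plain), plain.hex())
```

A server answering the second form has no indication that the client wants DNSSEC records, so it leaves out the RRSIGs, which matches the "DNSKEY rrset is not secure" messages earlier in the thread.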