Issues with DNSSEC, use-caps-for-id, and empty responses

At Let's Encrypt, we recently started refusing to issue if there is a
failure during CAA lookup, in particular a SERVFAIL. We've received a
handful of reports from users who are hitting these SERVFAILs. The
authoritative server software and the root causes seem to differ
between reports (PowerDNS is one; DNSimple's in-house server is another),
but it seems like these only happen for people with DNSSEC enabled. For
everyone reporting, we can successfully resolve and validate their A
records, but when querying their CAA records we get a validation
failure. One of the key differences for the CAA records is that the
response is almost always empty, so it seems like the issue may be
related to signing of empty responses. Additionally, we have
"use-caps-for-id: yes" in our unbound config. For one of the affected
domains, validation succeeds when we set "use-caps-for-id: no", but
for the other domains that change makes no difference.

Do you know of any issues that would cause validation failures for the
particular combination of DNSSEC, empty responses, and use-caps-for-id: yes?

Here are the threads from our forums:

https://community.letsencrypt.org/t/powerdns-cant-find-why-caa-servfails/38127/46
https://community.letsencrypt.org/t/help-diagnosing-caa-failures-ns1-cyso-nl/38461
https://community.letsencrypt.org/t/dnsimple-caa-servfail/38459

And here is an Unbound config that is pretty close to what we have in
prod (performance tuning removed, and file paths and users tweaked to
run as unprivileged user):

https://github.com/jsha/unboundtest/blob/master/unbound.conf

Thanks,
Jacob

Hi Jacob,

A quick response would be that I have had a string of bug reports where
other software failed to create correct empty DNSSEC proofs. The DNSSEC
proofs were not correct for a particular corner case, and that
corner case was hit because of the reporter's options. The
use-caps-for-id, harden-referral-path, and qname-minimisation options
all affected this. With qname minimisation, for example, a query would
be asked, one with an empty response, that needed to be DNSSEC valid,
but that did not get asked by other users. And the other software
(online signing?) did not create the correct response.

use-caps-for-id mixes upper and lower case characters into the query
name, and the other party (online signing?) perhaps does not downcase
the name before creating the signatures? If so, that is something you
could easily reproduce without the caps for id option by asking a query
with uppercase characters mixed into the name.
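
A minimal sketch of that reproduction in Python, using only the standard
library. It mixes the case of a name the way caps for id does and builds a
CAA query with the DNSSEC OK bit set. The helper names (mix_case,
encode_qname, build_query) and the name example.com are made up for
illustration, and the sketch assumes plain ASCII names; it does not send
anything.

```python
import random
import struct

CAA = 257  # RR type number for CAA


def mix_case(name, rng):
    """Randomly pick upper or lower case for each letter (0x20-style)."""
    return "".join(
        (c.upper() if rng.random() < 0.5 else c.lower()) if c.isalpha() else c
        for c in name
    )


def encode_qname(name):
    """Encode a dotted ASCII name as length-prefixed labels, keeping its case."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=CAA, query_id=0x1234):
    """Build a query with RD clear and an EDNS0 OPT record carrying DO."""
    # Header: id, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT)
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 1)
    question = encode_qname(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT: root owner, type 41, UDP size 4096, ext-rcode 0, version 0,
    # flags with the DO bit (0x8000), empty rdata
    opt = b"\x00" + struct.pack("!HHBBHH", 41, 4096, 0, 0, 0x8000, 0)
    return header + question + opt


rng = random.Random(0)  # fixed seed so the chosen case pattern is repeatable
mixed = mix_case("example.com", rng)
packet = build_query(mixed)
```

The packet could then be sent over UDP to port 53 of the authoritative
server in question, and the answer compared with the one for the all
lowercase name.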

One reason why empty responses and caps for id could fail together is
that an empty response, if it uses NSEC, has something to do with the
query name put in the NSEC rdata (the NSEC next closer name). Perhaps
that is not downcased properly before signing.
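
To illustrate the kind of mismatch I mean: signatures are computed over
the owner name in canonical form, with uppercase ASCII letters replaced
by lowercase (RFC 4034 section 6.2). The Python sketch below only shows
that a signer which kept the case from the query would sign different
bytes than a validator reconstructs. The function names are made up for
illustration, and whether any of the affected servers behaves like this
is a guess.

```python
def wire_name(name):
    """Encode a dotted ASCII name as length-prefixed labels, keeping its case."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def canonical_wire_name(name):
    """Same encoding, but with each label lowercased first."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii").lower()  # bytes.lower() touches ASCII only
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


query_name = "eXaMpLe.CoM"  # what caps for id might send

# Hypothetical buggy signer: uses the name exactly as it arrived in the query.
signed_over = wire_name(query_name)

# What a validator feeds into signature verification.
verified_over = canonical_wire_name(query_name)

names_match = signed_over == verified_over
```

Here names_match is False, so a signature made over signed_over would not
verify, while the same signer would appear to work for an all lowercase
query.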

Best regards, Wouter