Distinguishing types of SERVFAIL

Thanks to W.C.A. Wijngaards for the very helpful reply on my last
question, about DNSSEC, empty responses, and use-caps-for-id. We
discovered a bug in PowerDNS
(https://community.letsencrypt.org/t/caa-servfail-changes/38298/2),
which happily was fixed in the 4.0.4 release in June.

I have another question related to SERVFAIL. Let's Encrypt tries to
provide the most useful error messages possible to its users. My
understanding is that a SERVFAIL response could indicate a variety of
problems, including "DNSSEC validation failed," "a remote resolver
failed," and "Unbound failed." Is there any way for us to distinguish
the DNSSEC validation failure from the other cases, so we can provide
that in a detailed error message to our users?

Thanks,
Jacob

Hi Jacob,

If you get a SERVFAIL response, you can repeat the query with the CD
(checking disabled) flag set. If you then get a NOERROR response, then
it's reasonable to conclude that DNSSEC validation was the problem.

Regards,
Anand Buddhdev
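
The retry-with-CD check described above could be sketched roughly as
follows. This is only an illustration using the Python standard library,
not anything Unbound or Let's Encrypt ships: the function names
(build_query, rcode_of, query_rcode, classify), the fixed query ID, and
the resolver address in the usage note are all made up for the example.
It sends plain UDP queries without EDNS and treats only "SERVFAIL
normally, NOERROR with CD set" as a likely validation failure.

```python
import socket
import struct

NOERROR = 0
SERVFAIL = 2
RD_FLAG = 0x0100  # "recursion desired" bit in the DNS header flags
CD_FLAG = 0x0010  # "checking disabled" bit in the DNS header flags
TYPE_CAA = 257


def build_query(name, qtype, cd=False, query_id=0x1234):
    """Build a minimal DNS query packet; set the CD bit when cd is True."""
    flags = RD_FLAG | (CD_FLAG if cd else 0)
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = [label for label in name.split(".") if label]
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def rcode_of(response):
    """Extract the 4-bit RCODE from the header of a DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def query_rcode(resolver_ip, name, qtype, cd, timeout=3.0):
    """Send one UDP query to the resolver and return the response RCODE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype, cd), (resolver_ip, 53))
        response, _ = sock.recvfrom(4096)
    return rcode_of(response)


def classify(rcode_validating, rcode_cd):
    """Interpret the pair of RCODEs (normal query, then CD=1 retry)."""
    if rcode_validating != SERVFAIL:
        return "no SERVFAIL"
    if rcode_cd == NOERROR:
        return "likely DNSSEC validation failure"
    return "other failure"
```

Against a validating resolver (127.0.0.1 is an assumption here), usage
might look like
classify(query_rcode("127.0.0.1", "example.com", TYPE_CAA, cd=False),
query_rcode("127.0.0.1", "example.com", TYPE_CAA, cd=True)). In practice
the second query only needs to be sent when the first one returns
SERVFAIL.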

(Renaming this branch of the thread to reflect the topic)

It's great to hear that PowerDNS found and fixed the bug. By default,
[Xenial] ships with a version of PowerDNS that lags behind the official
4.0.x branch: https://packages.ubuntu.com/xenial/pdns-server. This is of
course not uncommon for Linux distributions. And as far as I can tell,
this particular version doesn't even have support for CAA, but I am
not sure whether that would be a good or a bad thing in this
particular situation.

Lack of support for CAA doesn't make a difference. A server that doesn't
understand CAA queries will respond with an empty NOERROR, the same as a
server that understands CAA queries but has no resource records of that
type. The problem comes in with the signing of the response.

Personally, I could probably upgrade to a newer version of PowerDNS
without too much hassle. But if every Ubuntu user needs to do that,
that's going to require a lot of coordination. Has anybody tried
getting Ubuntu to officially backport the bug fix into Xenial?

That's a very good idea. I don't think anyone has; would you like to
lead that effort? I could introduce you to the person who helps Certbot
maintain Ubuntu packages. He might have some ideas about the correct
process to follow.

I’m going to work on this today.

BTW there is ongoing work in the IETF to introduce extended error messages
which should provide more information. You can see the proposal here:
https://tools.ietf.org/html/draft-wkumari-dnsop-extended-error

To discuss this, please join the dnsop mailing list:
https://www.ietf.org/mailman/listinfo/dnsop

Early feedback from people who need additional data to complement
SERVFAIL messages is more than welcome. Please join and tell us!