Hashicorp consul dns API and DNSSEC (newb)

Hi,

I’m kind of stuck with this problem. Hashicorp’s consul doesn’t support DNSSEC and as such, I can’t forward from my main bind instance (DNSSEC enabled) to the consul daemon directly. I can’t turn off DNSSEC in the bind instance either.

Instead, my naive plan is to:

  • Instruct bind to forward requests for the consul domain to unbound. They can use DNSSEC for this step.
  • Once unbound receives the request from bind, instruct unbound to forward it further to consul (no DNSSEC).
  • Retrieve the answer from consul and give it back to bind.

Basically, I want to hide a DNS server (consul) that can’t speak DNSSEC behind unbound.

Is that possible?

Thanks!
Sergei

My guess is that Consul is using a domain which is not properly delegated,
so BIND's validation fails when it tries to follow the delegation.

I'm afraid your plan isn't going to work, for three reasons:

* Unbound also validates so I would expect it to have the same problem.

* DNSSEC is end-to-end not hop-by-hop, so you'll get the same validation
  failure wherever the data comes from.

* DNSSEC is designed to be backwards-compatible, so (if there is a proper
  delegation) BIND should be able to resolve insecure domains hosted on
  Consul.

If my guess is right, there are two ways to fix the problem:

(1) Add a proper delegation to the domain hosted by Consul. If someone
made a bad choice about which domain to use (i.e. not one that is properly
registered) this might not be possible.

(2) Add a negative trust anchor for the domain, to disable DNSSEC
validation just for that domain, using `rndc nta`. This might be a bit
annoying because (as RFC 7646 requires) negative trust anchors have a
limited lifetime. Unbound's `domain-insecure` option is more permanent,
but it probably won't help unless you can replace BIND with Unbound.
BIND 9.14 will have a `validate-except` option which works in a similar way.
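
For reference, the three mechanisms mentioned above might look roughly like
this. It is only a sketch: the domain name `consul` and the one-day lifetime
are assumptions for illustration, and each fragment belongs to a different
tool.

```
# BIND 9.11 or later: add a temporary negative trust anchor at runtime
rndc nta -lifetime 1d consul

# BIND 9.14 or later: permanent exception, in the options block of named.conf
options {
    validate-except { "consul"; };
};

# Unbound: permanent exception, in the server: section of unbound.conf
server:
    domain-insecure: "consul"
```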

Tony.

Thank you for the very detailed reply.

Just to be clear, we’re dealing with internal DNS here. No registrars are involved. I don’t know much about registrars/delegations but I think it applies mostly to public domains? Anyway, yes, I’ve looked at NTA and it is only supported from bind 9.11 onwards, and we’re below that. So that’s out. I *was* using the “domain-insecure” option of Unbound hoping that it would help with the Bind -> Unbound -> Consul scenario, but since, as you’re saying, DNSSEC is end-to-end rather than hop-by-hop, that’s not going to be possible.

Idea, can Unbound not “vouch” for (trust) this zone DNSSEC-wise and reply to other DNS servers with correctly signed responses?

One thing that does work is going straight to Unbound, but the question is how to make our internal DNS aware of that and “redirect” the client to it (without listing the Unbound server in /etc/resolv.conf). So is it possible in the DNS world to reply to the client with a “referral” instead of doing anything recursive for it? As in "if you’re looking for “something.consul.", go to this DNS server. I won’t do the recursion for you.”?

Sorry for the silly DNS questions!

Thanks a bunch,
  Sergei

Just to be clear, we’re dealing with internal DNS here. No registrars
are involved. I don’t know much about registrars/delegations but I think
it applies mostly to public domains?

Yes, but the DNS is a coherent global namespace, so for internal DNS you
should use a subdomain of a properly registered public domain, e.g. we use
private.cam.ac.uk.

There's a fundamental conflict between DNSSEC (which proves that
unregistered names do not exist) and internal DNS that squats on fake
domains. There's also a risk of conflict with ICANN's new gTLD programme.
It was common to get away with fake internal domains in the past but those
days are long gone.

It's problematic that Consul's defaults guide its users to make this
mistake. I think it should configure its domain option based on the
Consul server's FQDN, rather than using an unwise fixed value.

Idea, can Unbound not “vouch” for (trust) this zone DNSSEC-wise and
reply to other DNS servers with correctly signed responses?

You can kind of do this, but not with Unbound. You need an authoritative
DNS server that supports DNSSEC, which can sign the internal zone. Then
you need to distribute the trust anchor for your internal zone to all the
validators that need to access it. But your authoritative server is
Consul, which does clever things not including DNSSEC, so this way would
lead to a lot of unpleasant complexity.

One thing that does work is going straight to Unbound, but the question
is how to make our internal DNS aware of that and “redirect” the client
to it (without listing the Unbound server in /etc/resolv.conf). So is it
possible in the DNS world to reply to the client with a “referral”
instead of doing anything recursive for it? As in "if you’re looking for
“something.consul.", go to this DNS server. I won’t do the recursion for
you.”?

That's how the recursive -> authoritative part of the DNS protocol works,
but not the stub -> recursive part. Stub resolvers need to get a complete
answer, they aren't clever enough to follow referrals. But what you might
be able to do is get your client machines to talk direct to Unbound only,
and configure Unbound to forward internal queries to Consul and other
queries to BIND. (I didn't suggest this before because I thought you said
they have to use BIND as their recursive server, but I might have
misunderstood how strong this requirement is.)
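
A minimal sketch of that Unbound setup, assuming Consul's DNS interface
listens on 192.0.2.10 port 8600 (8600 is Consul's default DNS port) and BIND
listens on 192.0.2.53. Both addresses are placeholders. A stub-zone is used
for Consul here because it answers authoritatively; a forward-zone may work
as well.

```
server:
    # skip DNSSEC validation for the unsigned, undelegated internal domain
    domain-insecure: "consul"
    # only needed if Consul runs on the same host as Unbound:
    # do-not-query-localhost: no

stub-zone:
    name: "consul"
    stub-addr: 192.0.2.10@8600

# everything else goes to the existing BIND resolver
forward-zone:
    name: "."
    forward-addr: 192.0.2.53
```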

Sorry for the silly DNS questions!

No problem, this is a legitimately murky corner, so the questions aren't
silly. Fake internal domains, on the other hand ... :-)

Tony.

Hi Tony,

It’s problematic that Consul’s defaults guide its users to make this
mistake. I think it should configure its domain option based on the
Consul server’s FQDN, rather than using an unwise fixed value.

Consul does have configuration directives to change its domain. By default it’s anything under consul., but I can make it anything I want. I’m not quite sure how this would help though. To be sure we’re on the same page, a few words of what Consul does in general and its DNS API in particular. Let’s say there’s some service X that is provided by a number of machines. Let’s say also that this service is an Active-Standby service where exactly 1 member serves a function. Let’s designate that member the “active” member. What consul allows you to do is have all these members register with Consul and let Consul choose the active member by issuing a pre-defined health check. It does much more (like store the service’s data in a distributed way), but this will suffice for our discussion. As part of its DNS API, Consul allows queries like this:

active.X.service.consul

which would return the IP of the currently active member of service X. This is incredibly convenient for applications.

Given the dynamic nature of the operation, is DNSSEC even possible in this case no matter what the prefix would be? I thought that with DNSSEC, a zone needs to be re-signed each time a change happens? If that’s true, I’m not sure how this would work because the “active” portion of that name can change 10 times a day.

That’s how the recursive → authoritative part of the DNS protocol works,
but not the stub → recursive part. Stub resolvers need to get a complete
answer, they aren’t clever enough to follow referrals. But what you might
be able to do is get your client machines to talk direct to Unbound only,
and configure Unbound to forward internal queries to Consul and other
queries to BIND. (I didn’t suggest this before because I thought you said
they have to use BIND as their recursive server, but I might have
misunderstood how strong this requirement is.)

That could be a decent workaround with the only concern being the possible additional load on these Consul machines. Ideally, they should do only Consul stuff, but maybe this will not add too much.

No problem, this is a legitimately murky corner, so the questions aren’t
silly. Fake internal domains, on the other hand … :-)

Cool :-)

Thanks again,
Sergei

Hi Tony,

It's problematic that Consul's defaults guide its users to make this
mistake. I think it should configure its domain option based on the
Consul server's FQDN, rather than using an unwise fixed value.

Consul does have configuration directives to change its domain. By
default it’s anything under consul., but I can make it anything I
want. I’m not quite sure how this would help though. To be sure we’re on
the same page, a few words of what Consul does in general and its DNS
API in particular. Let’s say there’s some service X that is provided
by a number of machines. Let’s say also that this service is an
Active-Standby service where exactly 1 member serves a function. Let’s
designate that member the “active” member. What consul allows you to do
is have all these members register with Consul and let Consul choose the
active member by issuing a pre-defined health check. It does much more
(like store the service’s data in a distributed way), but this will
suffice for our discussion. As part of its DNS API, Consul allows
queries like this:

*active.X.service.consul*

The problem is caused by the fact that the DNS root zone (.) is DNSSEC-signed, so
a DNSSEC validator in BIND (or anywhere else) can prove that the domain
`consul.` is not supposed to exist. This proof contradicts the data received
from the network, and the contradiction is treated as an attack (which is
technically correct and expected).

The recommended configuration is to use a domain name like
`consul.internal.example.com.` where `internal.example.com.` is an
existing but insecure zone (i.e. a zone which is not signed using DNSSEC).

Using an insecure parent zone will automatically disable DNSSEC validation
on the subtree `consul.internal.example.com.` and allow you to do whatever
trick Consul is doing.

I hope it clarifies the problem.
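
As a rough sketch of that configuration, assuming the placeholder domain
`consul.internal.example.com` and a Consul DNS interface at 192.0.2.10 port
8600 (the address is hypothetical; 8600 is Consul's default DNS port), the
Consul agent configuration could set its `domain` option like this:

```
# Consul agent configuration (HCL)
domain = "consul.internal.example.com"
```

and BIND could forward that subtree to Consul with a forward zone in
named.conf:

```
zone "consul.internal.example.com" {
    type forward;
    forward only;
    forwarders { 192.0.2.10 port 8600; };
};
```

This relies on `internal.example.com.` (or one of its parents) being provably
insecure, so that the validator accepts unsigned answers below it.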

which would return the IP of the currently active member of service X.
This is incredibly convenient for applications.

Given the dynamic nature of the operation, is DNSSEC even possible in
this case no matter what the prefix would be? I thought that with
DNSSEC, a zone needs to be re-signed each time a change happens? If
that’s true, I’m not sure how this would work because the “active”
portion of that name can change 10 times a day.

This is not a problem: a reasonably modern DNS server can handle thousands
of updates per second, including automatic DNSSEC re-signing.

Petr Špaček @ CZ.NIC

Hi Peter,

The problem is caused by the fact that the DNS root zone (.) is DNSSEC-signed, so
a DNSSEC validator in BIND (or anywhere else) can prove that the domain
`consul.` is not supposed to exist. This proof contradicts the data received
from the network, and the contradiction is treated as an attack (which is
technically correct and expected).

The recommended configuration is to use a domain name like
`consul.internal.example.com.` where `internal.example.com.` is an
existing but insecure zone (i.e. a zone which is not signed using DNSSEC).

Using an insecure parent zone will automatically disable DNSSEC validation
on the subtree `consul.internal.example.com.` and allow you to do whatever
trick Consul is doing.

I hope it clarifies the problem.

Very cool! It does seem to be working now that I’ve changed the Consul domain name to consul.ourdomain.net! I don’t quite understand the logic of this behavior, but I’m glad we have a simple solution for this. This is working even without an intermediate bind server. Forwarding straight from our main DNS servers is working. So thank you VERY much, Peter, for this valuable insight.

which would return the IP of the currently active member of service X.
This is incredibly convenient for applications.

Given the dynamic nature of the operation, is DNSSEC even possible in
this case no matter what the prefix would be? I thought that with
DNSSEC, a zone needs to be re-signed each time a change happens? If
that’s true, I’m not sure how this would work because the “active”
portion of that name can change 10 times a day.

This is not a problem: a reasonably modern DNS server can handle thousands
of updates per second, including automatic DNSSEC re-signing.

Got it, but it’s not going to be necessary anymore.

Thank you!!
Sergei