1.3: stub zones broken

Sort of a followup to my previous email about this matter.

Here's an example of debug output on one of my machines.
Querying a 'stub zone' which gets forwarded to a nearby
nsd, while "." is declared as a forward zone to another
unbound server (in DMZ).

# ./unbound-1.3.2 -dvv
[1248298051] unbound[19620:0] notice: Start of unbound 1.3.2.
[1248298051] unbound[19620:0] debug: chdir to /etc/unbound
[1248298051] unbound[19620:0] debug: drop user privileges, run as unbound
[1248298051] unbound[19620:0] debug: switching log to stderr
[1248298051] unbound[19620:0] debug: module config: "validator iterator"
[1248298051] unbound[19620:0] notice: init module 0: validator
[1248298051] unbound[19620:0] notice: init module 1: iterator
[1248298051] unbound[19620:0] debug: target fetch policy for level 0 is 3
[1248298051] unbound[19620:0] debug: target fetch policy for level 1 is 2
[1248298051] unbound[19620:0] debug: target fetch policy for level 2 is 1
[1248298051] unbound[19620:0] debug: target fetch policy for level 3 is 0
[1248298051] unbound[19620:0] debug: target fetch policy for level 4 is 0
[1248298051] unbound[19620:0] info: DelegationPoint<168.192.in-addr.arpa.>: 0 names (0 missing), 1 addrs (0 result, 1 avail)
[1248298051] unbound[19620:0] info: DelegationPoint<tls.msk.ru.>: 0 names (0 missing), 1 addrs (0 result, 1 avail)
[1248298051] unbound[19620:0] debug: Forward zone server list:
[1248298051] unbound[19620:0] info: DelegationPoint<.>: 0 names (0 missing), 2 addrs (0 result, 2 avail)
[1248298051] unbound[19620:0] debug: cache memory msg=33040 rrset=33040 infra=1312 val=41400
[1248298051] unbound[19620:0] info: start of service (unbound 1.3.2).
[1248298055] unbound[19620:0] debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
[1248298055] unbound[19620:0] info: validator operate: query <paltus.tls.msk.ru. A IN>
[1248298055] unbound[19620:0] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
[1248298055] unbound[19620:0] info: resolving <paltus.tls.msk.ru. A IN>
[1248298055] unbound[19620:0] info: priming . IN NS
[1248298055] unbound[19620:0] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
[1248298055] unbound[19620:0] info: iterator operate: query <. NS IN>
[1248298055] unbound[19620:0] info: processQueryTargets: <. NS IN>
[1248298055] unbound[19620:0] info: sending query: <. NS IN>
[1248298055] unbound[19620:0] debug: sending to target: <.> 128.8.10.90#53
[1248298055] unbound[19620:0] debug: cache memory msg=33040 rrset=33040 infra=1528 val=41400
[1248298055] unbound[19620:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_noreply
[1248298055] unbound[19620:0] info: iterator operate: query <. NS IN>
[1248298055] unbound[19620:0] info: processQueryTargets: <. NS IN>
[1248298055] unbound[19620:0] info: sending query: <. NS IN>
[1248298055] unbound[19620:0] debug: sending to target: <.> 192.5.5.241#53
[1248298055] unbound[19620:0] debug: cache memory msg=33040 rrset=33040 infra=1744 val=41400
[1248298056] unbound[19620:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_noreply
[1248298056] unbound[19620:0] info: iterator operate: query <. NS IN>
[1248298056] unbound[19620:0] info: processQueryTargets: <. NS IN>
[1248298056] unbound[19620:0] info: sending query: <. NS IN>
[1248298056] unbound[19620:0] debug: sending to target: <.> 193.0.14.129#53
[1248298056] unbound[19620:0] debug: cache memory msg=33040 rrset=33040 infra=1960 val=41400
[1248298056] unbound[19620:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_noreply
...

This is an internal machine which does not have access to external network,
so none of the queries will succeed. Especially for names like this
(paltus.tls.msk.ru) which does not exist externally.

Here's the config:

forward-zone:
  name: "."
  forward-addr: 192.168.2.18
  forward-addr: 192.168.2.17

stub-zone:
  name: "tls.msk.ru"
  stub-addr: 192.168.1.2

On the other hand, with this same config and this same version,
"other" names are handled correctly:

[1248298485] unbound[19628:0] info: validator operate: query <www.ru. A IN>
[1248298485] unbound[19628:0] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
[1248298485] unbound[19628:0] info: resolving <www.ru. A IN>
[1248298485] unbound[19628:0] info: processQueryTargets: <www.ru. A IN>
[1248298485] unbound[19628:0] info: sending query: <www.ru. A IN>
[1248298485] unbound[19628:0] debug: sending to target: <.> 192.168.2.18#53

It is sending query to one of the forwarders for "." zone.

The same is hapenning with 1.3.1. 1.3.0 works, but not correctly:

# ./unbound-1.3.0 -dvv
[1248298777] unbound[19738:0] notice: Start of unbound 1.3.0.
[1248298777] unbound[19738:0] debug: chdir to /etc/unbound
[1248298777] unbound[19738:0] debug: drop user privileges, run as unbound
[1248298777] unbound[19738:0] debug: switching log to stderr
[1248298777] unbound[19738:0] debug: module config: "validator iterator"
[1248298777] unbound[19738:0] notice: init module 0: validator
[1248298777] unbound[19738:0] notice: init module 1: iterator
[1248298777] unbound[19738:0] debug: target fetch policy for level 0 is 3
[1248298777] unbound[19738:0] debug: target fetch policy for level 1 is 2
[1248298777] unbound[19738:0] debug: target fetch policy for level 2 is 1
[1248298777] unbound[19738:0] debug: target fetch policy for level 3 is 0
[1248298777] unbound[19738:0] debug: target fetch policy for level 4 is 0
[1248298777] unbound[19738:0] info: DelegationPoint<168.192.in-addr.arpa.>: 0 names (0 missing), 1 addrs (0 result, 1 avail)
[1248298777] unbound[19738:0] info: DelegationPoint<corpit.ru.>: 0 names (0 missing), 1 addrs (0 result, 1 avail)
[1248298777] unbound[19738:0] info: DelegationPoint<tls.msk.ru.>: 0 names (0 missing), 1 addrs (0 result, 1 avail)
[1248298777] unbound[19738:0] debug: Forward zone server list:
[1248298777] unbound[19738:0] info: DelegationPoint<.>: 0 names (0 missing), 2 addrs (0 result, 2 avail)
[1248298777] unbound[19738:0] debug: cache memory msg=33040 rrset=33040 infra=1312 val=41400
[1248298777] unbound[19738:0] info: start of service (unbound 1.3.0).
[1248298779] unbound[19738:0] debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
[1248298779] unbound[19738:0] info: validator operate: query <paltus.tls.msk.ru. A IN>
[1248298779] unbound[19738:0] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
[1248298779] unbound[19738:0] info: resolving <paltus.tls.msk.ru. A IN>
[1248298779] unbound[19738:0] info: processQueryTargets: <paltus.tls.msk.ru. A IN>
[1248298779] unbound[19738:0] info: sending query: <paltus.tls.msk.ru. A IN>
[1248298779] unbound[19738:0] debug: sending to target: <.> 192.168.2.17#53

Note it is sending to one of the forwarders for "." zone, not to
the stub-address (192.168.1.2) for the "tls.mks.ru" zone. But since
192.168.2.1[78] also knows where to find that info, I never noticed.

Changing "stub-zone" and "stub-addr" to "forward-zone" and "forward-addr"
fixes both cases.

In my previous emails I were referring to forward zones instead - and
was wrong. Only stub zones does not work, forward zones seems to be
working correctly.

Thanks

/mjt

[Resending to list]

Paul Wouters wrote:

Here's the config:

forward-zone:
name: "."
forward-addr: 192.168.2.18
forward-addr: 192.168.2.17

stub-zone:
name: "tls.msk.ru"
stub-addr: 192.168.1.2

What happens if you add stub-prime: "no" to the last stub-zone entry?
(I know it is the default, this is how I have mine zones configured)

Nothing changes. I posted debug output since startup up to a first
query to this zone -- and the first thing it does after receiving
first query is to prime root (".") zone. After doing this (on another
machine which do have access to outside 'net), it queries the correct
stub nameserver:

[1248334453] unbound[14868:0] info: priming successful for <. NS IN>
[1248334453] unbound[14868:0] debug: iterator[module 1] operate: extstate:module_wait_subquery event:module_event_pass
[1248334453] unbound[14868:0] info: iterator operate: query <paltus.tls.msk.ru. A IN>
[1248334453] unbound[14868:0] info: resolving (init part 2): <paltus.tls.msk.ru. A IN>
...

So it looks like the problem is that it still tries to prime root
zone when (first) resolving a name in stub zone, even if that is
not necessary. And this becomes a problem when unbound (1.3+, 1.2
did not have this issue) does not have access to outside world so
that it can't contact to any of the root nameservers.

2 questions:

1) why it tries to do root zone priming to start with, only with
  stub-zone (as stub-addr, not stub-name) names?

2) why it ignores forward root zone declaration when performing
  such priming?

I just tried to put (fake) root.hints pointing to another unbound
located on DMZ (192.168.2.17 in my original post), after which
priming goes ok and everything works.

Thanks!

/mjt

Hi Michael,

Sort of a followup to my previous email about this matter.

Here's an example of debug output on one of my machines.
Querying a 'stub zone' which gets forwarded to a nearby
nsd, while "." is declared as a forward zone to another
unbound server (in DMZ).

# ./unbound-1.3.2 -dvv
[1248298055] unbound[19620:0] info: priming . IN NS
[1248298056] unbound[19620:0] info: sending query: <. NS IN>
[1248298056] unbound[19620:0] debug: sending to target: <.> 193.0.14.129#53

What you see here is that unbound is trying to prime the root
servers before continuing. Even though it has a more specific stub
anchor, it wants to prime the root first. If that succeeds, it
will then send this query towards your configured stub anchor.

This is an internal machine which does not have access to external network,
so none of the queries will succeed. Especially for names like this
(paltus.tls.msk.ru) which does not exist externally.

So, it cannot access the root, but it still has root-hints there.
When it starts it will attempt to prime the root.
(Prime the root: contact the root servers to get the latest up
to date root-hints, using the root-hints from configuration).

But your machine is internal and cannot access it, so it fails.
Hm. The easiest way would be to provide root-hints to 127.0.0.1
(or better, to 192.168.2.18). Here is how:

Here's the config:

# Keep this one to do the work.
forward-zone:
  name: "."
  forward-addr: 192.168.2.18
  forward-addr: 192.168.2.17

# Keep this one too.
stub-zone:
  name: "tls.msk.ru"
  stub-addr: 192.168.1.2

# Avoid your root-priming but external
# network not accessible problem.
# i.e. this is similar to providing root-hints: "nowhere"
stub-zone:
  name: "."
  stub-prime: no
  stub-addr: 192.168.2.18

The same is hapenning with 1.3.1. 1.3.0 works, but not correctly:

Yes after your last report I fixed the bug in 1.3.1 :slight_smile:

Changing "stub-zone" and "stub-addr" to "forward-zone" and "forward-addr"
fixes both cases.

Yes this affects unbounds decision to prime the root server.

In my previous emails I were referring to forward zones instead - and
was wrong. Only stub zones does not work, forward zones seems to be
working correctly.

Above workaround should work, thanks for the report,

Best regards,
   Wouter

Hi Michael,

I've provided a bugfix in the svn repository, for the upcoming
unbound release, so that it should not longer do this. Thanks
for the bugreport!

Best regards,
   Wouter

W.C.A. Wijngaards wrote:

Hi Michael,

I've provided a bugfix in the svn repository, for the upcoming
unbound release, so that it should not longer do this. Thanks
for the bugreport!

I just verified it again (despite your added testcase for this
issue) -- it works here just fine now. Thank you for the good
work!

/mjt