SERVFAIL on available servers

I have a permanent VPN between a couple sites which is not entirely reliable, and unbound is configured with a stub zone pointing to name servers within 192.168/16 space.

The zone is defined in my unbound.conf as: example.com. IN stub noprime: 192.168.182.1

After the VPN has been interrupted, I see SERVFAIL from unbound for all queries, despite the fact that the VPN is now available and I can query the DNS servers across the VPN directly. If I wait, it will resolve itself eventually. Restarting unbound resolves the problem immediately, so I think it's a case of unbound caching that the NS are unresponsive and not trying again.

How do I confirm the problem and/or what can I do to encourage unbound to try again? Or is there a way to tell unbound to always consider the NS responsible for this zone to be available?

When the VPN connection goes up or down, libreswan/openswan signals
unbound to flush the cache for that domain. That also helps for
domains that look different internally than they do externally.

So the easy fix for you is to run the following on VPN up/down:

   unbound-control flush_zone example.com
   unbound-control flush_requestlist
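
If your VPN software can call a hook on up/down, the two commands above
could be wrapped roughly like this. It is only a sketch: the function name
flush_stub_zone and the UNBOUND_CONTROL override variable are made up for
illustration, and it assumes unbound-control is already set up
(control-enable: yes) and that the hook runs with enough privileges.

```shell
#!/bin/sh
# Hypothetical VPN up/down hook helper (not part of libreswan or unbound).
# UNBOUND_CONTROL can be overridden, e.g. with a full path to the binary.
UNBOUND_CONTROL="${UNBOUND_CONTROL:-unbound-control}"

# Drop cached data for one zone, then drop queries that are still in flight.
flush_stub_zone() {
    zone="$1"
    "$UNBOUND_CONTROL" flush_zone "$zone" &&
    "$UNBOUND_CONTROL" flush_requestlist
}

# Only act when a zone name is given on the command line.
if [ "$#" -gt 0 ]; then
    flush_stub_zone "$1"
fi
```

Saved as an executable script, it would be called from the hook with the
zone name as its only argument, for example: flush-zone.sh example.com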

Paul

Hi Dave,

So there are a few things and possible solutions I can tell you about:

- Yes, Unbound is probably caching the fact that it couldn't reach the nameserver,
  so when you ask it again shortly after your VPN went down, it will just
  answer SERVFAIL for that domain instead of trying the server again

- Unbound has a command-line tool called unbound-control, which allows you to
  ask Unbound for all kinds of information or ask it to do something.

- For example you can ask Unbound what it thinks about your domain:
  unbound-control lookup example.com

- Or ask Unbound to show you what it remembered about certain servers:
  unbound-control dump_infra

- Or ask Unbound to forget what it knows about a certain domain:
  unbound-control flush_zone example.com

- Now if your VPN software supports it, you might be able to configure that software
  to automatically flush the information about your example.com zone when it is connected.

- For example, OpenVPN has an 'up' option that runs a script once the connection is
  established again; that script can contain this line: unbound-control flush_zone example.com
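
To confirm the theory, the dump_infra command above can be narrowed down to
the stub server itself. The following is a rough sketch, not a finished tool:
the function names and the UNBOUND_CONTROL variable are invented for this
example, and it assumes that each dump_infra line starts with the server
address, which is worth checking against the output on your own system.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting Unbound's per-server (infra) cache.
UNBOUND_CONTROL="${UNBOUND_CONTROL:-unbound-control}"

# Read dump_infra output on stdin and keep only the lines whose first
# field is exactly the given address (assumed output layout).
infra_lines_for() {
    awk -v ip="$1" '$1 == ip'
}

# Show what Unbound currently remembers about one upstream server.
show_server_state() {
    "$UNBOUND_CONTROL" dump_infra | infra_lines_for "$1"
}

# Make Unbound forget that server's state so it is tried afresh.
reset_server_state() {
    "$UNBOUND_CONTROL" flush_infra "$1"
}
```

After sourcing the file, show_server_state 192.168.182.1 prints the entry for
the stub server. If the entry still looks stale after the VPN is back, for
instance with a very large rto value, reset_server_state 192.168.182.1 removes
it, which may be gentler than restarting Unbound.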

Hope this is helpful to you.

Have a good day,
Leen.