Resolution fails when the date of the server is more than 2 days behind

Hello,

I am running Unbound on single-board computers (Raspberry Pi, Tinker Board, Rock64). These boards don't have a hardware clock. They are set to resolve to 127.0.0.1 so that local processes use Unbound for DNS requests. Everything works fine here.

But when a board has not been used for 1 or 2 days or more (I didn't test the exact threshold), starting it puts me in a vicious cycle:

The clock is 2 days or more late
All DNS resolutions fail because of this difference
ntp.org calls fail
The clock is not updated
All DNS resolutions fail
And so on...

The way out of the cycle is to set the date manually; after that everything is fine again. But the user cannot be expected to do that on a Linux command line.

Is there a way to prevent this behaviour in Unbound, and get at least ntp.org resolved when there is serious clock drift?

I could do that by setting the IP address of ntp.org somewhere, but if this IP address changes one day the system will fail again, so I don't like it.

There is a ticket about this question here: https://github.com/RefugeeHotspot/RefugeeHotspot/issues/74

If the answer is not obvious, I will provide more details about unbound answers in this situation.

Thanks.

Dysmas

Hi,

If you are using such a configuration, I would suggest going with something like this:

https://thepihut.com/blogs/raspberry-pi-tutorials/17209332-adding-a-real-time-clock-to-your-raspberry-pi

That way you won't have that problem any more. Everything else I can think of is not reliable.

Bye

Sent from Mail for Windows 10

I could do that by setting the IP address of ntp.org somewhere, but if this
IP address changes one day the system will fail again, so I don't like it.

From a security perspective, it is important that the device fails safe,
that is, it does the least harmful thing to the humans it affects. What
that is, is not readily obvious, because computers exist in so many
places and for so many reasons (embedded systems even more so, of
course) that an analysis must be done specifically for the situation at
hand. Here, it is the loss of trustworthiness in DNSSEC that comes
from querying an unknown resolver. Perhaps it is better to get some
resolution up and take things from there. Perhaps not.

A few ideas, all bad:

Is this the only local hardware you've got? Otherwise, the obvious
answer is to get the NTP server via a DHCP option. This needs some
adjustment in order to actually be used.

With control over services offered locally, you also can get the NTP
client to listen for LAN broadcast/multicast NTP.

You can wrap the NTP startup script in some hackery that uses a
well-known full-service resolver to jumpstart the process, like so:

# ntp.conf.input contains a template line reading "server PLACEHOLDER"
NTPIP=$(dig ntp.se A +short @1.1.1.1 | head -1)
sed -e "s/PLACEHOLDER/${NTPIP}/" < ntp.conf.input > ntp.conf

Is there a way to prevent this behaviour in Unbound, and get at least
ntp.org resolved when there is serious clock drift?

Hello Dysmas,

According to unbound.conf(5):

       val-override-date: <rrsig-style date spec>
              Default is "" or "0", which disables this debugging feature. If enabled by
              giving a RRSIG style date, that date is used for verifying RRSIG inception
              and expiration dates, instead of the current date. Do not set this unless
              you are debugging signature inception and expiration. The value -1 ignores
              the date altogether, useful for some special applications.

So I guess the way out is to boot the device with `val-override-date:
-1`, and set it back to "0" once the clock is synchronized (which
should be signalled by NTP). The other option is to declare the NTP
server's domain as insecure.
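For the second option, the relevant unbound.conf(5) setting is `domain-insecure`; assuming pool.ntp.org is the NTP domain in use, something like:

```
server:
    domain-insecure: "pool.ntp.org"
```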

I am running Unbound on single-board computers (Raspberry Pi, Tinker
Board, Rock64). These boards don't have a hardware clock. They are set
to resolve to 127.0.0.1 so that local processes use Unbound for DNS
requests. Everything works fine here.

But when a board has not been used for 1 or 2 days or more (I didn't
test the exact threshold), starting it puts me in a vicious cycle:

The clock is 2 days or more late
All DNS resolutions fail because of this difference
ntp.org calls fail
The clock is not updated
All DNS resolutions fail
And so on...

The way out of the cycle is to set the date manually; after that
everything is fine again. But the user cannot be expected to do that on
a Linux command line.

Is there a way to prevent this behaviour in Unbound, and get at least
ntp.org resolved when there is serious clock drift?

Unbound is working exactly as it should.

The NTP daemon could be smarter. During initial bring-up it can set the
CD (check disabled) flag on its DNS queries, initially set the clock, and
then unset the CD flag so that later queries are protected. OpenNTPD in
recent OpenBSD versions does this, but it hasn't made it into
OpenNTPD-portable yet.

I could do that by setting the IP address of ntp.org somewhere, but if this
IP address changes one day the system will fail again, so I don't like it.

You could set the clock initially, with a "one shot" time fetcher like
ntpdate or tlsdate, from a service with a fixed IP, and then use
pool.ntp.org servers in your main NTP software. Hardcoding an IP
address from pool.ntp.org is a bad idea - their servers are more likely
to come and go - but there are other NTP servers on stable addresses.
For example you could use one or more of the addresses behind
time.cloudflare.com or time.google.com.

If you really don't want to hardcode an NTP server IP address, you
could use an external public DNS resolver (9.9.9.9, 1.1.1.1, 8.8.8.8,
etc.) initially to find the IP address for the "one shot" fetcher.
Either look it up with dig(1) or similar pointed directly at that
server and use the result in the command, or switch out the resolv.conf
file. Or if you prefer to use your own resolver for this, you could use
dig +cdflag.
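A sketch of the dig-based variant, with the address-picking factored out (the resolver address, NTP host, and ntpdate are only examples; dig may print CNAME targets before the A records, hence the filter):

```shell
# first_a: read `dig +short A` output on stdin and print only the
# first IPv4 address, skipping any CNAME lines dig emits first.
first_a() {
    grep -E -m 1 '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Intended use (needs network; 9.9.9.9 is one public resolver):
#   NTPIP=$(dig +short A time.cloudflare.com @9.9.9.9 | first_a)
#   [ -n "$NTPIP" ] && ntpdate -b "$NTPIP"
```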

Have you considered this approach?
https://wiki.mikrotik.com/wiki/Manual:Scripting-examples#Allow_use_of_ntp.org_pool_service_for_NTP

Resolve the DNS records at a regular interval, e.g. once a week or
once a month, and use the resolved IP addresses as hardcoded NTP
upstream servers:

0.pool.ntp.org
1.pool.ntp.org
2.pool.ntp.org
3.pool.ntp.org

This will ensure you can always sync the clock locally, even if your
recursive resolver fails due to clock drift. And you will always have
consistent, working IP addresses stored for upstream NTP
synchronization, refreshed at regular intervals.

Placing a script in /etc/cron.weekly or /etc/cron.monthly can do the
task of updating your local ntpd.conf file and reloading/restarting
the NTP daemon.

$ dig A +answer +nocmd +nomultiline +nocomments 0.pool.ntp.org | grep -v '^;' | grep -E -o "\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b"
ipv4_n1
ipv4_n2
ipv4_n3
ipv4_n4

$ dig AAAA +answer +nocmd +nomultiline +nocomments 2.pool.ntp.org | grep -v '^;' | grep -E -o "\b[0-9a-fA-F\:]{7,39}\b"
ipv6_n1
ipv6_n2
ipv6_n3
ipv6_n4
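The cron job could then boil down to a small transformation like this (a sketch; the template path, server names, and restart command are assumptions):

```shell
# ips_to_server_lines: turn resolved IPv4 addresses (one per line,
# as from the dig pipelines above) into ntpd.conf "server" lines.
ips_to_server_lines() {
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' | sed 's/^/server /'
}

# Sketch for /etc/cron.weekly/refresh-ntp (paths are assumptions):
#   { cat /etc/ntpd.conf.template
#     for h in 0.pool.ntp.org 1.pool.ntp.org; do
#         dig A +short "$h"
#     done | ips_to_server_lines
#   } > /etc/ntpd.conf && service ntp restart
```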

Hello guys,

Thanks a lot for so many answers in such a short time! They confirm the question is not so simple.

I will just explain here why some of them cannot work for me.

I am using Unbound in an Internet router which will be deployed in several places in different countries. That's why the LAN approach will not work: I cannot guess the LAN configuration of each installation.

To achieve a relatively fail-proof system, we always have two routers in each location: one working 24/7, the other just waiting to replace it in case of failure. The second unit may stay on a shelf for years, and by the time it is needed, the battery of its RTC may be dead. This makes the RTC option useless, unless we keep this unit running all the time; but then it may fail without notice and not be ready the day we need it.

I think, from all your answers, that the right approach for us will be a script, run some time after boot, which would:

- check the situation with ntpstat
- if synchronisation has not occurred since start:
  unbound-control set_option val-override-date: -1
- wait a moment and check again
- if OK:
  unbound-control set_option val-override-date: 0
  and exit
- otherwise loop.

It is not tested yet, but I think the exposure to unsafe situations is really minimal.

Does this justify the resurrection of Joe Abley's idea? I don't know. You will see.

Thanks a lot to all

Dysmas

$ dig A +answer +nocmd +nomultiline +nocomments 0.pool.ntp.org

Aren't you also obtaining NS and additional records? I haven't checked, but I assume those aren't NTP servers. If my assumption is accurate, this seems simpler:

$ dig A +short 0.pool.ntp.org

  -JP

Dave Knight and I once wrote down some thoughts about the process of
bootstrapping a cold validator onto the network. The draft apparently
didn't seem very interesting to anybody else and wasn't picked up by the
working group, but I think the potential for circular dependencies is worth
documenting.

I'm interested but I have limited spare energy :-)

Our approach was to specify that validation should be disabled following
boot until an accurate sense of time was acquired, which is what many
people in this thread have suggested.

I think it's better to assume validation will work and only disable it if
the assumption turns out to be wrong. (Computers without RTCs might prefer
to be less optimistic!)

draft-jabley-dnsop-validator-bootstrap-00

I wonder if it's worth having separate recovery processes for getting the
time and getting the trust anchors. If they are separated then you can use
cutting edge cleverness like roughtime, and in most cases you will have
the right trust anchor but not the right time. On the other hand, a
single process is simpler, and if you are getting the trust anchor then
you'll usually get good-enough time for free from the HTTP Date: header
:-)

My pet idea for solving the bootstrap problem is:

  * Your device is shipped with a list of trust anchor servers, identified
    by URL and TLS SPKI pin, and a quorum size which is less than the
    length of the list.

    The URL could have an IP address in the host part, so bootstrapping
    doesn't depend on the DNS, but that might make trust anchor servers
    too difficult from the ops point of view. On the other hand
    authoritative TLD name servers have very stable IP addresses so it is
    possible...

    The servers should be diverse in terms of operators, locations, etc.
    The size of the quorum allows you to trade off availability and
    security.

  * If DNSSEC doesn't work at boot time, your device runs a bootstrap
    process, as follows:

  * Shuffle the list of trust anchor servers

  * Make an https request to each one, authenticating the server using the
    SPKI pin but ignoring certificate validity times. The response
    includes the time and the current root trust anchor.

  * Keep going until you get matching responses from a quorum of servers.
    You can make requests concurrently so it's reasonably fast. When
    comparing the time (actually, compare the offset from the local clock,
    because the requests don't happen at the same time), you should allow
    for some amount of drift, e.g. tuned to match your NTP client's
    maximum offset.
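The quorum comparison at the heart of this could be sketched like so; curl's real `--pinnedpubkey` option would handle the SPKI pinning (it still verifies the pin when `-k` is used, which here stands in for ignoring certificate validity times), but the URLs, pins, response format, and the `use_trust_anchor` step are all invented:

```shell
# quorum_ok QUORUM RESPONSE...: succeed when at least QUORUM of the
# given responses are identical (responses assumed single-line).
quorum_ok() {
    q=$1; shift
    printf '%s\n' "$@" | sort | uniq -c |
        awk -v q="$q" '$1 >= q { ok = 1; exit } END { exit ok ? 0 : 1 }'
}

# Sketch of the fetch side (URLs, pins, and the final step are made up):
#   r1=$(curl -sk --pinnedpubkey "sha256//AAA..." https://ta1.example/anchor)
#   r2=$(curl -sk --pinnedpubkey "sha256//BBB..." https://ta2.example/anchor)
#   r3=$(curl -sk --pinnedpubkey "sha256//CCC..." https://ta3.example/anchor)
#   quorum_ok 2 "$r1" "$r2" "$r3" && use_trust_anchor "$r1"
```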

This process avoids two problems with the validator-bootstrap draft, or
rather with the trust anchor publication draft which it refers to.

(1) It sets up data.iana.org to be a huge single point of failure.

(2) The process for validating the responses from data.iana.org just moves
the trust anchor from the DNS to x.509/pgp/smime. Whereas the DNSSEC trust
anchor has extremely trustworthy and visible processes, the data.iana.org
certificates are a complete mystery, and after the rollover they are
mostly broken.

I know that SPKI pins are now deprecated for web sites, because they are
too brittle. But they are just right for this protocol: the SPKI pin
(subject public key information, i.e. a hash of the server's certificate
public key) authenticates the trust anchor server directly to the device
without relying on a third party.

We aren't just moving trust around, instead we are dispersing it. We don't
trust any individual server (if the quorum is greater than 1), instead we
establish trust from several diverse servers agreeing with each other.

Servers can fail, drop out, etc. and bootstrapping still works. The trust
anchor server list gets updated by the usual OS patching process, but it
should be stable enough that it can still work if it is years out of date:
more like DNS root zone hints than the DNSSEC root trust anchor.

We could add a transparency system that monitors public trust anchor
servers for good behaviour, so that they are trustworthy as well as
trusted.

Tony.

Hello,

I have used Unbound on several machines. I just installed my normal configuration on an APU2 with Debian Buster.

Unbound is unable to open the log file, although it is owned by unbound:unbound with mode 777 (for testing): permission denied.
It can read the configuration files in /etc/unbound and /etc/unbound/unbound.conf.d, but not elsewhere (in /etc/idefix): permission denied.

The Python script is unable to read /proc/net/arp, although any user can read this pseudo file.

Is there any reason inside Unbound which can explain that, or does it come from some external component?

I see AppArmor is not configured for any program.

Thanks if you have ideas.

Dysmas

Hi Dysmas,

Maybe the `chroot:` option is set in your configuration file?
If that is so, you would either need to make the files you want to access available inside the chroot, or disable the feature with `chroot: ""`.

Best regards,
-- George