Crash on DNS lookup with Unbound in OpenDKIM

Hello,

I’m debugging a crash in OpenDKIM that occurs during DNS lookup with
libunbound. It is difficult to pinpoint because I’m unable to obtain the
backtrace after the crash. Here’s the reproduction:
https://github.com/trusteddomainproject/OpenDKIM/issues/56.

Regarding libunbound, I’ve narrowed it down to this: the crash happens
when configuring libunbound with a config file (`ub_ctx_config`) that
contains only the following:

    server:
        auto-trust-anchor-file: "/var/lib/unbound/root.key"

When leaving out the auto-trust-anchor-file setting the crash does not
happen. Note that this setting and the file /var/lib/unbound/root.key
are defaults provided by the unbound package in Debian.

I know it is asking a lot, but I have no knowledge of libunbound and I
suspect there is a mistake in the client code in OpenDKIM, so if you
could give me a hint as to which direction I should investigate further,
I’d be very grateful. The Unbound client code in OpenDKIM can be found at:
https://github.com/trusteddomainproject/OpenDKIM/blob/develop/opendkim/opendkim-dns.c

Thank you!

Surprise: it is not actually a crash at all. libunbound simply calls
exit(1)! Cheeky! Is that intentional? I cannot see any indication of
the error in syslog or anywhere else. This is unexpected behaviour
for a library, to say the least.

I set a breakpoint at fatal_exit() in log.c:

    (gdb) bt
    #0 fatal_exit (format=format@entry=0x7ffff6e0f060 "could not open autotrust file for writing, %s: %s") at util/log.c:322
    #1 0x00007ffff6dde5c0 in autr_write_file (env=<optimized out>, tp=0x7fffe405ed30) at validator/autotrust.c:1189
    #2 0x00007ffff6dc09ec in autr_process_prime (qstate=<optimized out>, dnskey_rrset=0x7fffe42b1710, tp=<optimized out>, ve=0x7fffe4049990, env=0x7fffe412d600)
        at validator/autotrust.c:2220
    #3 process_prime_response.isra.6 (origin=0x7fffe42b1c70, rcode=<optimized out>, id=<optimized out>, vq=<optimized out>, qstate=<optimized out>)
        at validator/validator.c:2958
    #4 val_inform_super (qstate=<optimized out>, id=<optimized out>, super=<optimized out>) at validator/validator.c:3108
    #5 0x00007ffff6da4815 in mesh_walk_supers (mesh=mesh@entry=0x7fffe42dfc90, mstate=mstate@entry=0x7fffe42ade40) at services/mesh.c:1185
    #6 0x00007ffff6db2f27 in mesh_continue (mesh=mesh@entry=0x7fffe42dfc90, mstate=mstate@entry=0x7fffe42ade40, s=<optimized out>, ev=ev@entry=0x7ffff258c9dc)
        at services/mesh.c:1415
    #7 0x00007ffff6db2950 in mesh_run (mesh=0x7fffe42dfc90, mstate=0x7fffe42ade40, ev=<optimized out>, e=0x0) at services/mesh.c:1458
    #8 0x00007ffff6dc205d in libworker_handle_service_reply (c=<optimized out>, arg=<optimized out>, error=<optimized out>, reply_info=<optimized out>)
        at libunbound/libworker.c:903
    #9 0x00007ffff6d91532 in serviced_callbacks (sq=0x7fffe80151f0, error=0, c=0x7fffe434bcd0, rep=0x7ffff258cca0) at services/outside_network.c:1687
    #10 0x00007ffff6d9191b in serviced_udp_callback (c=0x7fffe434bcd0, arg=0x7fffe80151f0, error=<optimized out>, rep=0x7ffff258cca0) at services/outside_network.c:2011
    #11 0x00007ffff6d90972 in outnet_udp_cb (c=<optimized out>, arg=0x7fffe4319e00, error=<optimized out>, reply_info=0x7ffff258cca0) at services/outside_network.c:538
    #12 0x00007ffff6d8bbb2 in comm_point_udp_callback (fd=16, event=<optimized out>, arg=<optimized out>) at util/netevent.c:704
    #13 0x00007ffff61258f8 in ?? () from /usr/lib/x86_64-linux-gnu/libevent-2.1.so.6
    #14 0x00007ffff612633f in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.1.so.6
    #15 0x00007ffff6d9335a in ub_event_base_dispatch (base=0x7fffe42df7c0) at util/ub_event_pluggable.c:491
    #16 comm_base_dispatch (b=<optimized out>) at util/netevent.c:241
    #17 0x00007ffff6dccb2d in libworker_dobg.lto_priv.226 (arg=0x7fffe412d5b0) at libunbound/libworker.c:360
    #18 0x00007ffff69546db in start_thread (arg=0x7ffff258d700) at pthread_create.c:463
    #19 0x00007ffff667d88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I set the file permissions of the autotrust file to 0777, but libunbound
still exits. Is configuring auto-trust-anchor-file simply an absolute
no-no with libunbound? Should one always use trust-anchor-file instead
(it does work with trust-anchor-file instead of auto-trust-anchor-file)?

Cheers,

You’re right of course: right after posting my message I noticed that
the autotrust file and its directory are owned by user unbound, while
opendkim tries to access them as user opendkim.

So all is well in the end. Frustrating though that I had to set up a
full dev environment on the server with debugger and all, just to find
out about a configuration error. A library just silently shutting down
my program is not ok in my opinion.

Could I have found out about my mistake some other way?

Thank you,
David

Hi David,

What about the permissions on the directory where that file lives?
Libunbound may want to save the updated trust anchors to a new
temporary file in the same directory, and would then need write
access to the directory. The strategy is to rename the temporary
file to the original name afterwards, to get "atomic file system
update"-ish semantics.
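
The write-then-rename strategy described above can be sketched as
follows. This is only an illustration under assumptions, not
libunbound's actual code: the helper name write_file_atomically and the
".<pid>.tmp" suffix are made up for the example. It shows why write
access to the directory matters: the temporary file is created next to
the target.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of libunbound): write `data` to `path`
 * by creating a temporary file in the same directory and renaming it
 * over `path`. Returns 0 on success, -1 on failure. */
int write_file_atomically(const char *path, const char *data)
{
    char tmp[4096];
    size_t len = strlen(data);
    FILE *f;

    /* The temporary file lives next to the target, so the process needs
     * write permission on the directory, not only on the file. */
    int n = snprintf(tmp, sizeof(tmp), "%s.%d.tmp", path, (int)getpid());
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;

    f = fopen(tmp, "w");
    if (f == NULL)
        return -1; /* e.g. EACCES if the directory is not writable */

    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }

    /* rename() swaps the new contents into place in a single step. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

With file mode 0777 but a directory owned by another user, the fopen()
of the temporary file is the step that would fail in this sketch.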

Good that you found it.

One option would have been to enable debug output, which is similar to
unbound's verbose logging feature. It would have enabled logging from
within your code, and you could then have seen what the library was
doing, e.g. the error about lacking permission to create a temporary
file for writing the updated trust anchor information in the same
directory as the trust anchor file.

    int ub_ctx_debugout(struct ub_ctx* ctx, void* out);  /* out is a FILE* */
    int ub_ctx_debuglevel(struct ub_ctx* ctx, int d);    /* d is the verbosity */

are the calls, and the debug level is the same as the verbosity: <value>
option in unbound.conf.

The issue is that we want to make really sure we can update the trust
anchor. Otherwise the root key might roll, and your application would
be unable to write the updated information without anyone noticing the
error. It would then be left with a broken rollover, and because
rollovers happen very infrequently, the problem would only show up long
after deployment. The permission requirement is there all the time, so
we check it straight away; that way you know the update will also work
when a root key rollover actually happens. That is why it exits (this
was really meant for the unbound daemon itself), and the exit also
signals the problem to library users.
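
An application that wants to avoid the exit could do a similar check
itself before handing the configuration to libunbound. The following is
only a sketch under assumptions: check_anchor_writable is a hypothetical
helper, and it tests with access(), which uses the real (not effective)
user id of the process.

```c
#include <errno.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical pre-flight check: returns 0 if the current user can
 * create files in the directory of `anchor_path` and can read and
 * write the anchor file if it already exists, -1 otherwise. */
int check_anchor_writable(const char *anchor_path)
{
    char copy[4096];
    const char *dir;

    if (strlen(anchor_path) >= sizeof(copy))
        return -1;
    strcpy(copy, anchor_path);  /* dirname() may modify its argument */
    dir = dirname(copy);

    /* Creating a temporary file needs write and search permission on
     * the directory. */
    if (access(dir, W_OK | X_OK) != 0) {
        fprintf(stderr, "cannot write to directory %s: %s\n",
                dir, strerror(errno));
        return -1;
    }

    /* If the anchor file exists, it must be readable and writable. */
    if (access(anchor_path, F_OK) == 0 &&
        access(anchor_path, R_OK | W_OK) != 0) {
        fprintf(stderr, "cannot read/write %s: %s\n",
                anchor_path, strerror(errno));
        return -1;
    }
    return 0;
}
```

Run as user opendkim against /var/lib/unbound/root.key, a check like
this would be expected to report the directory problem up front instead
of the process exiting during a lookup.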

Best regards, Wouter