Unbound 1.7.1 failing on some kvm servers

I have a number of kvm instances running debian where unbound 1.7.1
fails.

Many of these instances run whichever kernel was current when I first
leased them, and do not support newer kernels.

(Others look on the fs for a kernel to kexec, but not all do.)

Debian of course compiles unbound on a kernel which support
genrandom(2), but many of mine do not.

Unlike 1.6, 1.7 failes on such a machine, calling SIGKILL rather than
reading /dev/urandom.

It looks like getentropy_urandom() only needs CAN_REFERENCE_MAIN
defined, which getentropy_getrandom also needs, but still
getentropy_urandom() is ignored.

Deb's packaging makes no changes to that part of the code.

An strace(8) ends with:

sendto(9, "<29>May 24 20:17:10 unbound: [10"..., 70, MSG_NOSIGNAL, NULL, 0) = 70
getpid() = 10144
getpid() = 10144
getrandom(0x7fff9321c450, 40, 0) = -1 ENOSYS (Function not implemented)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid() = 10144
gettid() = 10144
tgkill(10144, 10144, SIGKILL) = ?

I've started work on an LD_PRELOAD lib to emulate getrandom(2) by
reading from urandom(5). Other than that, does anyone have any thoughts
on why this started breaking with 1.7.1?

-JimC

James Cloos via Unbound-users <unbound-users@unbound.net> writes:

I have a number of kvm instances running debian where unbound 1.7.1
fails.

An LD_PRELOAD lib which implments getentropy(3) via read(3)ing
urandom(4) solved the bug.

Unbound *always* should fall back to urandom(4) when getentropy(3)
results in ENOSYS, even when compiled against a kernel which advertizes
support for getrandom(2).

-JimC

Hi James,

James Cloos via Unbound-users <unbound-users@unbound.net> writes:

I have a number of kvm instances running debian where unbound 1.7.1
fails.

An LD_PRELOAD lib which implments getentropy(3) via read(3)ing
urandom(4) solved the bug.

Unbound *always* should fall back to urandom(4) when getentropy(3)
results in ENOSYS, even when compiled against a kernel which advertizes
support for getrandom(2).

But Unbound does that! It falls back to that when the other results in
ENOSYS.

What could be happening is that configure detects arc4random. If that
is the case, Unbound calls that arc4random. And then this library call
has to call getentropy, and it could be that that could does not
fallback? You can check the configure output for the arc4random check.
Or afterwards in config.log or config.h (HAVE_ARC4RANDOM is defined or
not defined).

If that is not the case, then we'd need to go for having log printout
for debug to see what happens.

Best regards, Wouter

Unbound *always* should fall back to urandom(4) when getentropy(3)
results in ENOSYS, even when compiled against a kernel which advertizes
support for getrandom(2).

But Unbound does that!
It falls back to that when the other results in ENOSYS.

In the strace there is no attempt to open(2) "/dev/urandom".

Once the call to glibc's genentropy(3) fails, it immediately sends a
SIGKILL.

What could be happening is that configure detects arc4random. If that
is the case, Unbound calls that arc4random. And then this library call
has to call getentropy, and it could be that that could does not
fallback?

Indeed, compat/arc4random.c:_rs_stir calls genentropy and does not fall back.

You can check the configure output for the arc4random check.
Or afterwards in config.log or config.h (HAVE_ARC4RANDOM is defined or
not defined).

I don't have the configure output; this is debian's compile.

If that is not the case, then we'd need to go for having log printout
for debug to see what happens.

What config is needed for that? verbosity:5 ? I don't see anything
else documented in unbound.conf(5).

:; /etc/init.d/unbound start
[....] Starting DNS server: unbound[1527535960] unbound[4025:0] debug: increased limit(open files) from 1024 to 4152
[1527535960] unbound[4025:0] debug: creating udp6 socket ::1 53
[1527535960] unbound[4025:0] debug: creating tcp6 socket ::1 53
[1527535960] unbound[4025:0] debug: creating udp4 socket 127.0.0.1 53
[1527535960] unbound[4025:0] debug: creating tcp4 socket 127.0.0.1 53
[1527535960] unbound[4025:0] debug: creating tcp6 socket ::1 8953
[1527535960] unbound[4025:0] debug: creating tcp4 socket 127.0.0.1 8953
[1527535960] unbound[4025:0] debug: setup SSL certificates
[1527535960] unbound[4025:0] debug: switching log to syslog

and then in syslog:

May 28 19:32:40 tea unbound-anchor: /var/lib/unbound/root.key has content
May 28 19:32:40 tea unbound-anchor: success: the anchor is ok
May 28 19:32:40 tea unbound: [4026:0] debug: chdir to /etc/unbound
May 28 19:32:40 tea unbound: [4026:0] debug: drop user privileges, run as unbound
May 28 19:32:40 tea unbound: [4026:0] debug: module config: "subnetcache validator iterator"
May 28 19:32:40 tea unbound: [4026:0] notice: init module 0: subnet
May 28 19:32:40 tea unbound: [4026:0] debug: subnet: option registered (8)
May 28 19:32:40 tea unbound: [4026:0] notice: init module 1: validator
May 28 19:32:40 tea unbound: [4026:0] debug: reading autotrust anchor file /var/lib/unbound/root.key
May 28 19:32:40 tea unbound: [4026:0] info: trust point . : 1
May 28 19:32:40 tea unbound: [4026:0] info: assembled 0 DS and 2 DNSKEYs
May 28 19:32:40 tea unbound: [4026:0] info: DNSKEY:: .#011172800#011IN#011DNSKEY#011257 3 8 AwEAAagAIKlVZrpC6Ia7gEzahOR+9W29euxhJhVVLOyQbSEW0O8gcCjFFVQUTf6v58fLjwBd0YI0EzrAcQqBGCzh/RStIoO8g0NfnfL2MTJRkxoXbfDaUeVPQuYEhg37NZWAJQ9VnMVDxP/VHL496M/QZxkjf5/Efucp2gaDX6RS6CXpoY68LsvPVjR0ZSwzz1apAzvN9dlzEheX7ICJBBtuA6G3LQpzW5hOA2hzCTMjJPJ8LbqF6dsV6DoBQzgul0sGIcGOYl7OyQdXfZ57relSQageu+ipAdTTJ25AsRTAoub8ONGcLmqrAmRLKBP1dfwhYB4N7knNnulqQxA+Uk1ihz0= ;{id = 19036 (ksk), size = 2048b}
May 28 19:32:40 tea unbound: [4026:0] info: DNSKEY:: .#011172800#011IN#011DNSKEY#011257 3 8 AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTOiW1vkIbzxeF3+/4RgWOq7HrxRixHlFlExOLAJr5emLvN7SWXgnLh4+B5xQlNVz8Og8kvArMtNROxVQuCaSnIDdD5LKyWbRd2n9WGe2R8PzgCmr3EgVLrjyBxWezF0jLHwVN8efS3rCj/EWgvIWgb9tarpVUDK/b58Da+sqqls3eNbuv7pr+eoZG+SrDK6nWeL3c6H5Apxz7LjVc1uTIdsIXxuOLYA4/ilBmSVIzuDWfdRUfhHdY6+cn8HFRm+2hM8AnXGXws9555KrUB5qihylGa8subX2Nn6UwNR1AkUTV74bU= ;{id = 20326 (ksk), size = 2048b}
May 28 19:32:40 tea unbound: [4026:0] info: file /var/lib/unbound/root.key
May 28 19:32:40 tea unbound: [4026:0] info: last_queried: 1527535960 Mon May 28 19:32:40 2018
May 28 19:32:40 tea unbound: [4026:0] info: last_success: 1527535960 Mon May 28 19:32:40 2018
May 28 19:32:40 tea unbound: [4026:0] info: next_probe_time: 1527575738 Tue May 29 06:35:38 2018
May 28 19:32:40 tea unbound: [4026:0] info: query_interval: 43200
May 28 19:32:40 tea unbound: [4026:0] info: retry_time: 8640
May 28 19:32:40 tea unbound: [4026:0] info: query_failed: 0
May 28 19:32:40 tea unbound: [4026:0] info: [ VALID ] .#011172800#011IN#011DNSKEY#011257 3 8 AwEAAagAIKlVZrpC6Ia7gEzahOR+9W29euxhJhVVLOyQbSEW0O8gcCjFFVQUTf6v58fLjwBd0YI0EzrAcQqBGCzh/RStIoO8g0NfnfL2MTJRkxoXbfDaUeVPQuYEhg37NZWAJQ9VnMVDxP/VHL496M/QZxkjf5/Efucp2gaDX6RS6CXpoY68LsvPVjR0ZSwzz1apAzvN9dlzEheX7ICJBBtuA6G3LQpzW5hOA2hzCTMjJPJ8LbqF6dsV6DoBQzgul0sGIcGOYl7OyQdXfZ57relSQageu+ipAdTTJ25AsRTAoub8ONGcLmqrAmRLKBP1dfwhYB4N7knNnulqQxA+Uk1ihz0= ;{id = 19036 (ksk), size = 2048b} ;;state:2 ;;pending_count:0 last:Sat Jun 13 03:12:07 2015
May 28 19:32:40 tea unbound: [4026:0] info: [ VALID ] .#011172800#011IN#011DNSKEY#011257 3 8 AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTOiW1vkIbzxeF3+/4RgWOq7HrxRixHlFlExOLAJr5emLvN7SWXgnLh4+B5xQlNVz8Og8kvArMtNROxVQuCaSnIDdD5LKyWbRd2n9WGe2R8PzgCmr3EgVLrjyBxWezF0jLHwVN8efS3rCj/EWgvIWgb9tarpVUDK/b58Da+sqqls3eNbuv7pr+eoZG+SrDK6nWeL3c6H5Apxz7LjVc1uTIdsIXxuOLYA4/ilBmSVIzuDWfdRUfhHdY6+cn8HFRm+2hM8AnXGXws9555KrUB5qihylGa8subX2Nn6UwNR1AkUTV74bU= ;{id = 20326 (ksk), size = 2048b} ;;state:2 ;;pending_count:0 last:Thu Aug 10 22:41:16 2017
May 28 19:32:40 tea unbound: [4026:0] debug: validator nsec3cfg keysz 1024 mxiter 150
May 28 19:32:40 tea unbound: [4026:0] debug: validator nsec3cfg keysz 2048 mxiter 500
May 28 19:32:40 tea unbound: [4026:0] debug: validator nsec3cfg keysz 4096 mxiter 2500
May 28 19:32:40 tea unbound: [4026:0] notice: init module 2: iterator
May 28 19:32:40 tea unbound: [4026:0] debug: target fetch policy for level 0 is 3
May 28 19:32:40 tea unbound: [4026:0] debug: target fetch policy for level 1 is 2
May 28 19:32:40 tea unbound: [4026:0] debug: target fetch policy for level 2 is 1
May 28 19:32:40 tea unbound: [4026:0] debug: target fetch policy for level 3 is 0
May 28 19:32:40 tea unbound: [4026:0] debug: target fetch policy for level 4 is 0
May 28 19:32:40 tea unbound: [4026:0] debug: donotq: 127.0.0.0/8
May 28 19:32:40 tea unbound: [4026:0] debug: donotq: ::1
May 28 19:32:40 tea unbound: [4026:0] debug: EDNS known options:
May 28 19:32:40 tea unbound: [4026:0] debug: Code: Bypass_cache_stage: Aggregate_mesh:
May 28 19:32:40 tea unbound: [4026:0] debug: edns-cli NO YES

and then nothing else.

-JimC

Hi James,

Unbound *always* should fall back to urandom(4) when getentropy(3)
results in ENOSYS, even when compiled against a kernel which advertizes
support for getrandom(2).

> But Unbound does that!
> It falls back to that when the other results in ENOSYS.

In the strace there is no attempt to open(2) "/dev/urandom".

Once the call to glibc's genentropy(3) fails, it immediately sends a
SIGKILL.

But in your strace you have a call to getrandom() that fails.

So, would like to implement the fallback, but where? getrandom?
arc4random? getentropy? The strace says getrandom is called and gets
ENOSYS.

In compat/compat/getentropy_linux.c that would result in a fallback to
getentropy_urandom(). That would not fail, and thus not raise SIGKILL.

So perhaps we are executing different code. What code? Depends on
configure time detection.

> What could be happening is that configure detects arc4random. If that
> is the case, Unbound calls that arc4random. And then this library call
> has to call getentropy, and it could be that that could does not
> fallback?

Indeed, compat/arc4random.c:_rs_stir calls genentropy and does not fall back.

Yes that does that. fall back for some glibc implementation?

> You can check the configure output for the arc4random check.
> Or afterwards in config.log or config.h (HAVE_ARC4RANDOM is defined or
> not defined).

I don't have the configure output; this is debian's compile.

This is why I want to find out what happens, and what happened at
configure time. Not sure, but can you run ltrace, it should intercept
the library call and tell us if unbound is calling a function from glibc
(or some other library perhaps, i.e. misexported internal compat
function from another program). And we'd know more what code in unbound
gets executed?

And then I can find out where the fallback code needs to be? I can also
make a patch for you in _rs_stir ; but you'd need to compile to try it
(and when you're doing that tell what configure at that compile says
about what library functions are available and which not).

Best regards, Wouter

I'll try to recompile the Debian package to catch configure output ...
@James: which Debian Version?

Andreas

Hi Andreas,

I don't have the configure output; this is debian's compile

I'll try to recompile the Debian package to catch configure output ...
@James: which Debian Version?

Here is a patch you can use to fix the problem for him, it calls the
urandom fallback when ENOSYS is returns in _rs_stir().

Hopefully that will alleviate the problems. The configure output that
would match this patch is one where getentropy is provided by the
system, not arc4random, and that getentropy then returns ENOSYS.

Best regards, Wouter

(attachments)

patch_getentropy_fallback.diff (1.76 KB)

Once the call to glibc's genentropy(3) fails, it immediately sends a
SIGKILL.

But in your strace you have a call to getrandom() that fails.

compat/getentropy_linux.c calls genentropy(3) which calls getrandom(2).

Indeed, compat/arc4random.c:_rs_stir calls genentropy and does not fall back.

Yes that does that. fall back for some glibc implementation?

It is not really glibc-specific. Rather everything is compiled
expecting one kernel version and running on a different kernel.

but can you run ltrace, it should intercept
the library call and tell us if unbound is calling a function from glibc

The ltrace(8) output ends with:

21004 14:53:22.120825 [0x7f91c82d638e] getpid( <unfinished ...>
21004 14:53:22.120981 [0x7f91c641d257] SYS_getpid() = 21004 <0.000053>
21004 14:53:22.121072 [0x7f91c82d638e] <... getpid resumed> ) = 21004 <0.000244>
21004 14:53:22.121142 [0x7f91c82d613a] getentropy(0x7ffcfab88650, 40, 0, -1 <unfinished ...>
21004 14:53:22.121341 [0x7f91c6390f7a] SYS_318(0x7ffcfab88650, 40, 0, 318) = -38 <0.000087>
21004 14:53:22.121480 [0x7f91c82d613a] <... getentropy resumed> ) = 0xffffffff <0.000331>
21004 14:53:22.121555 [0x7f91c82d627a] raise(9, 40, -264, -1 <unfinished ...>
21004 14:53:22.121729 [0x7f91c6723dab] SYS_rt_sigprocmask(0, 0x7ffcfab885b0, 0x7ffcfab88530, 8) = 0 <0.000071>
21004 14:53:22.121843 [0x7f91c6723db4] SYS_getpid() = 21004 <0.000044>
21004 14:53:22.121929 [0x7f91c6723dbe] SYS_gettid(0x520c, 0x7ffcfab885b0, 0x7ffcfab88530, 8) = 0x520c <0.000058>
21004 14:53:22.122039 [0x7f91c6723dc9] SYS_tgkill(0x520c, 0x520c, 9, 8 <no return ...>
21004 14:53:22.123306 [0xffffffffffffffff] +++ killed by SIGKILL +++

-JimC

OK, here are the logs and patched binaries:

https://andreasschulze.de/tmp/

aaa_debian_sid.log
-> buildlog for stock debian sid package

zzz_with_patch_from_wouter.log
-> same build with patch included

hope, that helps somehow...
Andrea

Hi Andreas, James,

I'll try to recompile the Debian package to catch configure output ...
@James: which Debian Version?

OK, here are the logs and patched binaries:

Thanks! Based on the ltrace and the logs I think the patch is correct.

Initially I got confused with the compat getentropy implementation
inside unbound, but that isn't being used here.

Best regards, Wouter

I don't have the configure output; this is debian's compile

I'll try to recompile the Debian package to catch configure output ...
@James: which Debian Version?

I didn't see that until just now; I had replied initially to the copies
I got parallel to the list, and so missed your replies (and the proposed
patch).

For the archives, sid.

-JimC