Unbound exiting on stats write failure?

Hi,

one of our unbound hosts recently exited, and before it did, it
logged this:

  Sep 19 14:25:56 xxxxxxx unbound: [96:4] error: tube msg write failed: Resource temporarily unavailable
  Sep 19 14:25:56 xxxxxxx unbound: [96:4] fatal error: could not write stat values over cmd channel

Now, we're periodically polling stats via "unbound-control stats" and
feeding this into collectd, and our collectd hasn't exactly been fully
stable. However, is there a good reason the failure to write the
stats values is considered a fatal error? One would have thought that
it would not be, and that abandoning the output channel would be a
reasonable error recovery mechanism, allowing the main task of unbound
to proceed uninterrupted?

Regards,

- Håvard

Hi Havard,

The error is on a pipe between unbound processes (threads). It should
not be out of resources (it might block of course, waiting for them, and
blocking pipes are not a problem for unbound, but this error looks as
if the pipe randomly broke).

Are you on OpenBSD? Perhaps upgrade the kernel?

Best regards, Wouter

> The error is on a pipe between unbound processes (threads). It should
> not be out of resources (it might block of course, waiting for them, and
> blocking pipes are not a problem for unbound, but this error looks as
> if the pipe randomly broke).

Hm.

> Are you on OpenBSD? Perhaps upgrade the kernel?

Nope, on NetBSD 7.0.

Regards,

- Håvard

> one of our unbound hosts recently exited, and before it did, it
> logged this:
>
>   Sep 19 14:25:56 xxxxxxx unbound: [96:4] error: tube msg write failed: Resource temporarily unavailable
>   Sep 19 14:25:56 xxxxxxx unbound: [96:4] fatal error: could not write stat values over cmd channel

> The error is on a pipe between unbound processes (threads). It should
> not be out of resources (it might block of course, waiting for them, and
> blocking pipes are not a problem for unbound, but this error looks as
> if the pipe randomly broke).

This turned out to be caused by us running too old a version of
unbound, 1.5.4. I've since upgraded to 1.5.9, so this exact problem
should not happen again for us. Somewhere between those two releases,
tube_write_msg() grew a test for EAGAIN (causing a retry) in the
non-blocking case.
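
For illustration, the general retry-on-EAGAIN idea can be sketched as
below. This is not the actual unbound code: write_all_retry() and its
timeout_ms parameter are hypothetical names made up for this example,
and the real tube_write_msg() may well handle the case differently.
The sketch only shows that EAGAIN on a non-blocking descriptor means
"not writable right now" and can be waited out with poll() instead of
being treated as a failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (NOT unbound's tube_write_msg()): write all len
 * bytes to a non-blocking descriptor. On EAGAIN/EWOULDBLOCK, wait until
 * the descriptor is writable and retry instead of giving up.
 * Returns 0 on success, -1 on a real error or if no progress was
 * possible within timeout_ms milliseconds. */
int write_all_retry(int fd, const void *buf, size_t len, int timeout_ms)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n > 0) {
            /* Partial or full write: advance and continue. */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue; /* interrupted by a signal, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Pipe is full: wait for the reader to drain it. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int r = poll(&pfd, 1, timeout_ms);
            if (r > 0 || (r < 0 && errno == EINTR))
                continue;
            return -1; /* timed out or poll() itself failed */
        }
        return -1; /* any other error is reported to the caller */
    }
    return 0;
}
```

With a helper of this shape, a caller can treat a -1 return as "drop
this stats reply" rather than as a reason to exit the whole daemon.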

Regards,

- Håvard