One of our unbound hosts recently exited, and before it did, it
logged this:
Sep 19 14:25:56 xxxxxxx unbound: [96:4] error: tube msg write failed: Resource temporarily unavailable
Sep 19 14:25:56 xxxxxxx unbound: [96:4] fatal error: could not write stat values over cmd channel
Now, we periodically poll stats via "unbound-control stats" and
feed them into collectd, and our collectd hasn't exactly been fully
stable. However, is there a good reason that failing to write the
stats values is considered a fatal error? One would have thought that
it would not be, and that abandoning the output channel would be a
reasonable error recovery mechanism, allowing the main task of unbound
to proceed uninterrupted.
The error is on a pipe between unbound processes (threads). It should
not run out of resources. It might block, of course, while waiting for
them, and blocking pipes are not a problem for unbound, but this error
looks as if the pipe randomly broke.
This turned out to be caused by our running too old a version of
unbound, 1.5.4. I've since upgraded to 1.5.9, so this exact problem
should not happen again for us. Somewhere between those two versions,
tube_write_msg() grew a test for EAGAIN (causing a retry) in the
non-blocking case.