Forward servers: option to log failed requests

Hello,

we're still tracking a "some minuscule amount of queries fail every couple of days" case with unbound, and we think it might be related to unbound's backoff timer. Dumping unbound-control lookup (forwarded-zone) every minute yielded this:

normally looks like
(forward server 1) rtt 27 msec, 0 lost. EDNS 0 probed.
(forward server 2) rtt 23 msec, 0 lost. EDNS 0 probed.

problem timeframe:

Mon Aug 31 00:47:09 CEST 2009
(forward server 1) rtt 800 msec, 1 lost. EDNS 0 probed.
(forward server 2) rtt 31 msec, 0 lost. EDNS 0 probed.
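(For reference, the dump itself was nothing fancy; a minimal sketch of such a once-a-minute logging loop, with the zone name and log path as placeholders, might look like this:

```shell
#!/bin/sh
# Poll unbound's view of the forward servers once a minute and append
# the output, with a timestamp, to a log file so the problem window
# can be correlated later. Zone name and log path are placeholders.
ZONE="example.com"
LOG="/var/log/unbound-lookup.log"

while true; do
    date >> "$LOG"
    unbound-control lookup "$ZONE" >> "$LOG" 2>&1
    sleep 60
done
```

A cron entry running the two commands once a minute would do the same job.)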

Hi Felix,

> we're still tracking a "some minuscule amount of queries fail every
> couple of days" case with unbound, and we think it might be related to
> unbound's backoff timer. Dumping unbound-control lookup (forwarded-zone)
> every minute yielded this:

These servers are both for the same zone, right?
So if server 1 does not work, unbound tries server 2.
In the sample below, it would start out favoring server 2,
which it says is working fine.

Are there statistics from the kernel, network stack, about
packets and dropped packets? Is the answer getting dropped
somewhere else in your network?

W.C.A. Wijngaards wrote:

> Hi Felix,
>
>> we're still tracking a "some minuscule amount of queries fail every
>> couple of days" case with unbound, and we think it might be related to
>> unbound's backoff timer. Dumping unbound-control lookup (forwarded-zone)
>> every minute yielded this:
>
> These servers are both for the same zone, right?
> So if server 1 does not work, unbound tries server 2.
> In the sample below, it would start out favoring server 2,
> which it says is working fine.

Yes.

> Are there statistics from the kernel, network stack, about
> packets and dropped packets? Is the answer getting dropped
> somewhere else in your network?

I don't see anything out of the ordinary; 30-second averages never
exceed 5% of interface bandwidth on the path between unbound and the
forward servers. That still leaves room for very short bursts causing
packets to be dropped, but judging from the latency monitoring it does
not look likely.

Kind regards,

Felix

Can the kernel report buffer overruns? Packet counts?
And can you compare them to the other system? On a busy system many
other parts can drop packets ...
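(On Linux, one way to get at exactly those numbers is the kernel's UDP counters; the counter names below assume a 2.6-era or later kernel and may vary:

```shell
#!/bin/sh
# Snapshot the kernel's UDP counters. "RcvbufErrors" counts datagrams
# dropped because a socket's receive buffer overflowed; "InErrors"
# counts all failed receives. Diffing a snapshot taken inside the
# problem window against one taken outside it shows whether the
# kernel is dropping answers before unbound ever sees them.
grep '^Udp:' /proc/net/snmp
```

`netstat -su` renders the same counters in human-readable form, and taking the snapshots on both the unbound host and the forward servers allows the comparison suggested above.)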

Best regards,
   Wouter