Ignored IPv6 requests

I switched a low-usage DNS server from NSD 3 to NSD 4 and I see now
that, from time to time, it stops serving IPv6 requests (timeout). The
problem seems to occur randomly: it lasts a few minutes, then cures
itself. Nothing similar for IPv4. (By the way, the -6 option of the
ex-Nagios monitoring plugins is broken, see
<https://github.com/monitoring-plugins/monitoring-plugins/issues/1228>.)

It is not a network/L3 issue: when the problem happens, even a dig on
the name server to itself over the local interface fails. Also, with
tcpdump on the name server, I see DNS requests coming in but nothing
coming out.
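
The local check described above (a query sent to the server's own
address that times out) could be scripted roughly as follows. This is
only a sketch: the function names, the 3-second timeout, and the zone
name in the usage note are assumptions for illustration, not something
from the actual monitoring setup.

```python
import socket
import struct


def build_query(name, qtype=6, txid=0x1234):
    """Build a minimal DNS query in RFC 1035 wire format (qtype 6 = SOA)."""
    # Header: ID, flags (all zero, so no recursion desired), 1 question.
    header = struct.pack("!HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    )
    # QNAME terminator, then QTYPE and QCLASS (1 = IN).
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def probe(server, name, family=socket.AF_INET6, timeout=3.0):
    """Return True if the server answers a UDP query before the timeout."""
    query = build_query(name)
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket.timeout
            return False
    # A reply needs at least a 12-byte header carrying the same query ID.
    return len(reply) >= 12 and reply[:2] == query[:2]
```

Calling probe("::1", "example.org") and probe("127.0.0.1",
"example.org", family=socket.AF_INET) side by side (with a zone the
server actually serves in place of the hypothetical example.org) would
show whether only the IPv6 path stops answering.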

strace on the daemon does not show a recvmmsg(sa_family=AF_INET6...)
for my address when tcpdump sees the packet coming in.

Could it be because of something else than NSD? At the same moment, I
switched to Linux kernel 3.12.6 on a Linode VPS. Any similar problems
with this setup? (If I have time, I will try with Knot, to compare and
to see if it's the fault of the kernel.)

Hi Stephane,

> I switched a low-usage DNS server from NSD 3 to NSD 4 and I see
> now that, from time to time, it stops serving IPv6 requests
> (timeout). The problem seems to occur randomly: it lasts a few
> minutes, then cures itself. Nothing similar for IPv4. (By the way,
> the -6 option of the ex-Nagios monitoring plugins is broken, see
> <https://github.com/monitoring-plugins/monitoring-plugins/issues/1228>.)
>
> It is not a network/L3 issue: when the problem happens, even a dig
> on the name server to itself over the local interface fails. Also,
> with tcpdump on the name server, I see DNS requests coming in but
> nothing coming out.

I think this could be a bug in recvmmsg. This is a relatively new
syscall that NSD 4 uses to get more performance; it is available on
Linux in recent kernels. However, it appears to have issues.

You can compile with --disable-recvmmsg. NSD then uses good old
recvfrom, and that should work.

> strace on the daemon does not show a
> recvmmsg(sa_family=AF_INET6...) for my address when tcpdump sees
> the packet coming in.

Does the strace also show a result for that recvmmsg call, such as -1
with an errno? I need some way to detect when this bug happens;
otherwise I'd have to switch off recvmmsg completely, or at least no
longer enable it by default, or perhaps perform a Linux kernel-version
check in configure (but that gives binaries that later fail when run
on another machine).

> Could it be because of something else than NSD? At the same moment,
> I switched to Linux kernel 3.12.6 on a Linode VPS. Any similar
> problems with this setup? (If I have time, I will try with Knot, to
> compare and to see if it's the fault of the kernel.)

It'll be recvmmsg. Not sure if Knot uses that. The kernel version
may or may not have fixes for it, I guess, so the kernel version could
be important here.

Best regards,
   Wouter

a message of 71 lines which said:

> You can compile with --disable-recvmmsg. NSD then uses good old
> recvfrom, and that should work.

Indeed. I used the Debian package (which uses the default, recvmmsg),
but recompiled a custom version with --disable-recvmmsg and there are
no more problems. See the attached graph: frequent problems before the
recompilation yesterday afternoon, and a big green area since.

(attachments)

a message of 546 lines which said:

> Indeed. I used the Debian package (which uses the default, recvmmsg),
> but recompiled a custom version with --disable-recvmmsg and there are
> no more problems. See the attached graph: frequent problems before the
> recompilation yesterday afternoon, and a big green area since.

What are the plans? Should I file a bug report against the Debian
package, asking them to build with --disable-recvmmsg?

Sidenote: a serious security problem in recvmmsg:
<http://pastebin.com/DH3Lbg54>

Hi Stephane,

> > Could it be because of something else than NSD? At the same moment,
> > I switched to Linux kernel 3.12.6 on a Linode VPS. Any similar
> > problems with this setup? (If I have time, I will try with Knot, to
> > compare and to see if it's the fault of the kernel.)
>
> It'll be recvmmsg. Not sure if Knot uses that. The kernel version
> may or may not have fixes for it, I guess, so the kernel version could
> be important here.

Knot DNS uses recvmmsg as well on Linux kernels. And we would be
interested in hearing the results of this test.

> perhaps perform a Linux kernel-version check in configure

That's futile – you would have to check the kernel version at run time
and switch the implementation accordingly.
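
A run-time check of that kind might look roughly like the sketch below.
It is only an illustration under stated assumptions: the thread does
not say which kernel versions carry the bug or a fix, so the
FIRST_TRUSTED value is purely hypothetical, and the function names are
made up for this example. The only hard fact used is that recvmmsg was
added in Linux 2.6.33, per the recvmmsg(2) manual page.

```python
import platform
import re

# recvmmsg(2) was added in Linux 2.6.33 (per its manual page).
RECVMMSG_INTRODUCED = (2, 6, 33)

# HYPOTHETICAL: first release assumed to have a working recvmmsg.
# The thread does not establish any such version.
FIRST_TRUSTED = (3, 13, 0)


def parse_release(release):
    """Turn a release string such as '3.12.6-x86_64-linode36' into (3, 12, 6)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '4.19') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def use_recvmmsg(release, first_trusted=FIRST_TRUSTED):
    """Decide at run time whether to take the recvmmsg code path."""
    return parse_release(release) >= max(first_trusted, RECVMMSG_INTRODUCED)


if __name__ == "__main__":
    release = platform.release()
    path = "recvmmsg" if use_recvmmsg(release) else "recvfrom"
    print(f"kernel {release}: using {path}")
```

The decision is made when the daemon starts rather than when it is
built, so the same binary can be moved between machines, which is the
weakness of a configure-time check.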

Ondrej

Hi Ondrej,

> > > Could it be because of something else than NSD? At the same
> > > moment, I switched to Linux kernel 3.12.6 on a Linode VPS. Any
> > > similar problems with this setup? (If I have time, I will try
> > > with Knot, to compare and to see if it's the fault of the
> > > kernel.)
> >
> > It'll be recvmmsg. Not sure if Knot uses that. The kernel
> > version may or may not have fixes for it, I guess, so the kernel
> > version could be important here.
>
> Knot DNS uses recvmmsg as well on Linux kernels. And we would be
> interested in hearing the results of this test.

I also saw bug reports from NetBSD (which has recvmmsg in recent/devel
kernels?). It simply failed or looped forever.

> > perhaps perform a Linux kernel-version check in configure
>
> That's futile – you would have to check the kernel version at run
> time and switch the implementation accordingly.

Yes. For me, today, that means we disable it by default and leave it
disabled until it is fixed in kernels. Good replies are more important
than a few percent of performance...

Best regards,
   Wouter