Nsd 4.0 EAGAIN loop in sendmmsg(2)

On NetBSD 6.99.28-CURRENT, nsd 3.2.16 works fine, however nsd 4.0.0 is spinning chewing CPU. The logs show:

Nov 28 23:07:00 xxx nsd[466]: sendmmsg failed: Resource temporarily unavailable

ktruss shows it getting EAGAIN from sendmmsg(2) over and over again. According to the man page:

[EAGAIN|EWOULDBLOCK]
The socket is marked non-blocking and the requested
operation would block.

Sounds like it’s blocking when it shouldn’t?

-Dustin

Hi Dustin,

The NetBSD sendmmsg implementation must be completely new. Does the
kernel really implement this system call, kernel bugs, or does EAGAIN
really mean 'not yet implemented'? NSD simply calls sendmmsg, looks
how many packets have been sent, and then tries again to send the
remaining packets from the list. The OS could (conceivably) return 1
every time and simply do its one-packet-send function every time, and
never return EAGAIN (sendto does not give NSD EAGAIN when called
packet-by-packet).

The configure option --disable-recvmmsg can be used to disable the
sendmmsg and recvmmsg calls at compile time.

I thought NSD implements the sendmmsg calls 'right', but if somehow it
is wrong, I would like to fix that.

Best regards, Wouter

It looks like it was added to the -current kernel back on 2012/06/22. It could definitely be a bug in the implementation.

–disable-recvmmsg seems to make it happy again.

ktruss for the original binary shows:

2538 1 nsd __clock_gettime50(0x3, 0x7f7fffffd6c0) = 0
2538 1 nsd emul(netbsd)
2538 1 nsd __gettimeofday50(0x7f7fffffd6d0, 0) = 0
2538 1 nsd recvmmsg(0x3, 0x68cb80, 0x64, 0, 0) = 1
“\M-Q?^A\0\0^A\0\0\0\0\0\0\axxxxxxx^Cyyy^Cnet\0\0^A\0^A”
2538 1 nsd sendmmsg(0x3, 0x68cb80, 0x1, 0) Err#35 EAGAIN
2538 1 nsd issetugid() = 1
2538 1 nsd open(“/usr/share/nls/nls.alias.db”, 0x400000, 0) Err#2 ENOENT
2538 1 nsd open(“/usr/share/nls/nls.alias”, 0x400000, 0) = 8
2538 1 nsd __fstat50(0x8, 0x7f7fffffcc70) = 0
2538 1 nsd mmap(0, 0x5f0, 0x1, 0x2, 0x8, 0, 0) = 0x7f7ff7f65000
2538 1 nsd close(0x8) = 0
2538 1 nsd munmap(0x7f7ff7f65000, 0x5f0) = 0
2538 1 nsd open(“/usr/share/nls/C/libc.cat”, 0, 0x3a) = 8
2538 1 nsd __fstat50(0x8, 0x7f7fffffccf0) = 0
2538 1 nsd mmap(0, 0x10be, 0x1, 0x1, 0x8, 0, 0) = 0x7f7ff7f64000
2538 1 nsd close(0x8) = 0
2538 1 nsd munmap(0x7f7ff7f64000, 0x10be) = 0
2538 1 nsd __gettimeofday50(0x7f7fffffc190, 0) = 0
2538 1 nsd getpid() = 2538, 13362
2538 1 nsd fcntl(0xa, 0x3, 0) = 2
2538 1 nsd sendto(0xa, 0x7f7fffffc1b0, 0x75, 0, 0, 0) = 117
“<27>1 2013-12-03T03:42:32.309243+00:00 xxxxxxx.yyy.net nsd 2538 - - s”

2538 1 nsd getpid() = 2538, 13362
2538 1 nsd __gettimeofday50(0x7f7fffffd320, 0) = 0
2538 1 nsd write(0x2, 0x7f7fffffca10, 0x50) = 80
"[1386042152] nsd[2538]: error: sendmmsg failed: Resource temporarily "
2538 1 nsd write(0x2, 0x652bc7, 0x1) = 1
“\n”

2538 1 nsd __clock_gettime50(0x3, 0x7f7fffffd6c0) = 0
2538 1 nsd __gettimeofday50(0x7f7fffffd6d0, 0) = 0
2538 1 nsd __kevent50(0x7, 0x7f7ff7315000, 0, 0x7f7ff7316000, 0x40, 0) = 1
2538 1 nsd __clock_gettime50(0x3, 0x7f7fffffd6c0) = 0
2538 1 nsd __gettimeofday50(0x7f7fffffd6d0, 0) = 0
2538 1 nsd recvmmsg(0x3, 0x68cb80, 0x64, 0, 0) Err#35 EAGAIN
2538 1 nsd __clock_gettime50(0x3, 0x7f7fffffd6c0) = 0
2538 1 nsd __gettimeofday50(0x7f7fffffd6d0, 0) = 0
2538 1 nsd __kevent50(0x7, 0x7f7ff7315000, 0, 0x7f7ff7316000, 0x40, 0) = 1
2538 1 nsd __clock_gettime50(0x3, 0x7f7fffffd6c0) = 0
2538 1 nsd __gettimeofday50(0x7f7fffffd6d0, 0) = 0
2538 1 nsd recvmmsg(0x3, 0x68cb80, 0x64, 0, 0) Err#35 EAGAIN

So the very first recvmmsg works, but then every sendmmsg and recvmmsg afterwards returns EAGAIN and the process goes into a loop.

-Dustin