Resource temporarily unavailable and query problem with unbound

Hello,
We are using unbound in our environment. I recently looked into the
log file and found many lines like these:

"""
[1289226893] unbound[5117:1] notice: sendmsg failed: Resource
temporarily unavailable
[1289226893] unbound[5117:1] notice: remote address is xxx.xxx.xxx.xxx
port 53588
"""

What is the meaning of those lines? Is there perhaps a performance
issue?

Some info about unbound (it is the Debian squeeze package):

Version 1.4.6
linked libs: libev 3.9 (it uses epoll), ldns 1.4.0, OpenSSL 0.9.8o 01
Jun 2010
linked modules: validator iterator
configured for x86_64-pc-linux-gnu on Sun Aug 15 22:32:30 UTC 2010 with
options: '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=
${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=
${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var'
'--libexecdir=${prefix}/lib/unbound' '--disable-maintainer-mode'
'--disable-dependency-tracking' '--disable-rpath'
'--with-chroot-dir=/var/lib/unbound'
'--with-pidfile=/var/run/unbound.pid' '--with-libevent'
BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl

Another problem I have is a "SERVFAIL" with the domain "imgw.pl".

I run this query with dig on our unbound host (with the allow_snoop
option set):

$> dig @localhost +trace imgw.pl

and get:
"""
; <<>> DiG 9.6-ESV-R1 <<>> +trace @localhost imgw.pl
; (1 server found)
;; global options: +cmd
. 518399 IN NS a.root-servers.net.
. 518399 IN NS h.root-servers.net.
. 518399 IN NS b.root-servers.net.
. 518399 IN NS d.root-servers.net.
. 518399 IN NS e.root-servers.net.
. 518399 IN NS l.root-servers.net.
. 518399 IN NS f.root-servers.net.
. 518399 IN NS i.root-servers.net.
. 518399 IN NS j.root-servers.net.
. 518399 IN NS g.root-servers.net.
. 518399 IN NS c.root-servers.net.
. 518399 IN NS m.root-servers.net.
. 518399 IN NS k.root-servers.net.
;; Received 500 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

pl. 172800 IN NS a-dns.pl.
pl. 172800 IN NS c-dns.pl.
pl. 172800 IN NS d-dns.pl.
pl. 172800 IN NS e-dns.pl.
pl. 172800 IN NS f-dns.pl.
pl. 172800 IN NS g-dns.pl.
pl. 172800 IN NS h-dns.pl.
pl. 172800 IN NS i-dns.pl.
;; Received 399 bytes from 193.0.14.129#53(k.root-servers.net) in 6 ms

imgw.pl. 86400 IN NS krgw1.imgw.pl.
imgw.pl. 86400 IN NS wagw1.imgw.pl.
;; Received 97 bytes from 195.80.237.162#53(e-dns.pl) in 40 ms

;; connection timed out; no servers could be reached
Segmentation fault
"""

If I drop the "+trace" option I just receive a "SERVFAIL" answer.

If I do this on my local machine (no unbound):

"""
; <<>> DiG 9.5.1-P2.1 <<>> +trace imgw.pl
;; global options: printcmd
. 84891 IN NS c.root-servers.net.
. 84891 IN NS a.root-servers.net.
. 84891 IN NS g.root-servers.net.
. 84891 IN NS f.root-servers.net.
. 84891 IN NS e.root-servers.net.
. 84891 IN NS l.root-servers.net.
. 84891 IN NS m.root-servers.net.
. 84891 IN NS j.root-servers.net.
. 84891 IN NS b.root-servers.net.
. 84891 IN NS h.root-servers.net.
. 84891 IN NS d.root-servers.net.
. 84891 IN NS i.root-servers.net.
. 84891 IN NS k.root-servers.net.
;; Received 449 bytes from 192.168.2.1#53(192.168.2.1) in 43 ms

pl. 172800 IN NS c-dns.pl.
pl. 172800 IN NS f-dns.pl.
pl. 172800 IN NS e-dns.pl.
pl. 172800 IN NS i-dns.pl.
pl. 172800 IN NS d-dns.pl.
pl. 172800 IN NS h-dns.pl.
pl. 172800 IN NS a-dns.pl.
pl. 172800 IN NS g-dns.pl.
;; Received 397 bytes from 192.33.4.12#53(c.root-servers.net) in 61 ms

imgw.pl. 86400 IN NS krgw1.imgw.pl.
imgw.pl. 86400 IN NS wagw1.imgw.pl.
;; Received 97 bytes from 194.0.1.2#53(h-dns.pl) in 61 ms

imgw.pl. 3600 IN A 195.187.86.14
imgw.pl. 3600 IN A 195.187.86.13
;; Received 71 bytes from 212.160.142.68#53(krgw1.imgw.pl) in 70 ms
"""

Does anybody know why unbound has problems with this domain?

Thanks for any answers...

Best regards,
Nick

Hi Sebastian,

[1289226893] unbound[5117:1] notice: sendmsg failed: Resource
temporarily unavailable

Are you out of socket-buffers?

Another problem I have is a "SERVFAIL" with the domain "imgw.pl".

The domain runs software that returns malformed packets with EDNS.
Bind 'accepts' them, but unbound does not.

dig @212.160.142.68 imgw.pl +bufsize=1024
;; Warning: Message parser reports malformed message packet.
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63400
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;imgw.pl. IN A

;; ANSWER SECTION:
imgw.pl. 3600 IN A 195.187.86.13
imgw.pl. 3600 IN A 195.187.86.14

;; Query time: 55 msec
;; SERVER: 212.160.142.68#53(212.160.142.68)
;; MSG SIZE rcvd: 71

So, we have to be lenient too, and a fix is in svn r2339 (it is not in
1.4.7, released today, but it is in the svn trunk).

Best regards,
   Wouter

Hello Wouter,

Are you out of socket-buffers?

thank you for your answer. I will try to increase the so-rcvbuf on the
system. The system is under really heavy use, so this could be the cause
of those lines.

The domain runs software that returns malformed packets with EDNS.

It's a pity that there is still software that returns malformed
packets. But now I know that it is not unbound's fault.

Thank you...

Best regards,
Sebastian

Hello,
I tried to increase so-rcvbuf to 8m and to 16m. I changed the
net.core.rmem_max value on the system, too.

I still see many of those lines in the log:

"""
Nov 09 16:34:26 unbound[3333:3] notice: sendmsg failed: Resource
temporarily unavailable
Nov 09 16:34:26 unbound[3333:3] notice: remote address is
xxx.xxx.xxx.xxx port 38777
Nov 09 16:34:26 unbound[3333:5] notice: remote address is
xxx.xxx.xxx.xxx port 45012
Nov 09 16:34:26 unbound[3333:4] notice: sendmsg failed: Resource
temporarily unavailable
Nov 09 16:34:26 unbound[3333:7] notice: sendmsg failed: Resource
temporarily unavailable
"""

Here is my unbound config (defaults not printed):

"""
server:
  so-rcvbuf: 16m
  verbosity: 1
  num-threads: 8
  interface: 0.0.0.0
  interface-automatic: yes

  outgoing-range: 8192
  msg-cache-size: 2150m
  msg-cache-slabs: 8
  num-queries-per-thread: 4096
  rrset-cache-size: 4300m
  rrset-cache-slabs: 8
  infra-cache-slabs: 8

  access-control: 127.0.0.0/8 allow_snoop

  <many access-control lines here>

  chroot: ""
  logfile: "/var/log/unbound.log"
  log-time-ascii: yes
  use-syslog: no
  pidfile: "/var/run/unbound.pid"
  do-not-query-address: 192.0.2.0/24
  key-cache-slabs: 8

remote-control:
  control-enable: yes

"""

Here again the linked libs for our unbound 1.4.6:
"""
linked libs: libev 3.9 (it uses epoll), ldns 1.4.0, OpenSSL 0.9.8o
"""

The server has a quad-core CPU (with hyperthreading) and 12 GB of RAM.

Are there any ideas what goes wrong?

Best regards,
Sebastian

Hi Sebastian,

Hello,
I tried to increase so-rcvbuf to 8m and to 16m. I changed
net.core.rmem_max values in system, too.

I still see many of those lines in the log:

Usually EAGAIN means the device is busy, but I have never seen it happen
on send. Attached is a small patch; could you test it? It sets
SO_SNDBUF to the same value as SO_RCVBUF; perhaps the output buffer is full?

If it gives permission denied, change SO_SNDBUF to SO_SNDBUFFORCE.

Otherwise, what does "netstat -su" report? Does it show send buffer
overruns (or can it even report those)?
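The counters that "netstat -su" prints come from the kernel's UDP statistics. As an illustration, a minimal Python sketch (Linux-only, assuming the usual /proc/net/snmp layout; not part of unbound itself) that reads the same counters:

```python
# Read the kernel's UDP counters, the same data "netstat -su" reports.
# Assumes a Linux /proc filesystem; field names come from /proc/net/snmp.
def udp_counters(path="/proc/net/snmp"):
    with open(path) as f:
        # /proc/net/snmp has a "Udp:" header line followed by a "Udp:" value line
        lines = [l.split() for l in f if l.startswith("Udp:")]
    header, values = lines[0][1:], lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))

stats = udp_counters()
# "SndbufErrors" (on kernels that expose it) counts datagrams dropped
# because the socket send buffer was full - the EAGAIN case discussed here
print(stats.get("SndbufErrors", 0))
```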

Does a symmetric rcvbuf-sndbuf setting make sense (apart from this
test)? Or do you need to be able to configure the sndbuf? (This question
is only applicable if the attachment solves the problem. :-))

Best regards,
   Wouter

(attachments)

patch_sndbuf_test.diff (558 Bytes)

Quoting Sebastian Nickel - Hetzner Online AG <sebastian.nickel@hetzner.de>:

Hello,
I tried to increase so-rcvbuf to 8m and to 16m. I changed
net.core.rmem_max values in system, too.

I still see many of those lines in the log:

"""
Nov 09 16:34:26 unbound[3333:3] notice: sendmsg failed: Resource
temporarily unavailable

I may be totally wrong, but should net.core.wmem_max be raised if sendmsg fails?

Regards

Andreas

* Sebastian Nickel:

I tried to increase so-rcvbuf to 8m and to 16m. I changed
net.core.rmem_max values in system, too.

Are there any messages in the kernel log?

Hi,
I tried "netstat -su" and it showed me many UDP send buffer errors. So I
tried to set "net.core.wmem_max" to 8388608 (8m). It did not help.

I googled a bit and found some network tuning guides for Linux. I ended
up with the following sysctl.conf:

"""
net.core.rmem_max=8388608
net.core.wmem_max=8388608
net.core.rmem_default=262144
net.core.wmem_default=262144

net.ipv4.udp_mem=8388608 12582912 16777216
net.ipv4.udp_rmem_min=16384
net.ipv4.udp_wmem_min=16384
"""

This solved the problem and I do not get any of those log messages (or
unexpected timeouts) anymore.
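For anyone reproducing this, the sysctl values can be double-checked without extra tools, since Linux exposes them under /proc/sys. A small sketch (paths assume Linux; not related to unbound's own code):

```python
# Verify that sysctl limits actually took effect by reading /proc/sys,
# where dots in the sysctl name map to path separators (Linux only).
def sysctl(name):
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return f.read().split()

print(sysctl("net.core.wmem_max"))  # single value in bytes
print(sysctl("net.ipv4.udp_mem"))   # three thresholds, measured in pages
```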

One last question:
If I use "so-rcvbuf: 8m" in unbound.conf, does this only set the
appropriate kernel variables (like I did with sysctl), or is the value
"8m" also used in some calculations inside unbound's code?

Thank you for your help...

Best regards,
Sebastian

Quoting Sebastian Nickel - Hetzner Online AG <sebastian.nickel@hetzner.de>:

Hi,
I tried "netstat -su" and it showed me many UDP send buffer errors. So I
tried to set "net.core.wmem_max" to 8388608 (8m). It did not help.

I googled a bit and found some network tuning guides for linux. I ended
up with the following sysctl.conf:

"""
net.core.rmem_max=8388608
net.core.wmem_max=8388608
net.core.rmem_default=262144
net.core.wmem_default=262144

net.ipv4.udp_mem=8388608 12582912 16777216
net.ipv4.udp_rmem_min=16384
net.ipv4.udp_wmem_min=16384
"""

These are new since kernel 2.6.25; I wonder what values they had before they became tunable. I had suspected that the net.core.* values would apply in case the specific net.ipvX.* values are not present.

May I ask what your traffic numbers are on this resolver?

Regards

Andreas

Hi Sebastian,

I am happy that you found a solution.

I tried "netstat -su" and it showed me many UDP send buffer errors. So I
tried to set "net.core.wmem_max" to 8388608 (8m). It did not help.

So it is the send buffer.

One last question:
If I use "so-rcvbuf: 8m" in unbound.conf does this only set the
appropiate kernel variables (like I did with sysctl) or is the value
"8m" used in some calculations inside unbounds code, too?

No, it only uses the 8m to set a parameter on the socket itself (that
overrides the kernel variable).

Your sysctl values seem to set the send buffer as well, and perhaps the
socket parameter call for the send buffer could be useful, so that only
that socket gets the large send buffer, not all of them (OK, you have
12G :-)).

Best regards,
   Wouter

Quoting "W.C.A. Wijngaards" <wouter@NLnetLabs.nl>:


Hi Sebastian,

I am happy that you found a solution.

I tried "netstat -su" and it showed me many UDP send buffer errors. So I
tried to set "net.core.wmem_max" to 8388608 (8m). It did not help.

So it is the send buffer.

One last question:
If I use "so-rcvbuf: 8m" in unbound.conf does this only set the
appropiate kernel variables (like I did with sysctl) or is the value
"8m" used in some calculations inside unbounds code, too?

No, it only uses the 8m to set a parameter on the socket itself (that
overrides the kernel variable).

Your sysctl values seem to be about setting the send buffer as well, and
perhaps the socket parameter call for the send buffer could be useful;
so that only that socket gets the large send buffer; not all of them (OK
you have 12G :-)).

I suspect doubling "net.core.wmem_default" was the solution, but that applies to all sockets without explicit settings, so a so-sndbuf setting would be useful.

Regards

Andreas

Quoting "W.C.A. Wijngaards" <wouter@NLnetLabs.nl>:


Hi Sebastian,

I am happy that you found a solution.

I tried "netstat -su" and it showed me many UDP send buffer errors. So I
tried to set "net.core.wmem_max" to 8388608 (8m). It did not help.

So it is the send buffer.

One last question:
If I use "so-rcvbuf: 8m" in unbound.conf does this only set the
appropiate kernel variables (like I did with sysctl) or is the value
"8m" used in some calculations inside unbounds code, too?

No, it only uses the 8m to set a parameter on the socket itself (that
overrides the kernel variable).

It does override the default, but it must be lower than the *_max setting, no?

Regards

Andreas

Normally, yes; on Linux unbound supports the _FORCE variation, which can
override the *_max (it requires root, therefore unbound sets this option
before dropping privileges).
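This setsockopt behaviour can be observed from any unprivileged process. A minimal Python sketch (an illustration, not unbound's actual C code) that requests a send buffer and reads back what the kernel granted:

```python
import socket

# Sketch of what a so-sndbuf style option does: set SO_SNDBUF on a UDP
# socket and read back the effective size. On Linux the kernel doubles
# the requested value (for bookkeeping overhead) and, without
# SO_SNDBUFFORCE and root, caps it at net.core.wmem_max.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
requested = 64 * 1024
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(granted)  # typically twice the requested value on Linux
s.close()
```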

I could implement such an option, so-sndbuf; or would it be easier to just
make it equal to so-rcvbuf? (That saves another config entry.)

Best regards,
   Wouter

Quoting "W.C.A. Wijngaards" <wouter@NLnetLabs.nl>:


Quoting "W.C.A. Wijngaards" <wouter@NLnetLabs.nl>:

No, it only uses the 8m to set a parameter on the socket itself (that
overrides the kernel variable).

It does override the default, but it must be lower than the *_max setting, no?

Normally, yes, on Linux unbound supports the _FORCE variation, which can
override the *_max (if you are root, therefore it sets this option
before dropping privileges).

I could implement such an option so-sndbuf; would it be easier to just
make it be equal to so-rcvbuf? (saves another config entry?)

Not sure. The receive sockets are low in number and tend toward higher latency if the machine is under load and cannot process the incoming packets fast enough. The send sockets could be much higher in number (outgoing-range?) and, in the case of UDP, should only fall behind if the available bandwidth is saturated.
So I guess the send buffer per socket should be smaller than the receive buffers...

Regards

Andreas

The so-rcvbuf is only applied to the port-53 sockets for incoming
queries. Like you say the outgoing sockets (due to port randomization)
see less traffic per socket.

Best regards,
   Wouter

Quoting "W.C.A. Wijngaards" <wouter@NLnetLabs.nl>:


Quoting "W.C.A. Wijngaards" <wouter@NLnetLabs.nl>:

I could implement such an option so-sndbuf; would it be easier to just
make it be equal to so-rcvbuf? (saves another config entry?)

Not sure. The receive sockets are low in number and tend to higher
latency if the machine is under load and could not process the incoming
packets fast enough. The send sockets could be much higher in number
(outgoing-range?) and should in case of UDP only fall behind if the
available bandwidth is saturated.
So i guess the send buffer per socket used should be smaller than the
receive buffers...

The so-rcvbuf is only applied to the port-53 sockets for incoming
queries. Like you say the outgoing sockets (due to port randomization)
see less traffic per socket.

This argues for a separate parameter instead of using the same value as for so-rcvbuf, no?

Regards

Andreas

Hello,

May I ask what your traffic numbers are on this resolver?

On this resolver we have about 50-60 Mbit/s of pure DNS traffic.

Best regards,
Sebastian

Hi Andreas,

This would vote for a separate parameter instead of using the same value
as for so-rcvbuf, no?

I implemented "so-sndbuf: 4m" as a config option. It is in the svn
trunk version of unbound, r2344. It passes the regression tests, but I
would like to know that it works for you; do you want to try it?
(You have to add "so-sndbuf: xx" to your config file to enable the
SO_SNDBUF option; otherwise it behaves exactly like so-rcvbuf.)
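Assuming the option behaves like so-rcvbuf as described, using it alongside the earlier settings would look like this in unbound.conf (a sketch for the trunk version; the 4m value is just an example):

```
server:
  so-rcvbuf: 8m
  so-sndbuf: 4m
```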

Best regards,
   Wouter

Quoting "W.C.A. Wijngaards" <wouter@NLnetLabs.nl>:


Hi Andreas,

This would vote for a separate parameter instead of using the same value
as for so-rcvbuf, no?

I implemented "so-sndbuf: 4m" as a config option. It is in the svn
trunk version of unbound, r2344. It passes the regression tests, but I
would like to know that it works for you; do you want to try it?
(You have to add the so-sndbuf: xx to your config file to enable the
SO_SNDBUF option; otherwise exactly similar to so-rcvbuf).

I don't have the traffic to really test it, so it would be useful if the OP could test it in the Hetzner network.

Regards

Andreas