Errors in log

Hello, I have installed the unbound DNS server on a server with UltraSPARC
III processors. After it has been running, I see errors like these in
the log:

[1214559878] unbound[14616:1e] error: could not event_del on close
[1214559878] unbound[14616:1e] error: could not event_del on close
[1214559878] unbound[14616:1e] error: could not event_del on close
[1214559878] unbound[14616:1e] error: could not event_del on close
[1214559878] unbound[14616:1e] error: could not event_del on close
[1214559878] unbound[14616:1e] error: event_add failed. in cpsl.
[1214559878] unbound[14616:1e] error: could not event_del on close
[1214559878] unbound[14616:5] error: event_add failed. in cpsl.
[1214559878] unbound[14616:1e] error: event_add failed. in cpsl.
[1214559878] unbound[14616:19] error: event_add failed. in cpsl.
[1214559878] unbound[14616:1e] error: could not event_del on close
[1214559878] unbound[14616:3b] error: event_add failed. in cpsl.
[1214559878] unbound[14616:1e] error: could not event_del on close
[1214559878] unbound[14616:1e] error: event_add failed. in cpsl.
[1214559878] unbound[14616:2d] error: event_add failed. in cpsl.
[1214559878] unbound[14616:3b] error: event_add failed. in cpsl.
[1214559878] unbound[14616:32] error: event_add failed. in cpsl.
[1214559878] unbound[14616:2e] error: event_add failed. in cpsl.
[1214559878] unbound[14616:32] error: event_add failed. in cpsl.
[1214559878] unbound[14616:2d] error: event_add failed. in cpsl.
[1214559878] unbound[14616:2d] error: event_add failed. in cpsl.
[1214559878] unbound[14616:2e] error: event_add failed. in cpsl.
[1214559878] unbound[14616:3b] error: event_add failed. in cpsl.
[1214559878] unbound[14616:3f] error: event_add failed. in cpsl.
[1214559878] unbound[14616:3f] error: event_add failed. in cpsl.
[1214559878] unbound[14616:32] error: event_add failed. in cpsl.
[1214559878] unbound[14616:2d] error: event_add failed. in cpsl.
[1214559878] unbound[14616:8] error: event_add failed. in cpsl.
[1214559878] unbound[14616:32] error: event_add failed. in cpsl.

Some users are also having trouble with the DNS service.

Hi Anatoliy,

Are you using libevent on Solaris? unbound -h shows the version.

I recommend you try to work without libevent on Solaris; older versions
had bugs that caused the errors below.

Maybe libevent 1.4.5 (latest source release) works better, or compile
unbound without --with-libevent to get the built-in event handler, which
does not have those problems.

Best regards,
~ Wouter

Anatoliy Kushner wrote:

ns1:~/unbound-1.0.0> ./unbound -h
usage: unbound [options]
        start unbound daemon DNS resolver.
-h this help
-c file config file to read instead of /usr/local/etc/unbound/unbound.conf
        file format is described in unbound.conf(5).
-d do not fork into the background.
-v verbose (more times to increase verbosity)
Version 1.0.0
libevent mini-event-1.0.0, libldns 1.3.0
BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl

There is no libevent on the system.

Hmm, I rebuilt unbound without libevent,
and now there are too many errors in the log:

[1214846912] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846912] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846912] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846913] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846914] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846914] unbound[4145:2] error: accept failed: Resource temporarily unavailable
[1214846914] unbound[4145:1] error: accept failed: Resource temporarily unavailable
[1214846914] unbound[4145:0] error: accept failed: Resource temporarily unavailable
[1214846914] unbound[4145:2] info: remote address is 212.154.175.226 port 8955
[1214846914] unbound[4145:0] info: remote address is 212.154.175.226 port 8954
[1214846914] unbound[4145:1] info: remote address is (inet_ntop error) port 0
[1214846915] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846915] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846915] unbound[4145:3] error: accept failed: Resource temporarily unavailable
[1214846915] unbound[4145:1] error: accept failed: Resource temporarily unavailable
[1214846915] unbound[4145:3] info: remote address is (inet_ntop error) port 0
[1214846915] unbound[4145:1] info: remote address is (inet_ntop error) port 0
[1214846915] unbound[4145:0] error: accept failed: Resource temporarily unavailable
[1214846915] unbound[4145:0] info: remote address is 212.154.175.226 port 8954
[1214846915] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846916] unbound[4145:2] error: accept failed: Resource temporarily unavailable
[1214846916] unbound[4145:1] error: accept failed: Resource temporarily unavailable
[1214846916] unbound[4145:2] info: remote address is (inet_ntop error) port 0
[1214846916] unbound[4145:1] info: remote address is (inet_ntop error) port 0
[1214846916] unbound[4145:2] error: accept failed: Resource temporarily unavailable
[1214846916] unbound[4145:3] error: accept failed: Resource temporarily unavailable
[1214846916] unbound[4145:2] info: remote address is (inet_ntop error) port 0
[1214846916] unbound[4145:3] info: remote address is (inet_ntop error) port 0
[1214846916] unbound[4145:1] error: accept failed: Resource temporarily unavailable
[1214846916] unbound[4145:3] error: accept failed: Resource temporarily unavailable
[1214846916] unbound[4145:1] info: remote address is (inet_ntop error) port 0
[1214846916] unbound[4145:3] info: remote address is (inet_ntop error) port 0
[1214846916] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846916] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846917] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846917] unbound[4145:1] error: accept failed: Resource temporarily unavailable
[1214846917] unbound[4145:2] error: accept failed: Resource temporarily unavailable
[1214846917] unbound[4145:1] info: remote address is (inet_ntop error) port 0
[1214846917] unbound[4145:2] info: remote address is (inet_ntop error) port 0
[1214846917] unbound[4145:0] error: accept failed: Resource temporarily unavailable
[1214846917] unbound[4145:0] info: remote address is 212.154.175.226 port 8954
[1214846917] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846917] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846917] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846917] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846918] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846918] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846918] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846918] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846918] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846918] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846918] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846918] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846919] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846919] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846919] unbound[4145:3] error: recvfrom failed: Bad file number
[1214846919] unbound[4145:3] error: recvfrom failed: Bad file number

Hi Anatoliy,

Could this be a ulimit (open files) problem? What does `ulimit -n` report?

You could try increasing the limit on file descriptors; perhaps it is
running out of them. 1024 (the default for Linux, coincidentally) should
be enough.
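
For anyone who wants to check the same limit from inside a process rather
than from the shell, here is a minimal Python sketch. It is only an
illustration (unbound itself is not written in Python), and
`open_file_limits` is a hypothetical helper name. It reads the per-process
open-files limit, which is the value `ulimit -n` reports as the soft limit.

```python
import resource


def open_file_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # resource.RLIM_INFINITY means "no limit" for either value.
    print("soft limit:", soft)
    print("hard limit:", hard)
```

The soft limit is the one that applies to the running process; it can be
raised up to the hard limit without extra privileges.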

If that fails, please tell me which version of Solaris you are using
(OpenSolaris, 9, 10, ...; `uname -a`). Also, could you run unbound
with verbosity set to the maximum (4) and email me the (compressed)
output directly? Better not to send that huge logfile to the mailing list.

Also include your unbound configuration file; that may help too.

For what it's worth, I have unbound running on Solaris/sparc here just
fine ...

Best regards,
~ Wouter

Hi Anatoliy,

I think I know what is happening. You have 4 threads, all very busy, and
each one is allowed to use 1024 file descriptors for UDP, which totals
4096. Add in 4x100 incoming TCP file descriptors (to accept incoming
connections) and 4x100 outgoing ones, and the number of open file
descriptors is >= 1024. The value of FD_SETSIZE on SunOS 10 is 1024 by
default, and that is the problem (the "event_add failed. in cpsl" error).

So the problem is that select(2) cannot handle more than
1024 fds per process. Workarounds for you are:
* use fewer threads, 3 for example.
* set the outgoing range, incoming tcp, outgoing tcp and so on to
smaller values. 200, 10 and 10 for example.
* compile unbound with (a very recent version of) libevent (or libev).
libevent supports event ports and /dev/poll (mechanisms on Solaris akin to
epoll and kqueue). Such backends can easily handle large numbers of file
descriptors. Then you can work with a configuration like this.
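
The arithmetic behind the descriptor count can be sketched in a few lines
of Python. This is a rough estimate under stated assumptions, not how
unbound accounts for descriptors internally: the function names are
hypothetical, the parameters stand for the threads, outgoing range,
incoming tcp and outgoing tcp settings mentioned above, and the total
ignores listening sockets, log files and other descriptors the process
holds.

```python
FD_SETSIZE = 1024  # select(2) limit on SunOS 10, as stated in the message


def estimated_fds(threads, outgoing_range, incoming_tcp, outgoing_tcp):
    """Rough upper bound: per-thread descriptor allowance times thread count."""
    return threads * (outgoing_range + incoming_tcp + outgoing_tcp)


def fits_select(total, limit=FD_SETSIZE):
    """True if the estimated total stays below the select(2) limit."""
    return total < limit


if __name__ == "__main__":
    # 4 busy threads with 1024 UDP, 100 incoming TCP, 100 outgoing TCP each.
    busy = estimated_fds(4, 1024, 100, 100)
    # Same thread count with the smaller values suggested: 200, 10 and 10.
    small = estimated_fds(4, 200, 10, 10)
    print(busy, fits_select(busy))
    print(small, fits_select(small))
```

With the first set of numbers the estimate is 4 x (1024 + 100 + 100) =
4896, far above 1024; with 200, 10 and 10 it is 4 x 220 = 880, which stays
under the limit.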

I think I can detect the problem and print more sane error messages on
startup and recommend smaller config values or use of libevent.

Thanks for the debug logs :)

Edited your email and forwarded to unbound-users because the workarounds
may be useful for others as well.

Best regards,
~ Wouter

Anatoliy Kushner wrote:

I built the latest unbound, 1.0.1.

ulimit -a
open files (-n) 32768

<snip>