DoT with unbound

Hi,

we're using unbound (1.9.4) on NetBSD 8.0 to provide DNS-over-TLS
service to a diverse set of customers, at the moment mostly as an
experiment.

What I find is that, over time, TCP connections pile up in
CLOSE_WAIT state, all with 147 bytes stuck in the socket's receive
queue (according to "netstat -n -f inet").

In the TCP state diagram I'm looking at (the original standard),
the transition out of CLOSE_WAIT is supposed to be triggered by the
application calling close(), and that apparently isn't happening.
This eventually causes unbound to "clog up" and stop responding to
DoT queries.

I *think* this is an unbound bug, perhaps specific to this
platform. The question for now is: where do I start looking for the
cause of this problem? Where is the code that deals with TCP socket
handling inside unbound?

Best regards,

- Håvard

Earlier, I wrote:

we're using unbound (1.9.4) on NetBSD 8.0 to provide DNS-over-TLS
service to a diverse set of customers, at the moment mostly as an
experiment.

What I find is that, over time, TCP connections pile up in
CLOSE_WAIT state, all with 147 bytes stuck in the socket's receive
queue (according to "netstat -n -f inet").

I've taken a closer look, but haven't found a solution to this issue.

I'm testing with "kdig" from the knot package, and I find that it
can take an inordinately long time to get a response, e.g.:

% time kdig @158.38.0.1 +timeout=120 +tls vg.no. a
...
0.000u 0.008s 0:31.80 0.0% 0+0k 0+0io 0pf+0w

31 seconds, in other words, and I've seen response times of a
minute or more. The randomness pool on the server host is full, as
is the one on the client, so it's not a matter of running low on
entropy.

While it's waiting, I can find:

tcp 337 0 158.38.0.1.853 158.38.62.77.64808 ESTABLISHED

in "netstat -f inet -n | grep 853" output on the server. In
other words: the TCP connection is established, but the server
isn't consuming the data the client has sent. Other clients will
leave a little less data, typically 147 bytes:

tcp 147 0 158.38.0.1.853 158.38.204.xxx.43370 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.36.214.xx.41054 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.204.xxx.49226 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.204.xxx.41550 ESTABLISHED

These will in all probability end up as CLOSE_WAIT sockets,
because the client times out, and by the looks of it, the server
is very slow in cleaning these out as well.

Furthermore, I observe this behaviour both with libevent (2.1.11)
and when using the unbound-internal "mini-event" code.

OpenSSL is the in-tree code from NetBSD, version OpenSSL 1.0.2k
26 Jan 2017.

In the process, I ran across

  https://github.com/libevent/libevent/issues/854

but wedging that into libevent 2.1.11 didn't make any significant
improvement that I could see.

I added some logging at the top of ssl_handshake() in
util/netevent.c, and the latest test I did resulted in:

% time kdig @158.38.0.1 +timeout=120 +tls vg.no. a
...
;; Time 2019-11-24 15:06:21 CET
;; From 158.38.0.1@853(TCP) in 11.1 ms
0.000u 0.007s 1:08.18 0.0% 0+0k 0+0io 0pf+0w

and the corresponding log entry:

Nov 24 15:07:29 oliven unbound: [13367:0] notice: ssl handshake with 158.38.62.77 port 64713
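
The logging itself is essentially a one-liner (a sketch; log_addr()
is unbound's own logging helper, verbosity 0 is my choice so the
line always appears, and the peer address is assumed to be in
c->repinfo at this point):

  /* util/netevent.c -- sketch of the added logging, at the
   * top of ssl_handshake() */
  static int
  ssl_handshake(struct comm_point* c)
  {
          /* added: log the peer of every handshake attempt, so
           * the delay between accept() and handshake is visible */
          log_addr(0, "ssl handshake with",
                  &c->repinfo.addr, c->repinfo.addrlen);
          /* ... rest of the function unchanged ... */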

So ... apparently unbound is Very Slow in getting around to
handling the socket and handing it off for SSL handshake. The
reason for this has yet to be found. Hints gratefully accepted.

Regards,

- Håvard

Continuing:

I added logging of successful accept()s in
comm_point_perform_accept(), and this resulted in:

% time kdig @158.38.0.1 +timeout=120 +tls vg.no. a
;; Time 2019-11-24 15:37:26 CET
;; From 158.38.0.1@853(TCP) in 5.2 ms
0.003u 0.003s 0:12.62 0.0% 0+0k 0+0io 0pf+0w

and this in the log:

Nov 24 15:37:39 oliven unbound: [15650:0] notice: accept ok from ip4 158.38.62.77 port 64691 (len 16)
Nov 24 15:37:39 oliven unbound: [15650:0] notice: ssl handshake with ip4 158.38.62.77 port 64691 (len 16)
Nov 24 15:37:39 oliven unbound: [15650:0] notice: ssl handshake with ip4 158.38.62.77 port 64691 (len 16)
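
The accept-side logging is the same idea (a sketch; addr and
addrlen are assumed to be the out-parameters the surrounding code
fills in from accept(), and log_addr() produces the
"ip4 ... port ... (len 16)" format seen above):

  /* util/netevent.c, in comm_point_perform_accept(),
   * right after accept() has succeeded */
  log_addr(0, "accept ok from", addr, *addrlen);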

So, apparently, unbound is sometimes slow in doing accept() of
new connections. Why?

Regards,

- Håvard

Hm, following up on the saga...

My unbound is listening as follows:

server:
  # Listen on port 853 on both logical and physical interfaces:
  interface: 158.38.0.1@853
  interface: 2001:700:0:ff00::1@853
  interface: 158.38.0.168@853
  interface: 2001:700:0:503::ca53@853

and typically ends up with the following FDs open:

# fstat -n -p 24864
USER    CMD     PID    FD DEV  INUM     MODE  SZ|DV R/W
unbound unbound 24864 root 18,4 46615746 40755   512 r
unbound unbound 24864   wd 18,4 46615746 40755   512 r
unbound unbound 24864    0 18,0    85874 20666   2,2 rw
unbound unbound 24864    1 18,0    85874 20666   2,2 rw
unbound unbound 24864    2 18,0    85874 20666   2,2 rw
unbound unbound 24864    3* crypto 0xfffffe841bcbae70
unbound unbound 24864    4* internet dgram udp 158.38.0.1:853
unbound unbound 24864    5* internet stream tcp 158.38.0.1:853
unbound unbound 24864    6* internet6 dgram udp [2001:700:0:ff00::1]:853
unbound unbound 24864    7* internet6 stream tcp [2001:700:0:ff00::1]:853
unbound unbound 24864    8* internet dgram udp 158.38.0.168:853
unbound unbound 24864    9* internet stream tcp 158.38.0.168:853
unbound unbound 24864   10* internet6 dgram udp [2001:700:0:503::ca53]:853
unbound unbound 24864   11* internet6 stream tcp [2001:700:0:503::ca53]:853
unbound unbound 24864   12* unix stream <-> /var/run/unbound-ctl [creat]
unbound unbound 24864   13* unix dgram <-> /var/run/log [using]

The TCP socket on 158.38.0.1:853, fd 5, is the "main" service point
(it receives the most queries).

I added some verbose logging to handle_select(), and it looks like
when unbound takes an inordinately long time to answer queries on
158.38.0.1:853 (or doesn't do so even within 2 minutes), the logging
shows that fd 5 is *not* part of the read-set passed to select()!
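
The check I log from is simple enough (a sketch; base->reads is the
master fd_set that mini-event maintains, and fd 5 is hard-coded here
because that's my main TCP listener):

  /* util/mini_event.c, in handle_select(), just before the
   * fd sets are copied and handed to select() */
  if(!FD_ISSET(5, &base->reads))
          verbose(VERB_OPS, "fd 5 not in read-set before select");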

I would have thought that FDs 4-11 would *always* be included in the
read-set in the select loop, but that doesn't appear to be the case.

This also appears to be when connections start piling up which have
been established by the TCP stack but are not being handled by
unbound (not yet picked up with accept()):

# netstat -f inet -n | grep 853
tcp 147 0 158.38.0.1.853 158.38.204.xxx.33336 CLOSE_WAIT
tcp 165 0 158.38.0.1.853 158.38.204.yyy.44656 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.216.zzz.41239 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.204.xx6.37978 CLOSE_WAIT
tcp 147 0 158.38.0.1.853 158.38.204.zz8.49063 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.204.xx1.37366 ESTABLISHED
tcp 147 0 158.38.0.1.853 128.39.150.aa8.37272 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.204.xx7.51850 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.204.xx9.44854 CLOSE_WAIT
tcp 517 0 158.38.0.1.853 158.38.204.xx1.37362 ESTABLISHED
tcp 517 0 158.38.0.1.853 158.38.204.xx2.46394 ESTABLISHED
...

Looking at the code in util/, there doesn't seem to be anything
outside of mini_event.c which messes with the "reads" set: it's
zeroed by event_init(), used by handle_select(), set by event_add(),
and cleared by event_del(). Except, of course, that one may operate
with several "contexts" created by event_init().

So I added some tracing to event_del() and event_add(), and I see
fd 5 being both added and removed; the latest entries in the log now
show a stretch of more than 2 minutes where it has been deleted, so
new connections are not picked up during that entire period, which
is obviously bad.
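
The tracing amounts to something like this in both functions (a
sketch; ev_fd and ev_events are fields in mini-event's struct
event):

  /* util/mini_event.c, at the top of event_add();
   * event_del() got the analogous line */
  if(ev->ev_fd == 5 && (ev->ev_events & EV_READ))
          verbose(VERB_OPS, "event_add fd 5 (EV_READ)");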

What might be causing this?

Regards,

- Håvard

Earlier, I asked: what might be causing this?

I believe I now have the answer: the default settings for
num-threads and incoming-num-tcp in unbound are wildly inappropriate
for a TCP-heavy workload, such as providing a DNS-over-TLS service.

num-threads does not have a documented default value (bug!), but by
the looks of it the default is 1, i.e. unthreaded operation.

incoming-num-tcp has a documented default of "10 per thread".
Exceeding 10 simultaneous DNS-over-TLS connections is quite easy in
my case. The documentation says "buffers", but I suspect it should
really say "file descriptors"; a buffer doesn't necessarily map 1:1
to a file descriptor.

Furthermore, I'm casting a little doubt on the tcp-idle-timeout
handling and the back-pressure it's supposed to provide to free up
connections: if it had been active when 10 connections were in use
(with the defaults in the config), the idle timeout should have gone
down to 60ms at 35% free slots and to 0ms with all slots taken, and
that just didn't match what I observed. I've not delved deeper into
why this didn't appear to kick in.

For now, I've bumped the number of threads to 10 and the number of
"buffers" per thread to 100 (while still using the mini-event
select() loop), and unbound seems to behave much better now.
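
In unbound.conf terms, what I've set amounts to this (the option for
the per-thread "buffers" is incoming-num-tcp):

server:
  num-threads: 10
  incoming-num-tcp: 100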

Best regards,

- Håvard