1.9.4: TCP queries when some threads are full

Hello,

I have noticed a machine running unbound sometimes
ignoring TCP requests, tested like so on the server:

# dig +tcp @127.0.0.1 CH TXT hostname.bind +tries=1
[...]
;; connection timed out; no servers could be reached

It will work sometimes, and then it won't; UDP appears unaffected. The TCP
statistics look like this:

# unbound-control stats_noreset | grep tcp
thread0.tcpusage=10
thread1.tcpusage=1
thread2.tcpusage=0
thread3.tcpusage=4
thread4.tcpusage=0
thread5.tcpusage=0
thread6.tcpusage=0
thread7.tcpusage=0
thread8.tcpusage=10
thread9.tcpusage=10
thread10.tcpusage=10
thread11.tcpusage=1
thread12.tcpusage=5
thread13.tcpusage=0
thread14.tcpusage=0
thread15.tcpusage=1
total.tcpusage=52

The machine is running with the default "incoming-num-tcp" of 10. So it appears
some of the threads are fully utilized.

You should probably increase that count to 100 or, better, 1000.
incoming-num-tcp: 1000
And perhaps also increase the upstream (outgoing-num-tcp) if you have upstream TCP or TLS configured.

This allocates more buffers and that is useful for a server with more clients on it.

The setting is similar to a setting for NSD that likewise controls the number of buffers for client streams.
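
In unbound.conf that would look something like this (the numbers are only an
example, pick what fits your load):

    server:
        # allow more simultaneous incoming TCP (and TLS) clients per thread
        incoming-num-tcp: 1000
        # only relevant if you forward over TCP or TLS to upstream servers
        outgoing-num-tcp: 100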

My first question: is it possible that the sometimes-failing requests are a
result of the request being dispatched to a "full" thread even when there are
unused threads available?

I don't know; that depends on the OS dispatching it. I think Unbound stops accepting them for that thread when the thread is full.

My second question is what the expected behaviour of unbound is for TCP
connections that are idling. From unbound.conf(5) I see "tcp-idle-timeout"
defaults to 30000ms, so this tells me a TCP connection being silent for 30
seconds will be dropped, but maybe this only matters until we have seen an
initial query, after which the connection is left open forever?

I tracked down the file descriptor for one of the TCP connections to
unbound, found it was created over 12 hours ago, and then filtered for
traffic for the host and port that was holding the connection with
tcpdump, and not a single packet appeared for the several minutes I was running
it.

When TCP is nearly full it should use an even shorter timeout. And not allow such very long idle connections. That looks like it went wrong.
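
Roughly this idea (just a sketch of the intent, not Unbound's actual code):

    /* sketch of the intent: use a shorter idle timeout when most of the
     * thread's TCP slots are already in use */
    int idle_timeout_msec(int configured_msec, int slots_free, int slots_total)
    {
            /* plenty of room: use the configured tcp-idle-timeout */
            if(2*slots_free >= slots_total)
                    return configured_msec;
            /* nearly full: scale the timeout down with the free fraction */
            return configured_msec * slots_free / slots_total;
    }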

Best regards, Wouter

You should probably increase that count to 100 or, better, 1000.
incoming-num-tcp: 1000
And perhaps also increase the upstream (outgoing-num-tcp) if you have
upstream TCP or TLS configured.

Thanks for the input. If you specifically mean "tcp-upstream" and
"tls-upstream", both of them are at the default "no".

This allocates more buffers and that is useful for a server with more
clients on it.

I will increase incoming-num-tcp and see how it turns out. It just
felt wrong to increase the value without reaching out to the list since
the statistics were showing several threads having 0 clients, at least
giving me the impression that there should be plenty of connection slots
available.

When TCP is nearly full it should use an even shorter timeout. And not
allow such very long idle connections. That looks like it went wrong.

Yeah, I noticed the time-limiting code in setup_tcp_handler(). I was
wondering about the wording in the man page for "tcp-idle-timeout" that
states "The period Unbound will wait for a query on a TCP
connection", which could suggest that the timeout is only applied to the
initial query, and it is a bit unclear what happens if you open a
connection, do a query, and then hold on to the connection.
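
For reference, the option under discussion, shown explicitly at its
documented default:

    server:
        # unbound.conf(5): "The period Unbound will wait for a query on a
        # TCP connection"; 30000 milliseconds = 30 seconds
        tcp-idle-timeout: 30000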

David Gwynne ran into that as well,
https://nlnetlabs.nl/pipermail/unbound-users/2019-August/011748.html

Back in October, Wouter replied:

You should probably increase that count to 100 or, better, 1000.
incoming-num-tcp: 1000
And perhaps also increase the upstream (outgoing-num-tcp) if you have
upstream TCP or TLS configured.

This allocates more buffers and that is useful for a server with more
clients on it.

The setting is similar to a setting for NSD that likewise controls the
number of buffers for client streams.

Someone directed me to this thread after my recent battle with
unbound for dns-over-tls.

Coming from a Unix background, I must say I'm slightly puzzled by the
use of the term "buffer" here. I think a more appropriate term to use
would be "file descriptor". In my understanding, a buffer is usually
a data structure in the application used to temporarily store data,
possibly coming from a file descriptor (but not necessarily), and
there *is* something called "unbuffered read". So a "buffer" does not
necessarily map 1:1 to "file descriptor", at least not to my ears.
Also, a "buffer" isn't an OS resource, whereas a "file descriptor"
definitely *is* an OS resource, and what's important to describe here
is the (potential) OS resource use.

Therefore, I *think* what is being described here is "file
descriptors".

I'm going to suggest a few updates to the unbound.conf man page
via a pull request shortly...

Best regards,

- Håvard

In October, Wouter answered Patrik Lundin:

My second question is what the expected behaviour of unbound is for TCP
connections that are idling. From unbound.conf(5) I see "tcp-idle-timeout"
defaults to 30000ms, so this tells me a TCP connection being silent for 30
seconds will be dropped, but maybe this only matters until we have seen an
initial query, after which the connection is left open forever?

I tracked down the file descriptor for one of the TCP connections to
unbound, found it was created over 12 hours ago, and then filtered for
traffic for the host and port that was holding the connection with
tcpdump, and not a single packet appeared for the several minutes I was
running it.

When TCP is nearly full it should use an even shorter timeout. And
not allow such very long idle connections. That looks like it went
wrong.

Hm, that didn't really answer the questions, though, did it,
other than perhaps implying that there's a bug?

Is a different timeout used between the time the connection is
established and the first query is received than after the first
query has been received? I'm going to assume the intention is "no".

The experience I had with a mis-configured unbound for
dns-over-tls (all 10 slots used up pretty quickly, and connection
requests piling up, taking ages to respond if at all), and also
the other message at

  https://nlnetlabs.nl/pipermail/unbound-users/2019-August/011748.html

seem to indicate that the tcp-idle-timeout feature doesn't
actually work the way one would think it is supposed to work.
This makes me wonder if this feature has actually been tested and
verified to work as intended?

Regards,

- Håvard

Hi,

following up on my own message:

Is a different timeout used between the time the connection is
established and the first query is received than after the first
query has been received? I'm going to assume the intention is "no".

The experience I had with a mis-configured unbound for
dns-over-tls (all 10 slots used up pretty quickly, and connection
requests piling up, taking ages to respond if at all), and also
the other message at

  https://nlnetlabs.nl/pipermail/unbound-users/2019-August/011748.html

seem to indicate that the tcp-idle-timeout feature doesn't
actually work the way one would think it is supposed to work.
This makes me wonder if this feature has actually been tested and
verified to work as intended?

Hmm, Patrik pointed to comm_point_start_listening(), and it sets up a
time-out on the event if passed a suitable value in the msec argument.
However, reading my event(3) man page reveals that this function creates
a non-persistent event, and the timeout therefore gets cancelled once
the first data is handled by the event. Is it so that the functions
tcp_callback_writer() and tcp_callback_reader() will be invoked, and
therefore re-activate the event + timeout when either reading or writing
of the DNS message is complete?
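
To make the event(3) point concrete, here is a minimal libevent-style
sketch (my own illustration, not Unbound's code): a non-persistent read
event stops being pending once it fires, so unless the callback re-adds
it together with the timeout, neither the read event nor the idle
timeout is armed any more.

    #include <event2/event.h>
    #include <unistd.h>

    static void
    read_cb(evutil_socket_t fd, short what, void *arg)
    {
            struct event *ev = arg;
            struct timeval idle = { 30, 0 };  /* think tcp-idle-timeout */

            if(what & EV_TIMEOUT) {
                    /* the connection idled too long: drop it */
                    event_free(ev);
                    close(fd);
                    return;
            }
            /* ... read the query from fd and write the answer ... */

            /* the event was created without EV_PERSIST, so it is no
             * longer pending here; without this re-add the connection
             * is never read from (or timed out) again */
            event_add(ev, &idle);
    }

    void
    watch_client(struct event_base *base, evutil_socket_t fd)
    {
            struct timeval idle = { 30, 0 };
            struct event *ev = event_new(base, fd, EV_READ, read_cb,
                    event_self_cbarg());
            event_add(ev, &idle);
    }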

However, it looks like tcp_callback_reader() will call
comm_point_start_listening(), but only if the callback function returns
non-0, and I don't know what the possible functions invoked via
(*c->callback) might be, or what they may return and under what
conditions -- reading the code beyond that indirection is a little hard.
And perhaps at that point in time, the timeout value should be
re-computed?

And ... isn't the invocation of comm_point_start_listening() in
tcp_callback_writer() in effect turning *off* timeouts with the last -1
argument, when according to the comment it goes back to listening (read)?
Is it as simple as changing the last -1 in that invocation to
c->tcp_timeout_msec?
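
Spelled out, the change I have in mind would be roughly this (a sketch
only, not an actual diff; the -1 for the fd argument is how I read the
doxygen, i.e. "keep the existing fd"):

    /* in tcp_callback_writer(), when going back to listening for reads,
     * re-arm the idle timeout instead of passing -1 (sketch only) */
    comm_point_start_listening(c, -1, c->tcp_timeout_msec);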

It looks like the "event & UB_EV_TIMEOUT" part in
comm_point_tcp_handle_callback() is responsible for doing a
server-side-initiated close() based on a triggered time-out?
Has the associated log message ever been observed?

Regards,

- Håvard

My second question is what the expected behaviour of unbound is for
TCP connections that are idling. From unbound.conf(5) I see
"tcp-idle-timeout" defaults to 30000ms, so this tells me a TCP
connection being silent for 30 seconds will be dropped, but maybe this
only matters until we have seen an initial query, after which the
connection is left open forever?

I tracked down the file descriptor for one of the TCP connections to
unbound, found it was created over 12 hours ago, and then filtered
for traffic for the host and port that was holding the connection
with tcpdump, and not a single packet appeared for the several
minutes I was running it.

As you may have seen, I recently had an "encounter" with unbound
related to handling of TCP sessions, and it looks like there was a
bug that caused unbound not to re-arm the TCP idle timeout for client
sessions which had already seen an initial query + response.

The patch in the pull request at

  https://github.com/NLnetLabs/unbound/pull/122

should be a fix for this issue if I read the code correctly. If you
are able to do some testing / validation of this, that would be
greatly appreciated.

Best regards,

- Håvard

That is an interesting find, well done and thanks for sharing!

Looking at the patch I wanted to read about comm_point_start_listening()
and found some documentation at
https://nlnetlabs.nl/documentation/unbound/doxygen/netevent_8h.html
which states:

msec    timeout in milliseconds, or -1 for no (change to the) timeout.
So seconds*1000.

The "no change to the timeout" makes me curious, as I read that to mean
"calling the function with -1 retains the previously set value", did it
not work like that in this instance?
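
Just to spell out my reading of that sentence (pure guesswork on my
part, with made-up field names, not the actual netevent.c code):

    #include <event2/event.h>

    struct conn {
            struct event *ev;    /* read event for the client stream */
            int timeout_msec;    /* currently active idle timeout */
    };

    static void
    start_listening(struct conn *c, int msec)
    {
            struct timeval tv;

            if(msec != -1)
                    c->timeout_msec = msec;  /* -1: keep the previous value */

            if(c->timeout_msec > 0) {
                    tv.tv_sec = c->timeout_msec / 1000;
                    tv.tv_usec = (c->timeout_msec % 1000) * 1000;
                    event_add(c->ev, &tv);
            } else {
                    event_add(c->ev, NULL);  /* no idle timeout at all */
            }
    }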

That is an interesting find, well done and thanks for sharing!

Looking at the patch I wanted to read about comm_point_start_listening()
and found some documentation at
https://nlnetlabs.nl/documentation/unbound/doxygen/netevent_8h.html
which states:

msec    timeout in milliseconds, or -1 for no (change to the) timeout.
So seconds*1000.

Hum, you're better than me at searching the documentation; I didn't
even think to look there...

And on second reading, it looks like "-1 for no change to the timeout"
is correct; assuming libevent doesn't itself mess with the bits in the
"event" structure it gets passed.

So while the new code isn't wrong per se, it may not be fixing what I
thought it would. :(

Regards,

- Håvard