Timeout for TCP queries to NSD

Hi NSD developers,

I'm using the "dnspython" module to AXFR some large zones from one of our NSD 4.2.4 servers. Around a quarter of the time, the AXFR fails, and python throws an EOFError exception. This usually means that the server closed the connection. The same AXFRs, when done with "dig", always succeed.

I think that since "dnspython" is quite slow, there must be some kind of timeout being triggered in NSD, and it closes the connection. However, the only mention of any TCP-related timeout in nsd.conf is "tcp-timeout". The explanation of that option isn't very clear to me. It says:

"Overrides the default TCP timeout. This also affects zone transfers over TCP."

Is this for incoming queries to NSD, or outgoing TCP queries made by NSD? Also, what is the default TCP timeout that this refers to?

And if this is not related to TCP queries to an NSD server, then where and what timeouts does NSD apply when answering TCP queries?

Just for comparison, the same AXFRs, made using "dnspython" to a BIND server, all succeed. BIND's default TCP timeout parameters are all set to 30s.
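(A client-side stopgap, until the server-side cause is clear, is to retry on EOFError. A minimal sketch; `do_xfr` is a placeholder for whatever zero-argument callable runs the dnspython transfer, not an actual dnspython API.)

```python
def transfer_with_retry(do_xfr, attempts=3):
    """Retry a zone transfer that the server cuts off mid-stream.

    dnspython surfaces the server-side close as EOFError; any other
    exception propagates immediately.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return do_xfr()
        except EOFError as exc:
            last_error = exc  # server closed the TCP connection; try again
    raise last_error
```

This only papers over the disconnects, of course; the real question is why the server closes the connection at all.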

Regards,
Anand

Hi Anand,

> Hi NSD developers,
>
> I'm using the "dnspython" module to AXFR some large zones from one of
> our NSD 4.2.4 servers. Around a quarter of the time, the AXFR fails, and
> python throws an EOFError exception. This usually means that the server
> closed the connection. The same AXFRs, when done with "dig", always
> succeed.
>
> I think that since "dnspython" is quite slow, there must be some kind of
> timeout being triggered in NSD, and it closes the connection. However,
> the only mention of any TCP-related timeout in nsd.conf is
> "tcp-timeout". The explanation of that option isn't very clear to me. It
> says:
>
> "Overrides the default TCP timeout. This also affects zone transfers
> over TCP."
>
> Is this for incoming queries to NSD, or outgoing TCP queries made by
> NSD? Also, what is the default TCP timeout that this refers to?

Yes, this applies to both incoming queries and outgoing queries. The
default is 120 seconds.

A much smaller value, 200 msec, is used when the server is nearly at
capacity, for incoming connections that are over the limit. Also, after
the server has reloaded, the existing connections get a smaller 100
msec timeout to complete their TCP query to NSD.

That last feature has been there since 4.2.1. The shorter timeout when
TCP is full has been there since 4.1.11.
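As a toy model of that selection logic (my reading of the description above, not NSD source; the half-full threshold and all names are assumptions):

```python
DEFAULT_TIMEOUT = 120.0   # tcp-timeout default, in seconds
OVERLOAD_TIMEOUT = 0.2    # connections over the limit when nearly full
RELOAD_TIMEOUT = 0.1      # existing connections during a reload

def session_timeout(open_conns, tcp_count, in_reload=False):
    """Pick a TCP session's idle timeout, per the scheme sketched above."""
    if in_reload:
        return RELOAD_TIMEOUT
    if open_conns > tcp_count // 2:   # "more than half full" (assumed)
        return OVERLOAD_TIMEOUT
    return DEFAULT_TIMEOUT
```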

Best regards, Wouter

Hi Wouter,

> Yes, this applies to both incoming queries and outgoing queries. The
> default is 120 seconds.

Thanks for the clarification. I think the default of 120s should be documented in the man page.

I'm still not clear on what the timeout applies to though. Is it to the time between individual DNS messages in a TCP connection? Or does it apply to any period of inactivity in the connection?

> A much smaller value, 200 msec, is used when the server is nearly at
> capacity, for incoming connections that are over the limit. Also, after
> the server has reloaded, the existing connections get a smaller 100
> msec timeout to complete their TCP query to NSD.
>
> That last feature has been there since 4.2.1. The shorter timeout when
> TCP is full has been there since 4.1.11.

Now that you've explained it here, I recall that there was something about this in the release notes. However, the value of 200ms isn't documented. The release notes have:

"When tcp is more than half full, use short timeout for tcp session." So I'm guessing that "short timeout" here is 200ms. Also, it's not clear whether the timeout is dynamic. What I mean is: is it applied to all sessions (existing and new), or only to new ones. When the number of tcp connections drops to less than half, is the timeout reset to 120s? And is it reset for all sessions, or just new ones?

Dropping from the default 120s, to a mere 200ms when the number of TCP connections goes up, is quite dramatic. And I happen to think that 200ms is too low. A client that's getting an AXFR from such an NSD server is quite likely to suffer disconnects. In fact, I have been observing exactly this behaviour on the servers we run. We have a use case where a user is doing AXFR of some largish zones, and when the client is a bit slow, NSD drops the connection. This causes the client to retry. This, IMHO, is rather wasteful.

The other feature of shortening the timeout to 100ms is also not so obvious. The release notes have:

"Fix #14, tcp connections have 1/10 to be active and have to work
every second, and then they get time to complete during a reload,
this is a process that lingers with the old version during a version update."

The 1/10 there is not very readable. I think that 100ms would be much clearer. And I also don't understand what you mean by "and have to work every second". Could you please explain that?

In my opinion, such details should not be buried in the release notes document. The release notes are useful when comparing one version to another. All these features of how the server dynamically adjusts its behaviour should be in the operations manual or at least the nsd.conf man page.

Imagine a new user of NSD, who is trying to configure and tune the server, and sets "tcp-timeout" to some value, and still observes different behaviour when running the server. This leads to confusion. And it's not reasonable to expect the user to read the entire set of release notes trying to find such undocumented features.

Regards,
Anand Buddhdev
RIPE NCC

Hi Anand,

> Hi Wouter,
>
>> Yes, this applies to both incoming queries and outgoing queries. The
>> default is 120 seconds.
>
> Thanks for the clarification. I think the default of 120s should be
> documented in the man page.

Yep, done that! I could also adjust the default itself, but for now I
have documented it in the man page.

> I'm still not clear on what the timeout applies to though. Is it to the
> time between individual DNS messages in a TCP connection? Or does it
> apply to any period of inactivity in the connection?

Between DNS messages. But for AXFR the timeout is reset for every
individual DNS response message (fragment) of the AXFR response stream.
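So only the largest single gap between response messages matters, never the total transfer time. As a small model (not NSD code):

```python
def axfr_survives(gaps, timeout=120.0):
    """True if an AXFR stream stays alive under a per-message timeout.

    The idle timer restarts after every DNS response message, so each
    gap (in seconds) is checked individually; the sum is irrelevant.
    """
    return all(gap <= timeout for gap in gaps)
```

A long transfer with many 5-second gaps survives the 120-second default, while one 121-second stall kills the connection.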

>> A much smaller value, 200 msec, is used when the server is nearly at
>> capacity, for incoming connections that are over the limit. Also, after
>> the server has reloaded, the existing connections get a smaller 100
>> msec timeout to complete their TCP query to NSD.
>>
>> That last feature has been there since 4.2.1. The shorter timeout when
>> TCP is full has been there since 4.1.11.
>
> Now that you've explained it here, I recall that there was something
> about this in the release notes. However, the value of 200ms isn't
> documented. The release notes have:
>
> "When tcp is more than half full, use short timeout for tcp session." So
> I'm guessing that "short timeout" here is 200ms. Also, it's not clear
> whether the timeout is dynamic. What I mean is: is it applied to all
> sessions (existing and new), or only to new ones. When the number of tcp

Only to the new ones that are above the limit.

> connections drops to less than half, is the timeout reset to 120s? And
> is it reset for all sessions, or just new ones?

The default 120s timeout is then applied to new connections.

> Dropping from the default 120s, to a mere 200ms when the number of TCP
> connections goes up, is quite dramatic. And I happen to think that 200ms
> is too low. A client that's getting an AXFR from such an NSD server is

Not really for a busy server that is responding to TCP queries. It may
be too low for you, but only in this slow AXFR situation. I doubt you
are under that sort of capacity limit, in which case the point is
exactly to drop slow, sluggish responders to make space for other,
active users.

> quite likely to suffer disconnects. In fact, I have been observing
> exactly this behaviour on the servers we run. We have a use case where a
> user is doing AXFR of some largish zones, and when the client is a bit
> slow, NSD drops the connection. This causes the client to retry. This,
> IMHO, is rather wasteful.

You should increase the number of TCP buffers available, with tcp-count:
2000 in nsd.conf. That should allow more simultaneous TCP queries.
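In nsd.conf that would look like this (tcp-timeout shown at its default of 120, purely for illustration):

```
server:
    tcp-count: 2000
    tcp-timeout: 120
```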

I could also make that 200 msec configurable, but I doubt it would be
good to increase it too much; the main point of it is to force TCP
sessions to close if they are lagging the server. For that it has to be
short. But other values than 200 msec could also exist. I doubt that
would help your AXFR client, because I guess it might be lagging by a
whole lot; even if it is hit by this timer, I think it is more likely
the 100 msec timer after a reload, which is a much more arbitrary
choice.

> The other feature of shortening the timeout to 100ms is also not so
> obvious. The release notes have:
>
> "Fix #14, tcp connections have 1/10 to be active and have to work
> every second, and then they get time to complete during a reload,
> this is a process that lingers with the old version during a version
> update."
>
> The 1/10 there is not very readable. I think that 100ms would be much
> clearer. And I also don't understand what you mean by "and have to work
> every second". Could you please explain that?

That is a misnomer; I meant work frequently and consistently. The value
is up for grabs as well: if you think it is too low, I could increase it
to, e.g., 30 seconds, like BIND has for other TCP timers. But it would
start to keep more old TCP connections around after reloads, so I opted
for a more defensive low value that keeps old TCP connections from tying
up server resources, but still allows fast-responding clients to get
service completion. What sort of timeout do you think would help that
sluggish python process?

> In my opinion, such details should not be buried in the release notes
> document. The release notes are useful when comparing one version to
> another. All these features of how the server dynamically adjusts its
> behaviour should be in the operations manual or at least the nsd.conf
> man page.
>
> Imagine a new user of NSD, who is trying to configure and tune the
> server, and sets "tcp-timeout" to some value, and still observes
> different behaviour when running the server. This leads to confusion.
> And it's not reasonable to expect the user to read the entire set of
> release notes trying to find such undocumented features.

That is true. It is not configurable today, though. Not sure if it
should be, perhaps it can have a (new and different) sensible default.

Best regards, Wouter

Hey Wouter,

>> Thanks for the clarification. I think the default of 120s should be
>> documented in the man page.
>
> Yep, done that! I could also adjust the default itself, but for now I
> have documented it in the man page.

Thanks!

>> I'm still not clear on what the timeout applies to though. Is it to the
>> time between individual DNS messages in a TCP connection? Or does it
>> apply to any period of inactivity in the connection?
>
> Between DNS messages. But for AXFR the timeout is reset for every
> individual DNS response message (fragment) of the AXFR response stream.

Okay, got it. So if a TCP client sends 1 byte per TCP packet, every 3 seconds, and delivers a query of 30 bytes, it will take 90 seconds for NSD to receive one query to respond to? And then this same client can keep the connection open, wait for 115s, and then resume sending a query, 1 byte at a time, to keep this TCP connection occupied for ages?

This is the classic Slow Loris attack, isn't it?
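Spelling out the arithmetic of that hypothetical slow client:

```python
TIMEOUT = 120      # NSD's default tcp-timeout, in seconds
query_len = 30     # bytes in the DNS query
gap = 3            # seconds between single-byte packets

time_to_deliver = query_len * gap   # 90 seconds for one query

# Every individual gap stays under the timeout, and so does a 115 s
# pause between queries, so the slot is never reclaimed.
idle_between_queries = 115
slot_held_indefinitely = gap < TIMEOUT and idle_between_queries < TIMEOUT
```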

> Only to the new ones that are above the limit.

Okay, so the default "tcp-count" is 100. Does this mean that the first 50 TCP clients get a generous 120s timeout, and continue to enjoy that, as long as they keep the TCP connection open and keep sending some data as described above? Aren't these the "sluggish clients" that should be penalised somehow and be ejected?

And then the next 50 clients, because they didn't get in there first, are subject to a harsher 200ms timeout?

This seems like whoever gets in first gets the comfortable seats, and the late-comers have to sit on hard chairs.

>> Dropping from the default 120s, to a mere 200ms when the number of TCP
>> connections goes up, is quite dramatic. And I happen to think that 200ms
>> is too low. A client that's getting an AXFR from such an NSD server is
>
> Not really for a busy server that is responding to TCP queries. It may
> be too low for you, but only in this slow AXFR situation. I doubt you
> are under that sort of capacity limit, in which case the point is
> exactly to drop slow, sluggish responders to make space for other,
> active users.

I don't understand this... if some slow, sluggish TCP clients get in first, and occupy the first half of available TCP slots, then how do they ever get ejected? And while they do this, newer legitimate TCP clients are subject to a low timeout.

> You should increase the number of TCP buffers available, with tcp-count:
> 2000 in nsd.conf. That should allow more simultaneous TCP queries.

Yes, sure, I can adjust the "tcp-count" setting. But I'm not so fond of this dynamic adjustment of the timeout to 200ms when half the TCP slots are filled. A lower "tcp-timeout" that applies equally to all clients, is fairer.

> I could also make that 200 msec configurable, but I doubt it would be
> good to increase it too much; the main point of it is to force TCP
> sessions to close if they are lagging the server. For that it has to be

I agree that slow TCP clients should be dropped quickly. This dynamic lowering of the timeout should be removed. At this time, it's neither configurable nor documented, so removing it would not affect anyone's configuration.

If you insist on keeping this dynamic adjustment, then the option should be given some kind of descriptive name, such as "tcp-timeout-when-busy", and the option should explain exactly when it triggers and when it gets reset.

> short. But other values than 200 msec could also exist. I doubt that
> would help your AXFR client, because I guess it might be lagging by a
> whole lot; even if it is hit by this timer, I think it is more likely
> the 100 msec timer after a reload, which is a much more arbitrary
> choice.

That's also possible. Our NSD servers have many slave zones, and they see frequent updates, so the server process reloads zones frequently.

I read issue #14, and I understand the reasoning for your fix to have a lingering process that services an ongoing AXFR. But why not just use the regular timeout for it?

> That is a misnomer; I meant work frequently and consistently. The value
> is up for grabs as well: if you think it is too low, I could increase it
> to, e.g., 30 seconds, like BIND has for other TCP timers. But it would
> start to keep more old TCP connections around after reloads, so I opted
> for a more defensive low value that keeps old TCP connections from tying
> up server resources, but still allows fast-responding clients to get
> service completion. What sort of timeout do you think would help that
> sluggish python process?

Our Knot DNS servers were also using a rather harsh 500ms timeout on TCP I/O, and this was affecting the slower AXFR client. I changed that to 5s, and the AXFR client is now happy. I don't know if there is a magical value that is suitable for all, but again, if this parameter were at least configurable, then it would serve 2 purposes:

1. inform the operator that NSD is trying to do clever things; and

2. allow the operator to keep NSD from being too clever when this cleverness causes operational problems :-)

> That is true. It is not configurable today, though. Not sure if it
> should be, perhaps it can have a (new and different) sensible default.

See my explanations above. Hope they help you in thinking about a way forward to improve the TCP tuning parameters.

Regards,
Anand