NSD currently only processes one UDP packet per socket per select(). Since
select() is kind of expensive, under load this means it burns a lot of CPU
unnecessarily.
There's a simple trick to avoid this. Make the UDP socket non-blocking, and
loop on recvfrom() until it returns -1, ignoring any EAGAIN errors. Under
light load, this results in an extra recvfrom() every packet. But under
heavy load, this avoids select() until the input buffer is drained.
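For illustration, a minimal sketch of the trick in C (this is not the attached patch; set_nonblocking() and handle_query() are placeholder names, not NSD identifiers):

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

extern void handle_query(char *buf, ssize_t len,
                         struct sockaddr *from, socklen_t fromlen);

/* Applied once per UDP socket at startup. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Once select() reports the socket readable, keep reading until the
 * kernel's input queue is empty. */
static void drain_udp_socket(int fd)
{
    char buf[65536];
    struct sockaddr_storage from;
    socklen_t fromlen;
    ssize_t len;

    for (;;) {
        fromlen = sizeof(from);
        len = recvfrom(fd, buf, sizeof(buf), 0,
                       (struct sockaddr *) &from, &fromlen);
        if (len == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal; retry */
            break;          /* EAGAIN: drained, back to select() */
        }
        handle_query(buf, len, (struct sockaddr *) &from, fromlen);
    }
}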
Attached is an example patch against NSD 2.3.4 that implements this.

According to the queryperf tool that comes with BIND, on a simple query
against localhost on an old Linux box, this takes NSD's peak throughput from
39kpps to 48kpps, a 23% improvement. These are obviously ideal conditions,
but please feel free to test for yourself.
If the patch gets munged in transit, it is also available from:
https://www.die.net/tmp/1c50b61e244661c1/nsd-2.3.4-fewerselects.patch
-- Aaron
[On 12 May, @12:41, Aaron Hopkins wrote in "Reducing select() usage under ..."]
> against localhost on an old Linux box, this takes NSD's peak throughput from
> 39kpps to 48kpps, a 23% improvement. These are obviously ideal conditions,
> but please feel free to test for yourself.
Whoa!
That's very nice.
We'll look at this ASAP, but next week we're all at SANE 2006, so at
the earliest this will be in the week after SANE.
Aaron Hopkins wrote:
> NSD currently only processes one UDP packet per socket per select(). Since
> select() is kind of expensive, under load this means it burns a lot of CPU
> unnecessarily.
>
> There's a simple trick to avoid this. Make the UDP socket non-blocking, and
> loop on recvfrom() until it returns -1, ignoring any EAGAIN errors. Under
> light load, this results in an extra recvfrom() every packet. But under
> heavy load, this avoids select() until the input buffer is drained.
But you will have to be careful not to starve other sockets that may have incoming requests waiting. Since NSD will usually run with multiple sockets (UDP, TCP, IPv4, IPv6, multiple interfaces), this can become quite hard and/or expensive. That's why NSD currently uses select() and processes all readable sockets (not just the first!) every iteration.
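Roughly, that pattern looks like this (a sketch of the approach, not NSD's actual code; service_one_request() is a placeholder):

#include <stddef.h>
#include <sys/select.h>

extern void service_one_request(int fd);    /* placeholder */

/* One select() per iteration; every readable socket is serviced
 * exactly once, so no socket can starve the others. */
static void event_loop(const int *socks, int nsocks)
{
    fd_set readfds;
    int i, maxfd;

    for (;;) {
        FD_ZERO(&readfds);
        maxfd = -1;
        for (i = 0; i < nsocks; i++) {
            FD_SET(socks[i], &readfds);
            if (socks[i] > maxfd)
                maxfd = socks[i];
        }
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) == -1)
            continue;                       /* e.g. EINTR */
        for (i = 0; i < nsocks; i++)
            if (FD_ISSET(socks[i], &readfds))
                service_one_request(socks[i]);
    }
}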
Regards,
Erik
It is hard and expensive if you want to ensure perfect fairness and
interleave responses from every socket. But I think there is a compromise
available between perfect fairness and only answering requests from one
socket when it is flooded.
Changing that while(1) I added to something that would only loop up to a
fixed number of times (e.g. 100) would be trivial. You'd still amortize the
cost of the select() over many UDP packets, without being able to starve
other sockets for more than a few milliseconds. You'd concentrate on work
from one socket, then switch to the next one and do everything pending up to
the same limit. And the performance gains would be approximately the same.
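A minimal sketch of that compromise, under the same assumptions as before (MAX_PACKETS_PER_SELECT and handle_query() are illustrative names, not NSD identifiers):

#include <sys/types.h>
#include <sys/socket.h>

#define MAX_PACKETS_PER_SELECT 100  /* per-socket budget per select() */

extern void handle_query(char *buf, ssize_t len,
                         struct sockaddr *from, socklen_t fromlen);

/* Same drain loop as before, but capped: a flooded socket holds the
 * CPU for at most MAX_PACKETS_PER_SELECT packets before the loop
 * yields to the other sockets. */
static void drain_udp_socket_bounded(int fd)
{
    char buf[65536];
    struct sockaddr_storage from;
    socklen_t fromlen;
    ssize_t len;
    int n;

    for (n = 0; n < MAX_PACKETS_PER_SELECT; n++) {
        fromlen = sizeof(from);
        len = recvfrom(fd, buf, sizeof(buf), 0,
                       (struct sockaddr *) &from, &fromlen);
        if (len == -1)
            break;          /* EAGAIN: drained early, back to select() */
        handle_query(buf, len, (struct sockaddr *) &from, fromlen);
    }
}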
As for TCP fairness in this scheme, you'd probably also want to loop
accepting TCP connections up until current_tcp_count >= maximum_tcp_count.

But since each TCP connection gets its own socket, each one will get some
attention every select(), and select()s will still be happening hundreds of
times per second.
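Sketched under the same assumptions (current_tcp_count, maximum_tcp_count, and setup_tcp_connection() are illustrative, borrowing the names above):

#include <sys/socket.h>

extern int current_tcp_count;               /* illustrative */
extern int maximum_tcp_count;               /* illustrative */
extern void setup_tcp_connection(int fd);   /* placeholder */

/* Assumes the listening socket is also non-blocking: accept pending
 * connections until the configured limit is hit or the backlog is
 * empty, then fall back to select(). */
static void accept_pending(int listen_fd)
{
    while (current_tcp_count < maximum_tcp_count) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd == -1)
            break;          /* EAGAIN: no more pending connections */
        current_tcp_count++;
        setup_tcp_connection(fd);
    }
}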
-- Aaron
Aaron Hopkins wrote:
> It is hard and expensive if you want to ensure perfect fairness and
> interleave responses from every socket. But I think there is a compromise
> available between perfect fairness and only answering requests from one
> socket when it is flooded.
>
> Changing that while(1) I added to something that would only loop up to a
> fixed number of times (e.g. 100) would be trivial. You'd still amortize the
> cost of the select() over many UDP packets, without being able to starve
> other sockets for more than a few milliseconds. You'd concentrate on work
> from one socket, then switch to the next one and do everything pending up to
> the same limit. And the performance gains would be approximately the same.
Yes, that could certainly work. And as you measured, saving on select() makes a huge difference for NSD, since it spends a lot of its time in system calls.
Regards,
Erik
Is a flood even realistic, given expected CPU speed and pipe widths?
It seems that the bottleneck would get hit on bandwidth before starvation
effects happen. Could a flooding attack really fill the packet queue faster
than the code can drain it?
And given this attack, processing a hundred packets on the attacked port
before getting to an unattacked port -- there's a tradeoff there too.
Mindlessly defending the status quo even as it shifts,
David Nicol
David Nicol writes:
> Is a flood even realistic, given expected CPU speed and pipe widths? It
> seems that the bottleneck would get hit on bandwidth before starvation
> effects happen. Could a flooding attack really fill the packet queue faster
> than the code can drain it?
Someone mentioned 49kpps a few days ago. That's maybe 50Mbps, right? It's trivial to find a cheapish server with 1-2 gigabit NICs, and almost as easy to find a colo with enough bandwidth that a determined attacker can shove 100-200Mbps at you.
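(Back-of-the-envelope, assuming an average of roughly 128 bytes per packet on the wire: 49,000 packets/s × 128 bytes × 8 bits/byte ≈ 50 Mbit/s.)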
Arnt
[On 12 May, @12:41, Aaron Hopkins wrote in "Reducing select() usage under ..."]
> NSD currently only processes one UDP packet per socket per select(). Since
> select() is kind of expensive, under load this means it burns a lot of CPU
> unnecessarily.
Hello,
Thanks for your patches. The speed improvements you see are impressive.
We are however rather reluctant to apply the select()-patch to NSD at
this moment:
We've tested the speed improvements in our DISTEL testlab and
we did see some gain, but not the amount of improvement you noticed.
However, it could be that this is because we are filling the 100Mb
interfaces. We will therefore upgrade the testlab's hardware to a 1Gb network.
Once we have finalized the measurements, we'll publish the results on this list.
If the differences are not very significant, we are hesitant to code
around the use of select(), for the sake of code simplicity and
portability.
> We've tested the speed improvements in our DISTEL testlab and
> we did see some gain, but not the amount of improvement you noticed.
Which OS were you testing on, by chance? The numbers I mentioned were for
Linux 2.4, but I've since tested on Linux 2.6 and found the patch offering
less of an improvement. The patch will matter more on OSes where select()
is more expensive.
> However, it could be that this is because we are filling the 100Mb
> interfaces. We will therefore upgrade the testlab's hardware to a 1Gb
> network. Once we have finalized the measurements, we'll publish the results
> on this list.
If your lab has several common Unix-ish OSes available, perhaps with
supported IP checksum offload network cards, be sure to test on several.
I'm very pleased with NSD's ability to fill 100 megabit interfaces. At
least in my application, though, I'm interested in creating nameserver
clusters that can handle several gigs of traffic. I realize this isn't
common.
> If the differences are not very significant, we are hesitant to code
> around the use of select(), for the sake of code simplicity and
> portability.
Of course.
-- Aaron
[On 24 May, @20:12, Aaron Hopkins wrote in "Re: Reducing select() usage un ..."]
>> We've tested the speed improvements in our DISTEL testlab and
>> we did see some gain, but not the amount of improvement you noticed.
>
> Which OS were you testing on, by chance? The numbers I mentioned were for
> Linux 2.4, but I've since tested on Linux 2.6 and found the patch offering
> less of an improvement. The patch will matter more on OSes where select()
> is more expensive.
This is all tested with FreeBSD, but we can just swap the server
system for any OS we like.
> If your lab has several common Unix-ish OSes available, perhaps with
> supported IP checksum offload network cards, be sure to test on several.
>
> I'm very pleased with NSD's ability to fill 100 megabit interfaces. At
> least in my application, though, I'm interested in creating nameserver
> clusters that can handle several gigs of traffic. I realize this isn't
> common.
Ack. I'm personally very interested to see what happens on a gigabit
network.
I had a chance to test this patch in a different environment. I had three
Linux 2.6 boxes, each with one 3.4GHz hyperthreaded P4, on a gigabit network.

Two machines acted as clients running BIND's queryperf tool; one was running
NSD 2.3.4.
With stock NSD 2.3.4 with no -N specified, I got 43000 qps total. With -N 2
specified, I got 50000 qps total.
Adding my select()-reduction patch to stock NSD 2.3.4 with no -N specified,
I got 49000 qps total. With -N 2 specified, I got 55000 qps total.
In this environment, it seems that reducing select()s offers a 10-14%
performance improvement.
-- Aaron
a message of 26 lines which said:
> With stock NSD 2.3.4 with no -N specified, I got 43000 qps total. With -N 2
> specified, I got 50000 qps total.
And 15 days for this message to get out of NLnetlabs. That's
impressive.
Stephane Bortzmeyer wrote:
> And 15 days for this message to get out of NLnetlabs. That's
> impressive.
As you might have noticed, our mailing list manager was lagging a bit.
Sorry for the inconvenience; it has been fixed now.
Jelte