Patch implementing round robin rrsets

Thijs_Kinkhorst · March 7, 2012, 11:48am

Hi all,

We've created a patch to have unbound randomize the order of the returned
records if one has an rrset with multiple records of the same type
(well-known as a round robin configuration and deployed widely).

We're running this in production on our site for a few weeks now. Our site
has thousands of synchronous users with a host of different platforms. To
date, nothing has broken. We use nsd as our authoritative server and we've
enabled DNSSEC.

The reason for implementing this is that we host some services on two
IP's, and since we've switched from bind as authoritative and recursive
server to nsd and unbound a few years ago, one of the servers providing the
service gets nearly all traffic while the second one is idling. This is
because neither nsd nor unbound randomize the order of rrset contents.
Obviously, I've first read some of the previous discussions on this list
about randomizing the round robin return sets. They were a few years ago so
the views may have changed, but I'd like to add my take to the
then-presented arguments and why I think this patch improves the situation.

- Selecting a random IP should be done on the client systems.

Our users use a wide range of devices, including various Windows versions,
Mac OS X and Linux. None of these do this by default, 'not even' my own
Linux workstation. As people bring their own device, it's not feasible to
change the configuration of all devices for this, nor to expect us to
convince all OS vendors to change this. This behaviour by clients is at
least for the foreseeable future, a given.

- Unbound tries its best to keep the rrset ordered.

A set by definition doesn't have an ordering, so I'm not sure what value
there is in preserve the order it had when it arrives. What does this
bring?

- It's not fast to reorder.

With our thousands of clients we've not experienced any measurable effect
in a speed sense of this patch.

- Randomizing the rrsets is feature creep.

This is of course subjective, but I dare to disagree: with 10 insertions
and 8 deletions the patch is small by any standard.

The patch can be found here:
http://non-gnu.uvt.nl/debian/squeeze/unbound/40_rrdns
That folder also contains patched Debian packages that we currently use in
production.

I'm looking forward to your response.

Wouter · March 7, 2012, 1:04pm

Hi Thijs,

Nice patch.

I would like to accept it, because it is short and helpful and
lightweight. There are the following suggestions below.

The line of interest is this one:
- - rr = random();

Now it seems that random() locks a mutex on Linux and may produce
improper results on FreeBSD in threads (but still reasonably random).
So, I want to lock it less. Also suggest to move the line one
further (behind the variable declaration).
+ rr = (data->count==1)?0:random();

And then hope that on Linux, performance does not degrade
multithreaded (it'll be copying the RR data while other threads lock
this lock).

In general a threadsafe solution with threadlocal storage requires a
lot more work. An alternative is to be blatantly thread-unsafe (its
non-security related pretty-random numbers anyway), and make a
  static int state = 1;
and use a simple congruent random function, the rand function:
  state = state * 1103515245 + 12345;
  rr = (unsigned)(state/65536) % 32768;
But then I worry about cache-line contention on the int state.

Depending on how this works out, this could end up in the mainline
source, and the code needs to have the BSD license for that. Or we
can put the patch in the contrib directory of the source tarball
(license of your choice).

Best regards,
Wouter

Tony_Finch · March 7, 2012, 2:55pm

Just a brief note in support of your patch:

RFC 3484 section 6 rule 10 says client systems should use IP addresses in
the order given by the DNS, in the absence of topology information that
allows you to distinguish them. So you can't expect them to spread the
load across multiple IP addresses without help from the DNS.

Tony.

Olaf_Kolkman · March 7, 2012, 3:29pm

But you could reasonably ask whether those load balancing properties should be under the control of the authoritative or the recursive name server.

--Olaf

Tony_Finch · March 7, 2012, 3:49pm

Indeed

The problem with saying it must only be the authority's job is that in
situations like ours (which being a university is I guess not that
different from Thijs's) the user population do not use very many different
resolvers, so the authorities do not get the opportunity to serve up lots
of different RR set orderings, unless you go for ridiculously short TTLs.

Tony.

Olaf_Kolkman · March 7, 2012, 4:59pm

So, you want to do traffic management to infrastructure within your own administrative domain with the resolver. If that is indeed the use case it is an argument for making an option like this configurable on a per domain basis, when you consciously want to overwrite the intention of the domain holder.

--Olaf (just thinking out loud)

David_Miller · March 7, 2012, 5:01pm

Since the order of RRs in a set is not significant, then load balancing
through the use of
multiple RRs at a name could only be an attempt to evenly distribute between
addresses/hosts. Randomization of records within a set at the resolver
level will only
improve the even distribution between hosts.

[RFC 1034]

3.6. Resource Records

A domain name identifies a node. Each node has a set of resource
information, which may be empty. The set of resource information
associated with a particular name is composed of separate resource
records (RRs). The order of RRs in a set is not significant, and need
not be preserved by name servers, resolvers, or other parts of the DNS.

Those who desire finer control over weight for hosts in load balancing
need to implement
this through SRV records - which provide this functionality... and then
join the crowd
complaining to those who make clients that don't support SRV records.

-DMM

Thijs_Kinkhorst · March 7, 2012, 5:16pm

Hi Wouter,

Nice patch.

I would like to accept it, because it is short and helpful and
lightweight. There are the following suggestions below.

Thanks!

The line of interest is this one:
- rr = random();

Now it seems that random() locks a mutex on Linux and may produce
improper results on FreeBSD in threads (but still reasonably random).
So, I want to lock it less. Also suggest to move the line one
further (behind the variable declaration).
+ rr = (data->count==1)?0:random();

This makes sense, as probably a large number of rrsets will have a count
of one.

And then hope that on Linux, performance does not degrade
multithreaded (it'll be copying the RR data while other threads lock
this lock).

In general a threadsafe solution with threadlocal storage requires a
lot more work. An alternative is to be blatantly thread-unsafe (its
non-security related pretty-random numbers anyway), and make a
  static int state = 1;
and use a simple congruent random function, the rand function:
  state = state * 1103515245 + 12345;
  rr = (unsigned)(state/65536) % 32768;
But then I worry about cache-line contention on the int state.

Yes. Of course the number doesn't need to be really random indeed. I can't
judge on the significancy of the performance impacts of the various
approaches: we don't have an extremely large setup, but we do serve
thousands of clients at any time, and we've not seen any trouble.

If it were up to me I'd say both approaches (added count check vs. int
state) would be perfectly fine for us, but I wouldn't go any further than
that because I doubt the benefits would outweigh the added complexity in
that case. So if you want to commit either version I'd already be very
happy.

Depending on how this works out, this could end up in the mainline
source, and the code needs to have the BSD license for that. Or we
can put the patch in the contrib directory of the source tarball
(license of your choice).

The BSD license is just fine. The formal copyright holder of the work is
"Tilburg University", if that would be relevant.

Thijs_Kinkhorst · March 7, 2012, 5:26pm

But you could reasonably ask whether those load balancing properties
should be under the control of the authoritative or the recursive name
server.

Indeed

The problem with saying it must only be the authority's job is that in
situations like ours (which being a university is I guess not that
different from Thijs's) the user population do not use very many
different
resolvers, so the authorities do not get the opportunity to serve up

lots

of different RR set orderings, unless you go for ridiculously short

TTLs.

So, you want to do traffic management to infrastructure within your own
administrative domain with the resolver. If that is indeed the use case

it

is an argument for making an option like this configurable on a per

domain

basis, when you consciously want to overwrite the intention of the

domain

holder.

The order of records in an rrset is not defined: it's a set. If you return
them in a different order at the resolver, you're not overwriting any
'intention' of the domain holder, as there's no expressed intention. Set
(a,b,c) is the same set as (a,c,b), afterall.

I would not be in favour of such a configuration option, as that adds
complexity to the proposed patch which in my opinion is not needed: if auth
returns (a,b,c) and resolver returns (a,c,b), both have returned the exact
same set. What would you want to gain with such an option?

Felix_Schueren · March 7, 2012, 5:27pm

Hi,

actually, I feel that DNS-based "load balancing" or "load distribution"
when relying on RRset randomization is the poorest choice you can make -
you have no control at all over distribution of the traffic, which means
that a single system must be able to take the full load anyway, plus you
end up with downtime of at least RRset TTL, or minimum cache TTL in some
cases. Even with randomization in Unbound these problems do not get
fixed - in my opinion you're fixing the wrong side of the problem. DNS
by itself was never meant for loadbalancing, and trying to retrofit it
in, you'll always come to a point where you realize that you should have
used loadbalancers or some other redundancy/failover mechanism from the
beginning.

Also, thousands of clients is not really a figure that shows how big or
small the impact of the additional CPU required might be on a properly
loaded setup - how many queries per second are you doing with your
setup? We're currently sitting at approx 30,000 queries per second.

Wouter, if you bring this randomization into mainline, please make it
configurable, as in my opinion this is only useful for very few
specialized environments, most notably probably resource-constrained
institutions that are forced to use dns-based loadbalancing.

Kind regards,

Felix

Olaf_Kolkman · March 7, 2012, 5:33pm

Off course, forgot about that.

Thijs_Kinkhorst · March 7, 2012, 6:28pm

actually, I feel that DNS-based "load balancing" or "load distribution"
when relying on RRset randomization is the poorest choice you can make -
you have no control at all over distribution of the traffic, which means
that a single system must be able to take the full load anyway, plus you
end up with downtime of at least RRset TTL, or minimum cache TTL in some
cases.

Obviously anyone using round robin DNS is aware of its limitations.
However, it's perfectly possible to use it in combination with other tools
that don't present most of the problems you sketch and is hence not a
"poor" choice. One example is that we're using an active-active setup
combined with heartbeat. Each node has a heartbeat managed IP-address which
is migrated when there's downtime for a single node. However, we do need a
way to express in DNS that you can reach the service on either of its IP's.

Of course we could invest tens of thousands of euros in dedicated load
balancing equipment, but not every service requires that and we can get by
with this setup just fine.

Even with randomization in Unbound these problems do not get
fixed - in my opinion you're fixing the wrong side of the problem. DNS
by itself was never meant for loadbalancing, and trying to retrofit it
in, you'll always come to a point where you realize that you should have
used loadbalancers or some other redundancy/failover mechanism from the
beginning.

Sure, it's just one tool, not something that will solve all problems for
anyone. I don't agree that there would be no need for simple tools because
'there will always come a point' where you need the advanced tool.

The use of the word "retrofit" seems to suggest that this patch is trying
to do something new here. As we all know this technique has existed since
the beginnings of DNS, and the de facto reference implementation has been
doing it for decades already.

Also, thousands of clients is not really a figure that shows how big or
small the impact of the additional CPU required might be on a properly
loaded setup - how many queries per second are you doing with your
setup? We're currently sitting at approx 30,000 queries per second.

I'd love to see some comparative stress test results.

Wouter, if you bring this randomization into mainline, please make it
configurable, as in my opinion this is only useful for very few
specialized environments, most notably probably resource-constrained
institutions that are forced to use dns-based loadbalancing.

I have a hard time to believe that DNS round robin is something "rare" or
"specialised". It's not something new, it's something many people have been
doing for many years. And even if it would be only because of resource
constraints: it's not as if the number of resource-constrained
organizations in the world is insignificant by any measure.

Tony_Finch · March 7, 2012, 7:04pm

actually, I feel that DNS-based "load balancing" or "load distribution"
when relying on RRset randomization is the poorest choice you can make -
you have no control at all over distribution of the traffic, which means
that a single system must be able to take the full load anyway,

This is not actually a problem in the real world.

plus you end up with downtime of at least RRset TTL,

Not if you have a non-DNS recovery mechanism.

Tony.

Mans_Nilsson · March 7, 2012, 9:52pm

Actually, in our setup, with BIND as resolver we saw close to uniform
load on the two service servers that shared the round-robin record. With
unbound, (still using BIND as name server in the shape of InfoBlox
devices) we had a serious tendency towards loading the machine with the
lowest IP address much harder; more like 80/20 than 50/50. Since this
was the older machine, and we'd relied on round-robin for load sharing,
things went pear shaped PDQ. We threw hardware on the problem, so are good
for now, but round-robin really would set things straight.

I'm quite happy that the userbase made the effort, and I'm equally delighted
that it seems to be headed into a future unbound release. Open Source win.

Wouter, any guesstimates on release schedule?

cloos · March 7, 2012, 11:19pm

The line of interest is this one:
- rr = random();

Cf random_r(3).

The (linux) page says that it is in the namespace
iff _SVID_SOURCE || _BSD_SOURCE, so it should be OK
essentially everywhere.

-JimC

Robert_Edmonds · March 8, 2012, 12:35am

James Cloos wrote:

> The line of interest is this one:
> - rr = random();

Cf random_r(3).

The (linux) page says that it is in the namespace
iff _SVID_SOURCE || _BSD_SOURCE, so it should be OK
essentially everywhere.

no, those are the feature test macro requirements. read farther down:

CONFORMING TO
These functions are nonstandard glibc extensions.

random_r() isn't in POSIX, and it isn't available on, e.g., freebsd.

cloos · March 8, 2012, 8:26am

Cf random_r(3).
...

... read farther down:

CONFORMING TO
These functions are nonstandard glibc extensions.

[SIGH] Indeed I stopped reading/skimming at the end of the first "page"
and missed the conforming to section.

-JimC