Unbound closes receive socket => udp probes

Hi Unbound developers,

Unbound uses a smart algorithm to adjust timeouts for outgoing queries,
reducing the timeout when a server replies quickly, which is good.

In the real world, an upstream server may become temporarily overloaded and stop answering as quickly as before.
In this case Unbound closes the receive socket before the actual reply reaches it.
When there is a fair amount of traffic between Unbound and the upstream server, quite a few replies will hit
the closed socket.
Normally the host will just send an "ICMP port unreachable" message for each such reply (and keep in mind that
there may be MANY of them -- one per failed query). Our product reports them as "UDP probe attempt", which is
logical, because many UDP packets hitting closed ports may indeed signal a UDP probing attack.
There are several possible ways this problem could be solved.
The best one, as I understand it, is to keep the socket open for some time after the timeout has elapsed,
just to allow the system to receive late replies without processing them.
Another is to modify the timeout-adjusting algorithm to tolerate some fluctuation in reply times.
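
To illustrate the first idea, a rough sketch (illustrative names only,
this is not actual Unbound code):

#include <sys/socket.h>
#include <unistd.h>

/* illustrative only: on a socket whose query has timed out, drain
 * and discard late replies instead of closing immediately, so the
 * kernel never sees UDP arriving on a closed port */
static void
drain_late_reply(int fd)
{
	char buf[65536];
	(void)recv(fd, buf, sizeof(buf), 0);
	/* close(fd) happens later, after a grace period */
}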

My test setup to reproduce this behavior uses the "delayer.c" program from the Unbound test suite.
I have modified it slightly to allow adjusting the delay at runtime by sending SIGUSR1/SIGUSR2 to the delayer process
(the patch is attached).
In my example, each SIGUSR1 increases the delay by 20 milliseconds.
The trick with sending signals was necessary to avoid restarting the delayer process and thereby closing its socket.
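
In essence the patch does something like this (simplified sketch; the
attached diff is the authoritative version, and the names here are
illustrative):

#include <signal.h>

static volatile sig_atomic_t delay_msec = 0;

static void usr1(int sig) { (void)sig; delay_msec += 20; }
static void usr2(int sig) { (void)sig; if(delay_msec >= 20) delay_msec -= 20; }

/* in main(), before the forwarding loop: */
	signal(SIGUSR1, usr1);
	signal(SIGUSR2, usr2);

The forwarding loop then sleeps for delay_msec before passing each query on.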

Then I start the delayer and nsd on machine A and configure unbound on machine B to use A as a resolver for one particular zone.

Then I send many concurrent queries for nonexistent random names within the configured zone, and Unbound quickly learns that
there is a good and fast connection to machine A, and decreases the timeout.
The script used to generate queries is attached.
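For reference, the generator does roughly what this C sketch shows (the
attached test.pl is the real script; the zone name matches my setup,
everything else is illustrative, and several instances run in parallel
to get concurrency):

#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <stdio.h>
#include <stdlib.h>

/* sketch only: hammer the resolver with random nonexistent names */
int main(void)
{
	unsigned char ans[512];
	char name[256];
	int i;
	for(i = 0; i < 100000; i++) {
		snprintf(name, sizeof(name),
		    "nonexistent-%06d.black.zone", rand() % 1000000);
		/* NXDOMAIN is expected; only the traffic matters */
		res_query(name, ns_c_in, ns_t_a, ans, sizeof(ans));
	}
	return 0;
}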

At some point, while still making many queries, I send SIGUSR1 to delayer process and it increases the delay before transmitting
the query to nsd. This reflects the situation when the destination server starts having some load problems.

Unbound closes its receive sockets prematurely, and replies from A hit the OpenBSD IP stack on B, causing UDP probe alerts.

Here is a log that shows changes in the Unbound infrastructure cache. 172.16.100.118 is the target server for the zone black.zone:

(while true ; do unbound-control-all lookup black.zone | grep 172.16.100.118 ; sleep 1; done)
172.16.100.118 not in infra cache.
172.16.100.118 rto 57 msec, ttl 899, ping 17 var 10 rtt 57, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 35 msec, ttl 898, ping 11 var 6 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 42 msec, ttl 897, ping 14 var 7 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 31 msec, ttl 896, ping 11 var 5 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 29 msec, ttl 895, ping 9 var 5 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 35 msec, ttl 894, ping 11 var 6 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 34 msec, ttl 893, ping 10 var 6 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 34 msec, ttl 892, ping 10 var 6 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 30 msec, ttl 891, ping 10 var 5 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
T1: SIGUSR1 sent to delayer, delay += 20ms ...
172.16.100.118 rto 136 msec, ttl 890, ping 36 var 25 rtt 136, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 70 msec, ttl 889, ping 50 var 5 rtt 70, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 73 msec, ttl 888, ping 53 var 5 rtt 73, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 77 msec, ttl 887, ping 53 var 6 rtt 77, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 66 msec, ttl 886, ping 54 var 3 rtt 66, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 79 msec, ttl 885, ping 55 var 6 rtt 79, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 110 msec, ttl 884, ping 22 var 22 rtt 110, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 95 msec, ttl 883, ping 27 var 17 rtt 95, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
T2: SIGUSR2 sent to delayer, delay -= 20ms ...
172.16.100.118 rto 42 msec, ttl 882, ping 18 var 6 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 64 msec, ttl 881, ping 24 var 10 rtt 64, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 49 msec, ttl 880, ping 9 var 10 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 49 msec, ttl 879, ping 9 var 10 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
172.16.100.118 rto 49 msec, ttl 877, ping 9 var 10 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.

At time T1 I see the following in the kernel log of the OpenBSD machine where Unbound runs:
Jun 28 13:25:27 ggd114 /bsd: udp_probe: em0: ac106476[53] ac106372[59728] udplen 106: 36c08503 00010000 00010001 126e6f6e 65786973 74656e74 2d323237 38343405 626c6163 6b047a6f 6e650000 010001c0 1f000600 0100001c 20002406 67676431...
Jun 28 13:25:27 ggd114 /bsd: udp_probe: em0: ac106476[53] ac106372[57241] udplen 106: 91748503 00010000 00010001 126e6f6e 65786973 74656e74 2d343835 39323805 626c6163 6b047a6f 6e650000 010001c0 1f000600 0100001c 20002406 67676431...
Jun 28 13:25:27 ggd114 /bsd: udp_probe: em0: ac106476[53] ac106372[10757] udplen 106: 2e728503 00010000 00010001 126e6f6e 65786973 74656e74 2d363333 30333905 626c6163 6b047a6f 6e650000 010001c0 1f000600 0100001c 20002406 67676431...
... potentially followed by more messages

The problem may not be visible on other systems with a less paranoid IP stack, but they may nevertheless send many "ICMP unreachable" messages, flooding their link.

Please tell me if this problem has a chance to be fixed.

(attachments)

delayer.c.diff (2.08 KB)
test.pl (552 Bytes)

I also think it would be good to alleviate this issue. It's polite to the network and other hosts to properly receive reply packets to your own requests, even if you no longer need them.

Hi Ilya,

> Please tell me if this problem has a chance to be fixed.
>
> I also think it would be good to alleviate this issue. It's polite
> to the network and other hosts to properly receive reply packets to
> your own requests, even if you no longer need them.

The packets have timed out. We do not expect them any longer. A
retry is probably sent over another port number (randomised) and thus
uses a different socket.

I do not know how to do what you ask - keep the port open for a reply
that arrives later than expected - in a way that is good for
performance and resource usage. The time limit is 2*sigma based on past
observations (a smoothed rtt). Performance will go down significantly
when more sockets are kept open. Also, sockets are a limited resource,
and keeping them open means other requests cannot be dealt with.
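
(For reference, the estimator is essentially the classic smoothed-rtt
scheme; the sketch below uses illustrative names and is not the exact
code of util/rtt.c:)

struct rtt_est { int srtt, var, rto; };

/* classic smoothed estimator; illustrative, not verbatim rtt.c */
static void
rtt_est_update(struct rtt_est* r, int sample_ms)
{
	int delta = sample_ms - r->srtt;
	r->srtt += delta / 8;              /* smoothed mean */
	if(delta < 0) delta = -delta;
	r->var += (delta - r->var) / 4;    /* smoothed deviation */
	r->rto = r->srtt + 2 * r->var;     /* mean plus 2*sigma */
}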

So, although I understand this ICMP port closed is troublesome, I do
not know how to get rid of it. Is there something I can tell the
kernel that stops the ICMP port closed (for UDP)? Should unbound
listen to raw sockets and somehow remove the packet destined for an
old port (but what if someone runs 'dig' and it uses a random port
that unbound just previously used?).

Best regards,
   Wouter

Fair enough; if it's not possible in any sensible way, then it's not possible.

FWIW I do acknowledge the issues you raise; it is entirely possible there is no nice solution, other than what's already being done.

AFAIK there's no "/dev/null" for UDP sockets, though maybe you could emulate one by setting SO_RCVBUF as small as possible and dropping the socket from the select/poll/epoll loop, with a close queued for some future time. But that would probably still consume kernel resources. Whether this is a significant concern, I couldn't say - presumably you only have to "slow close" sockets that didn't receive a reply, which should be a minority.
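
Something along these lines (untested sketch, names illustrative):

#include <sys/socket.h>

/* untested sketch: turn a timed-out query socket into a cheap sink */
static void
make_sink_socket(int fd)
{
	int sz = 1; /* the kernel rounds this up to its minimum */
	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz));
	/* the caller would then drop fd from its select/poll/epoll
	 * set and queue close(fd) for a few seconds in the future */
}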

It could be more complicated than it's worth.

Hi Wouter,

> So, although I understand this ICMP port closed is troublesome, I do
> not know how to get rid of it. Is there something I can tell the
> kernel that stops the ICMP port closed (for UDP)? Should unbound
> listen to raw sockets and somehow remove the packet destined for an
> old port (but what if someone runs 'dig' and it uses a random port
> that unbound just previously used?).

Yes, you're right: the "right" fix is really complicated, and therefore
it is actually wrong.

We have another suggestion that may help -- adding some constant value to
the calculated RTT. This will lower the query rate a bit, but at least
eliminate the ICMP flood when fluctuations in the network cause answers
to arrive slightly late.
I have tried to find the right place in the code to add this, but
it seems I haven't succeeded. Could you please help me?
I understand that this is also a kind of "hack" and it might not
be committed, but we really don't have another choice -- there is
no way to tell apart a UDP probe attack and a late answer from DNS...

Thank you for your time.

Hi Ilya,

> Hi Wouter,
>
>> So, although I understand this ICMP port closed is troublesome, I
>> do not know how to get rid of it. Is there something I can tell
>> the kernel that stops the ICMP port closed (for UDP)? Should
>> unbound listen to raw sockets and somehow remove the packet
>> destined for an old port (but what if someone runs 'dig' and it
>> uses a random port that unbound just previously used?).
>
> Yes, you're right: the "right" fix is really complicated, and
> therefore it is actually wrong.
>
> We have another suggestion that may help -- adding some constant
> value to the calculated RTT. This will lower the query rate a bit,
> but at least eliminate the ICMP flood when fluctuations in the
> network cause answers to arrive slightly late. I have tried to find
> the right place in the code to add this, but it seems I haven't
> succeeded. Could you please help me?

util/rtt.h:
#define RTT_MIN_TIMEOUT 50
util/rtt.c:51:
        if(rto < RTT_MIN_TIMEOUT)

And in util/rtt.c:69
rtt_timeout(const struct rtt_info* rtt)
The timeout routine returns the actual timeout that is used to wait
for packets; here you could add +50 msec (if it is smaller than 50).
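
I.e., something like this (a sketch only; the real function body may
differ between versions):

int
rtt_timeout(const struct rtt_info* rtt)
{
	int rto = rtt->rto;
	/* pad small timeouts so slightly late replies still
	 * find the socket open */
	if(rto < 50)
		rto += 50;
	return rto;
}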

> I understand that this is also a kind of "hack" and it might not be
> committed, but we really don't have another choice -- there is no
> way to tell apart a UDP probe attack and a late answer from DNS...
>
> Thank you for your time.

If this works well and does not impact normal users then we could
think to include the fix.

Best regards,
   Wouter

Hi Wouter,

>> We have another suggestion that may help -- adding some constant
>> value to the calculated RTT. This will lower the query rate a bit,
>> but at least eliminate the ICMP flood when fluctuations in the
>> network cause answers to arrive slightly late. I have tried to find
>> the right place in the code to add this, but it seems I haven't
>> succeeded. Could you please help me?
>
> And in util/rtt.c:69
> rtt_timeout(const struct rtt_info* rtt)
> The timeout routine returns the actual timeout that is used to wait
> for packets; here you could add +50 msec (if it is smaller than 50).

Today I was finally able to test the change.
I have patched rtt_timeout() to return rtt->rto + 50 (ms),
so Unbound always has this "safe" reserve for the case when
the server starts to lag.
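In essence the change looks like this (a sketch, assuming the unpatched
function simply returns rtt->rto; not the verbatim patch):

--- util/rtt.c
+++ util/rtt.c
@@ rtt_timeout
-	return rtt->rto;
+	return rtt->rto + 50; /* fixed 50 ms reserve */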
This seems to work wonderfully!
If the server doesn't have any problems, the request rate is not affected
at all. If there are lags, I now see nothing in the logs after running my
test scripts. On the unpatched version I get dozens of "UDP probe" messages
from our extremely paranoid kernel, which correspond to outgoing "ICMP
unreachable" messages on a normal setup.
Thank you very much for the suggestion.

> If this works well and does not impact normal users then we could
> think to include the fix.

Please consider including this fix in some form in the next version of
Unbound; it seems to be an easy and effective solution :)
Maybe make the value tunable from the config file, but you certainly know
better how it fits the overall design.