We have just started using unbound and I am having an issue with resolving reddit.com due to some bad queries hitting our servers.
The bad queries are for 'http://www.reddit.com.' The colon in that name causes reddit.com's servers to not respond to the query. At some point unbound marks the whole domain reddit.com as failing and returns SERVFAIL for all queries. This clears after a bit and then repeats.
I have filtered out the bad queries to stop the immediate problem.
I am looking for a more robust way to fix this issue.
15 A IN http???www.reddit.com. 221.211696 iterator wait for 173.245.58.24
22 AAAA IN http???www.reddit.com. 0.097014 iterator wait for 198.41.222.24
Yes. The reddit servers (or likely, their load-balancers) are not
following the DNS specifications. They are dropping the query and
they should be replying. There was a draft at the IETF even to mark
this as harmful, but it did not progress through the standards track,
I believe. If they want to refuse the query for unclear reasons (what
is wrong with responding NXDOMAIN?) they could choose from nice error
codes like SERVFAIL and FORMERR and REFUSED.
Unbound notices that the domain does not respond to A queries, and
marks the domain as timed out (down) for A queries. Unbound stops
sending A queries there in an attempt to throttle traffic towards
that stricken server. If A queries get replies (there is an
exponential backoff on the queries sent out) then unbound marks the
server as responsive again (the server is considered back up) and
queries are resumed.
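
To make the behaviour described above a little more concrete, here is a toy model of per-server, per-query-type backoff. It is only a sketch of the general idea (timeouts push the retry interval up exponentially, a reply resets the state). The class name, the thresholds and the intervals are all made up for illustration and are not taken from unbound's source.

```python
from dataclasses import dataclass


@dataclass
class ServerBackoff:
    """Toy backoff state for one server and one query type (not unbound's code)."""

    base_interval: float = 1.0   # hypothetical: seconds before the first retry
    max_interval: float = 64.0   # hypothetical: cap on the retry interval
    down_after: int = 3          # hypothetical: timeouts before marking "down"
    timeouts: int = 0            # consecutive timeouts seen so far
    next_probe_at: float = 0.0   # earliest time a probe may be sent while down

    @property
    def is_down(self) -> bool:
        return self.timeouts >= self.down_after

    def may_send(self, now: float) -> bool:
        # While the server is up, always send. While it is down, only let a
        # probe through once the current backoff interval has elapsed.
        return (not self.is_down) or now >= self.next_probe_at

    def on_timeout(self, now: float) -> None:
        # Each consecutive timeout doubles the interval, up to the cap.
        self.timeouts += 1
        interval = min(self.base_interval * 2 ** (self.timeouts - 1),
                       self.max_interval)
        self.next_probe_at = now + interval

    def on_reply(self) -> None:
        # Any reply marks the server as responsive again.
        self.timeouts = 0
        self.next_probe_at = 0.0
```

In this model, a steady stream of queries that the server silently drops keeps calling `on_timeout`, so the server stays marked down for every client asking for that query type, which matches the symptom in the original report.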
Yup. I have a domain that goes through cloudflare. I just asked
cloudflare NSes for a name with a colon and it behaves the same (drop)
When I asked the parents, they answered.
Cloudflare seems to do the same thing for their customers.
If not FORMERR, they could've at least sent ICMP administratively
prohibited to signal that this particular communication is not OK
with them. That would've made unbound record a failure.
It's silly because in order to immunize your cache against this you
would have to start your own filtering... That shouldn't be the point.
Is there any unbound-control command to help in this situation? i.e.
manually override the backoff or reset it? Would flush_type or
flush_name help?
So I tried Dyn, they respond with NXDOMAIN.
I also tried DNSMadeEasy they respond with NXDOMAIN.
I noticed when the domain has a wildcard they respond with the A-record.
I then checked a PowerDNS server, they respond with SERVFAIL even when the domain has a wildcard.
I first saw this about 2 weeks ago and tracked it down to the same cause you found. I was able to mitigate the issue by serving a "bad" answer locally so that we do not forward the "bad" query to CloudFlare (I've alerted them).
Not a customer of Cloudflare but their help system allows outsiders to
submit so I have submitted a help request for this problem (172999).
Maybe this is a bug.
Cloudflare's response:
Hey there,
Because the DNS query "http://reddit.com" is technically not valid (since DNS queries should not contain the protocol URI), CloudFlare's DNS servers will not respond to them.
Since these kinds of invalid queries don't get this far in the normal DNS system (since they get dropped at the root servers)
Let us know if you need any other help Thanks
*sigh*
The root servers certainly respond. I got a very neat referral to .com.
Well, they list "http://reddit.com" which is a dotCOM domain with a
colon in it, that stops somewhere at the .com servers. And does not
reach CloudFlare, so they are right about that one.
But the trouble is with "http://www.reddit.com" because the DNS
servers for 'reddit.com' do not respond for it.
That is what I would have predicted their response would have been. A
broken client is making illegal DNS queries; that is the root cause of
the difficulty. The fact that unbound itself doesn't return an error
for these illegal queries is only making matters worse. Neither ':' nor
'/' are legal DNS hostname characters (see RFC-1035 and onwards), so it
should be the resolver library (i.e. unbound) that should be validating
the query before sending it on, IMNSHO. The fact that reddit.com has an
unfriendly behavior WRT illegal queries doesn't mean it is their fault;
there is no requirement to return NXDOMAIN or SERVFAIL or anything at
all, so they chose to drop the query.
There is! Not answering a query is indistinguishable from packet loss,
forcing the client to re-send the query. So it is the wrong thing to
do, and will increase the number of these bad queries hitting their
servers.
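
For anyone who does want to filter these queries before they leave the resolver, as suggested above, a minimal sketch of a hostname syntax check could look like the following. It applies the usual letters-digits-hyphen hostname rule; the function name `is_valid_hostname` is made up for this example and is not part of unbound.

```python
import re

# One label: 1-63 characters, letters, digits or hyphen,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if name looks like a syntactically legal hostname."""
    # Accept a single trailing dot (fully qualified form).
    if name.endswith("."):
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

With this check, `is_valid_hostname("www.reddit.com.")` is accepted, while `is_valid_hostname("http://www.reddit.com.")` is rejected because of the ':' and '/' in the first label. Note that this is stricter than what DNS itself can carry: names with underscore labels such as `_sip._tcp.example.com` would also be rejected, so a filter like this would need exceptions before being used on all query types.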
I alerted CloudFlare last week and they have indicated that they have engineers looking into it. I opened the ticket as a DoS against any domain they provide hosting for. As long as there are clients querying 'http://www.reddit.com' (or any other CloudFlare-hosted domain), those queries can keep that domain offline. Our work-around has allowed reddit.com to appear to remain online.
That is good to hear. I was thinking I was getting a first line response to the issue since it was so quick. I probably didn't explain it well enough. I will try again. More tickets may help push it up on their priority list.
Thanks for your response John, it's very appreciated. Perhaps FORMERR
is more suited, but NXDOMAIN is true in this case as well and better
suited than the drop. Can you let the list know when it's done,
please?
Friends, the fact that I found the issue Dave reported in CloudFlare
doesn't mean it doesn't exist elsewhere.
I mean this should close Dave's case because cns[123].reddit.com are CloudFlare.
But this sort of thing can happen a lot with many IDS-type products
that do deep packet inspection and filtering.
Just to complete this story: the iOS app in question is Alien Blue. When the app is not being used, these queries pop up about every 5 minutes. When you use the app, queries are normal. I have notified the developer.