Problem with query

Using unbound 1.4.12,

dig -t ns dir.slb.com.

It does not return with unbound, but it returns instantly against bind.

A few things:

1. That name has a lot of NS answers (7000+ byte reply) according to
ns3.slb.com. It appears to return a truncated answer and then forces
clients (and probably unbound) to retry using TCP.

2. unbound doesn't return. The query runs for hours/days/forever
inside unbound. It doesn't time out! Digging into
env->mesh->all.root, I have seen hundreds of answers, and yet no response.
Is it waiting for a COMPLETE answer? Even though it has a huge answer
already?

3. dig to Google (8.8.8.8) goes to tcp and doesn't return an answer either!

4. When this happens, num_addr_replies gets incremented and seems to
never go down! As more stuff comes in for that (or children), the
value of num_addr_replies grows. We know that this value growing
infinitely is bad as there is a 16x limit in the code (against
max_reply_states) before incoming queries get dropped.

5. This looks like it has been a problem in the past:

http://www.unbound.net/pipermail/unbound-users/2010-September/001369.html
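
The back-pressure described in point 4 can be sketched as follows. This is a minimal Python model, not unbound's actual C code: the class `Mesh`, its method names and the value used for `max_reply_states` are assumptions made for illustration. Only the 16x factor and the two counter names come from the description above.

```python
class Mesh:
    """Toy model of the reply-address accounting described in point 4."""

    def __init__(self, max_reply_states=1024):
        # 1024 is an illustrative value, not a claim about unbound's default.
        self.max_reply_states = max_reply_states
        self.num_addr_replies = 0  # clients still waiting for an answer

    def accept_client(self):
        """Return True if a new client query is admitted, False if dropped."""
        if self.num_addr_replies > self.max_reply_states * 16:
            return False  # over the 16x limit: the incoming query is dropped
        self.num_addr_replies += 1
        return True

    def reply_sent(self):
        """A waiting client was answered; only this lowers the counter."""
        self.num_addr_replies -= 1


def clients_until_drop(mesh):
    """Count how many never-answered clients are admitted before drops begin."""
    admitted = 0
    while mesh.accept_client():
        admitted += 1
    return admitted
```

If queries that never complete keep clients attached, nothing calls `reply_sent()`, so the counter only grows and unrelated queries eventually get dropped.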

Any advice on direction here? Happy to help.

-Rob

> Using unbound 1.4.12,
>
> dig -t ns dir.slb.com.
>
> It does not return with unbound, but it returns instantly against bind.
>
> A few things:
>
> 1. That name has a lot of NS answers (7000+ byte reply) according to
> ns3.slb.com. It appears to return a truncated answer and then forces
> clients (and probably unbound) to retry using TCP.

It works against my unbound-1.4.13 (open to use at 193.110.157.136).
It does fall back to TCP. The DNS NS set from hell is returned.
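
The fallback mentioned here is triggered by the TC (truncated) flag in the DNS header of the UDP reply. Below is a minimal Python sketch of that check. The helper names are hypothetical, and the two sample headers are hand-built for illustration, not captured from ns3.slb.com.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def is_truncated(packet: bytes) -> bool:
    """Return True if the DNS message header has the TC bit set."""
    if len(packet) < 12:
        raise ValueError("DNS header is 12 bytes; packet too short")
    _msg_id, flags = struct.unpack("!HH", packet[:4])
    return bool(flags & TC_FLAG)


def needs_tcp_retry(udp_reply: bytes) -> bool:
    """A resolver that sees TC=1 over UDP should repeat the query over TCP."""
    return is_truncated(udp_reply)


# Hand-built headers: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
# 0x8600 = QR | AA | TC, 0x8400 = QR | AA.
truncated_reply = struct.pack("!HHHHHH", 0x1234, 0x8600, 1, 0, 0, 0)
complete_reply = struct.pack("!HHHHHH", 0x1234, 0x8400, 1, 2, 0, 0)
```

Running `dig +notcp +ignore -t ns dir.slb.com. @ns3.slb.com.` should show the truncated UDP reply itself instead of retrying over TCP.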

> 2. unbound doesn't return. The query runs for hours/days/forever
> inside unbound. It doesn't time out! Digging into
> env->mesh->all.root, I have seen hundreds of answers, and yet no response.
> Is it waiting for a COMPLETE answer? Even though it has a huge answer
> already?

Various harden options might make it try a lot of entries before returning.
The only case I know of where unbound does not return an answer is when your
log level is so high that your disk cannot keep up with the queries.

> 3. dig to Google (8.8.8.8) goes to tcp and doesn't return an answer either!

That I see as well.

Paul

Paul,

Are you SURE your server returns? I just tried it with:

dig +time=600 +tcp @193.110.157.136 -t ns dir.slb.com.

And it doesn't return AT ALL. (That is a 10 minute wait time!!)

> Various harden options might make it try a lot of entries before returning.
> The only case I know of where unbound does not return an answer is when your
> log level is so high that your disk cannot keep up with the queries.

No logging is on. But, if you turn it on, you'll see unbound working
"forever".

I don't have any "harden" stuff on. I do have:

val-permissive-mode: yes
module: iterator

> Are you SURE your server returns? I just tried it with:
>
> dig +time=600 +tcp @193.110.157.136 -t ns dir.slb.com.
>
> And it doesn't return AT ALL. (That is a 10 minute wait time!!)

Seems you are right. An entry in my resolv.conf sneaked through to my bind
fallback server, which does answer with the hundreds of NS records, though
without any additional A records.

I ran: unbound-host dir.slb.com. -t NS -ddddd

but killed it after it had generated 100MB of data and was still looping.
bind does return pretty quickly, though it has no additional records at all.

dig ns dir.slb.com @ns3.slb.com. also shows how bogus that response is.
Many *.dir.slb.com nameservers, but not a single glue record.
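
To make "not a single glue record" concrete: nameservers whose names sit inside the zone they serve can only be reached if the referral carries their addresses in the additional section. A rough Python sketch of that check follows. The function name and the sample hostnames in the usage note are made up for illustration and are not the real dir.slb.com data.

```python
def missing_glue(zone, ns_names, additional):
    """Return in-zone nameserver names that have no address in `additional`.

    zone:       zone name, e.g. "dir.example."
    ns_names:   names taken from the NS records
    additional: dict mapping name -> list of addresses from the additional section
    """
    zone = zone.lower().rstrip(".")
    unresolved = []
    for name in ns_names:
        host = name.lower().rstrip(".")
        in_zone = host == zone or host.endswith("." + zone)
        has_addr = any(
            key.lower().rstrip(".") == host and addrs
            for key, addrs in additional.items()
        )
        if in_zone and not has_addr:
            unresolved.append(name)
    return unresolved
```

For example, `missing_glue("dir.example.", ["a.dir.example.", "ns.other.test."], {})` reports only `a.dir.example.`, because the out-of-zone name can be resolved independently.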

> I don't have any "harden" stuff on. I do have:
>
> val-permissive-mode: yes

That disables all DNSSEC. Any good reason for that?

Paul

Hi,

>> Are you SURE your server returns? I just tried it with:
>>
>> dig +time=600 +tcp @193.110.157.136 -t ns dir.slb.com.
>>
>> And it doesn't return AT ALL. (That is a 10 minute wait time!!)
>
> Seems you are right. An entry in my resolv.conf sneaked through to my bind
> fallback server, which does answer with the hundreds of NS records, though
> without any additional A records.
>
> I ran: unbound-host dir.slb.com. -t NS -ddddd
>
> but killed it after it had generated 100MB of data and was still looping.
> bind does return pretty quickly, though it has no additional records at
> all.
>
> dig ns dir.slb.com @ns3.slb.com. also shows how bogus that response is.
> Many *.dir.slb.com nameservers, but not a single glue record.

Yes, it has 283 nameserver entries and 280 addresses (that I can find).
I have tried them, but they do not reply. They time out.

So what happens is that unbound quietly starts probing this very long
list. It will take some time to do this. If space becomes a problem,
this query is the oldest and gets removed.

You say that bind returns. How does it get an answer? None of the IPs
associated with the domain return UDP replies. Perhaps it returns the
NS set from the referral as the answer? Unbound refuses to do this for
security reasons.
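
A back-of-the-envelope sketch, in Python, of why this probing takes so long. The 280 addresses come from the count above. The per-attempt timeout and the number of attempts per address are assumptions chosen for illustration and are not unbound's actual retry parameters. The sketch also assumes the addresses are tried one after another.

```python
def worst_case_probe_seconds(addresses, timeout_s, attempts_per_address):
    """Total wait if every attempt to every address times out, tried serially."""
    return addresses * attempts_per_address * timeout_s


# 280 unresponsive addresses (from the thread); the other values are illustrative.
total = worst_case_probe_seconds(addresses=280, timeout_s=3.0, attempts_per_address=2)
print(f"{total:.0f} s, about {total / 60:.0f} minutes")
```

Even with these modest assumed values the total is well beyond the 10 minute `dig +time=600` wait used earlier in the thread.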

>> I don't have any "harden" stuff on. I do have:
>>
>> val-permissive-mode: yes
>
> That disables all DNSSEC. Any good reason for that?

Best regards,
   Wouter

Hi