"Blocking regime" behavior question

Hello,

I posted a query a bit earlier regarding Unbound timeout behavior.

One mystery remains regarding the Tor exit DNS DoS failure described there.

At the time of this event, the 'dump_requestlist' output for one out
of two threads contained 350 entries for timing-out resolve requests
to GoDaddy, which was null-routing the exit-node at that time. This
implies a total of about 700 timing-out resolve requests. The
"Blocking" section in the above-referenced documentation seems to
indicate that Unbound should have marked the authoritative server unreachable
and immediately returned SERVFAIL for these requests. Shouldn't this
have limited the size of the request queue/list to something far less
than 700 entries? I also have a 'dump_infra' taken at the same time
during the incident. For one of the GoDaddy name servers, the infra
list had sixty entries similar to the ones listed below. The version
running at the time was 1.5.4, built with libevent.

Am I misunderstanding the documented behavior? Does Unbound both
respond with SERVFAIL _and_ continue trying to resolve each request?

Thank You

request sample (out of 348 total)

thread #0
# type cl name seconds module status
117 A IN hvymca.cOM. 15.725693 iterator wait for 216.69.185.49
126 A IN ZUtOot.cOM. 299.954939 iterator wait for 216.69.185.49
155 A IN hYtorcZA.COm. 197.524971 iterator wait for 216.69.185.49
162 A IN GoLdNiQuE.cOm. 101.887861 iterator wait for 216.69.185.49

infra sample (out of 60 total)

216.69.185.49 ARtbooKscoffeE.coM. ttl 464 ping 0 var 94 rtt 376 rto 12032 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
216.69.185.49 HOLoGrAphicbroADcAStiNg.cOm. ttl 526 ping 0 var 94 rtt 376 rto 120000 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
216.69.185.49 OnLInEleBANEsE.com. expired rto 120000
216.69.185.49 HuNGRyRIDgeKeNNeL.cOm. ttl 877 ping 0 var 94 rtt 376 rto 12032 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay 10 lame dnssec 0 rec 0 A 0 other 0
216.69.185.49 huckpiE.COm. expired rto 120000
216.69.185.49 Webisfree.com. expired rto 120000
216.69.185.49 koBeJoRdanBrANd.Com. expired rto 120000
216.69.185.49 GroDno.neT. expired rto 120000
216.69.185.49 hYtorcZA.COm. ttl 670 ping 0 var 94 rtt 376 rto 96256 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay 66 lame dnssec 0 rec 0 A 0 other 0
216.69.185.49 iBGEEKbOY.CoM. ttl 665 ping 0 var 94 rtt 376 rto 96256 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay 43 lame dnssec 0 rec 0 A 0 other 0
216.69.185.49 y-AnD-o-jeWelRy.cOM. expired rto 120000
216.69.185.49 ZUtOot.cOM. ttl 579 ping 0 var 94 rtt 376 rto 96256 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay 73 lame dnssec 0 rec 0 A 0 other 0
216.69.185.49 expRessIONS-dental.COm. ttl 432 ping 0 var 94 rtt 376 rto 24064 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
216.69.185.49 freesite.work. expired rto 120000
216.69.185.49 veCodC.cOM. expired rto 120000
216.69.185.49 hogs4Dogs.CoM. ttl 54 ping 0 var 94 rtt 376 rto 120000 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay
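
For anyone wanting to tally a 'dump_infra' capture like the sample
above, here is a rough Python sketch. It is not part of Unbound; the
function name summarize_infra and the regular expressions are my own
assumptions about the text layout, based only on the entries shown
here. It tolerates entries that wrap across lines and counts, per
server address, how many entries exist and how many sit at "rto
120000".

```python
import re
from collections import Counter

# Assumption: an entry starts at the beginning of a line with an IPv4
# address followed by a zone name that ends in a dot.
ENTRY_START = re.compile(r"(?m)^(\d{1,3}(?:\.\d{1,3}){3}) (\S+\.)\s")
# The rto value may be pushed onto the next line by mail wrapping.
RTO = re.compile(r"\brto\s+(\d+)")


def summarize_infra(dump_text, max_rto=120000):
    """Return (total, blocked) Counters keyed by server IP.

    'blocked' counts entries whose rto has reached max_rto.
    """
    starts = list(ENTRY_START.finditer(dump_text))
    total, blocked = Counter(), Counter()
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(dump_text)
        body = dump_text[match.start():end]  # may span several lines
        ip = match.group(1)
        total[ip] += 1
        rto = RTO.search(body)
        if rto and int(rto.group(1)) >= max_rto:
            blocked[ip] += 1
    return total, blocked


if __name__ == "__main__":
    # Three entries copied from the sample above, wrapping preserved.
    sample = (
        "216.69.185.49 ARtbooKscoffeE.coM. ttl 464 ping 0 var 94 rtt 376 rto\n"
        "12032 tA 3 tAAAA 0 tother 0 ednsknown 0 edns 0 delay 0 lame dnssec 0\n"
        "rec 0 A 0 other 0\n"
        "216.69.185.49 OnLInEleBANEsE.com. expired rto 120000\n"
        "216.69.185.49 huckpiE.COm. expired rto 120000\n"
    )
    total, blocked = summarize_infra(sample)
    for ip in total:
        print(ip, "entries:", total[ip], "at max rto:", blocked[ip])
```

Feeding it the full dump would show how many of the sixty entries for
one address had already reached the two-minute mark.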

Ah, I see GoDaddy has something like eighty name servers, and any one
domain has NS records pointing to just two of them. Depending on how
requests are grouped, this could leave several servers "blocked" and
several others "unblocked", with more than a few active requests still
on the queue. Is that the explanation? Or is there more to it?

A quick experiment shows that if an infra-cache entry with "rto
120000" (two minutes) is present for all of a requested domain's name
servers, then SERVFAIL is returned immediately to further queries.
However, the infra-cache entries are per-domain-per-nameserver. When
a different domain with an NS record pointing to the same nameserver
is queried, Unbound creates and tracks a separate state for the second
domain and does not share reachability information for a particular
nameserver across domains.
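
To make the keying difference concrete, here is a toy Python model. It
is only a sketch of the behavior observed in the experiment, not
Unbound's actual data structure; the class ToyInfraCache, the helper
fails_fast, and the second server address 192.0.2.53 (a documentation
placeholder) are all hypothetical.

```python
MAX_RTO_MS = 120000  # the "rto 120000" value seen in the dump


class ToyInfraCache:
    """Toy reachability cache, keyed per (server, zone) or per server."""

    def __init__(self, per_zone=True):
        self.per_zone = per_zone
        self.rto = {}

    def _key(self, server_ip, zone):
        # Zone names are compared case-insensitively (0x20 mixed case).
        return (server_ip, zone.lower()) if self.per_zone else server_ip

    def mark_unreachable(self, server_ip, zone):
        self.rto[self._key(server_ip, zone)] = MAX_RTO_MS

    def is_blocked(self, server_ip, zone):
        return self.rto.get(self._key(server_ip, zone), 0) >= MAX_RTO_MS


def fails_fast(cache, zone, servers):
    """True if every name server for the zone is already marked blocked."""
    return all(cache.is_blocked(ip, zone) for ip in servers)


if __name__ == "__main__":
    servers = ["216.69.185.49", "192.0.2.53"]  # second IP is a placeholder

    per_zone = ToyInfraCache(per_zone=True)
    per_server = ToyInfraCache(per_zone=False)
    for cache in (per_zone, per_server):
        for ip in servers:
            cache.mark_unreachable(ip, "ZUtOot.cOM.")

    for name, cache in (("per-zone", per_zone), ("per-server", per_server)):
        print(name,
              "same zone:", fails_fast(cache, "zutoot.com.", servers),
              "other zone:", fails_fast(cache, "huckpie.com.", servers))
```

With per-zone keys, a query for a second domain on the same servers
starts from scratch, which matches what the experiment showed. With
per-server keys it would fail fast immediately.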

So I suppose queries against eighty-plus unresponsive name servers for
hundreds to thousands of domains easily explain the 700-entry active
request queue.

One odd quirk is that Unbound sometimes returns SERVFAIL within ten to
thirty seconds the first time a request is made for a domain whose
nameservers are not responding. Subsequent requests then take however
long it takes for the infra-cache entry for that domain to reach the
"rto 120000" state. The maximum time until SERVFAIL seems to be ten
minutes.
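
The rto values in the infra sample (12032, 24064, 96256, 120000) look
like the "rtt" value of 376 ms doubled repeatedly and then capped. The
short Python sketch below generates that sequence. The doubling rule
and the 120000 ms cap are inferred from the sample, not confirmed
against the Unbound source, and the function name rto_schedule is
hypothetical.

```python
def rto_schedule(base_ms=376, cap_ms=120000):
    """List successive timeouts, doubling after each loss up to cap_ms."""
    rto = base_ms
    schedule = [rto]
    while rto < cap_ms:
        rto = min(rto * 2, cap_ms)
        schedule.append(rto)
    return schedule


if __name__ == "__main__":
    schedule = rto_schedule()
    print(schedule)
    # Every rto value from the infra sample should appear in the list.
    observed = {12032, 24064, 96256, 120000}
    print("sample values covered:", observed <= set(schedule))
```

Under that assumption an entry needs nine consecutive timeouts to go
from 376 ms to the cap, which would account for new domains lingering
on the request list for a while before SERVFAIL becomes immediate.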

'Eventdns' is now tuned to give up after five seconds and permit up to
16k under-five-seconds active requests, so all the above is academic.