internal error: looping module stopped

Hello,

While inspecting unbound logs after an upgrade to Unbound 1.6.0-3+deb9u1 I have found some errors like this:

Sep 19 12:40:45 unbound[16337:22] error: internal error: looping module stopped
Sep 19 12:40:45 unbound[16337:22] error: internal error: looping module stopped
Sep 19 12:41:00 unbound[16337:21] error: internal error: looping module stopped
Sep 19 12:41:00 unbound[16337:21] error: internal error: looping module stopped
Sep 19 12:41:01 unbound[16337:4] error: internal error: looping module stopped
Sep 19 12:41:01 unbound[16337:4] error: internal error: looping module stopped
Sep 19 12:41:46 unbound[16337:2e] error: internal error: looping module stopped
Sep 19 12:41:46 unbound[16337:2e] error: internal error: looping module stopped
Sep 19 12:42:32 unbound[16337:29] error: internal error: looping module stopped
Sep 19 12:42:32 unbound[16337:29] error: internal error: looping module stopped
Sep 19 12:43:11 unbound[16337:2a] error: internal error: looping module stopped
Sep 19 12:43:11 unbound[16337:2a] error: internal error: looping module stopped

I have six instances of Unbound and I can see it on all of them. The load is up to 500qps, DNSSEC enabled, use-caps-for-id enabled, IPv6.

Thanks

With regards
Ales

Hi Ales,

Thanks for reporting. Do you have any non-default module enabled
(subnetcache, ipsecmod, respip, ..)?

-- Ralph

I also got a number of these errors on an upgrade to Unbound 1.6.6, on a
server doing about 100-300 qps. Similar config: use-caps-for-id, IPv6,
DNSSEC. No non-default modules enabled. I also have a test harness where
I can reproduce these. What additional information would be useful?

Hi Jacob,

I would like to know if this error also shows up when caps-for-id is not
enabled.

It might also be useful to know on which query name the looping occurs.
This will be logged on verbosity 3. If setting your verbosity to 3 is
not possible, you can apply below patch on 1.6.6 to always log the query
when the module is looping, and to log the module in which the looping
is happening.

Thanks,
-- Ralph

Index: services/mesh.c

I tested without use-caps-for-id, and did not get the "looping module
stopped" error.

I also upgraded to 1.6.7 and applied your patch. Here is a sample of
some of the queries and log messages I got. I was loading Unbound with a
mixture of A and CAA queries pulled from the logs of all names in
certificates issued by Let's Encrypt.

Oct 14 18:43:22 unbound[8613:0] error: internal error: looping module
(iterator) stopped
Oct 14 18:43:22 unbound[8613:0] info: pass error for qstate
autodiscover.azhany.com. A IN

Hi all,

I have just tested domains mentioned by Jacob below in order to reproduce this
issue on my system - unbound 1.6.5-1, Debian, use-caps-for-id, validator
enabled. Though I have some occurrences of looping module stopped error too,
none of the domains caused this error.

Ales

That makes sense to me. I think the error is not specific to any of the
domains, but is related to an issue that manifests over time due to load.

Any further thoughts on what could be behind this issue?

Thanks,
Jacob

Hi Jacob,

Not really, I haven't yet been able to figure out how use-caps-for-id
results in a looping module. I took a look at your log for bug #1749,
but the looping doesn't seem to occur there.

You mentioned that you are able to reproduce the issue. Can you maybe
share a log where the looping happens, while using a verbosity of 4?
Maybe that will give more insight.

Thanks,
-- Ralph

I tried doing this previously, but found that verbosity of 4 slowed down
Unbound so much that it could not handle the load I was sending it for
long enough to reproduce the problem. I can try again with a lower log
level, but I suspect the problem may not reproduce.

Thanks,
Jacob