Quoting "W.C.A. Wijngaards" <wouter@NLnetLabs.nl>:
Hi Robert,
I said:
Potential Bug: In "mesh_state_cleanup", I noticed that
"comm_point_drop_reply" is called to remove unsent replies. However,
"stats_dropped" does not appear to be incremented and, more
importantly, "num_reply_addrs" is not decremented. Doesn't this missing
decrement potentially cause the "16 times" limit to go into effect and
prematurely drop queries?
Your response:
The num_reply_addrs is decremented when a reply is sent (which can be an
error if the mesh state is removed because of an error). The
mesh_state_cleanup indeed does not clean up the num_reply_addrs value
correctly, but this routine is only called from the mesh_state_delete
function. Thus it does not present a bug.
Okay, I agree.
Consider this: Assume my Internet connection goes out for a bit, so
no replies come in and queries pile up quickly.
Consider the following code path during this outage:
mesh_new_client() -> mesh_make_new_space()
  (mesh_make_new_space() decides a jostle is needed)
  -> mesh_state_delete() -> mesh_state_cleanup()
That causes num_reply_addrs to NOT be decremented.
Now my Internet connection comes back, BUT, num_reply_addrs got so
high that the "16 times" limit goes into effect. Now I get all my
queries dropped even though there is no error or problem anymore.
My temporary outage caused num_reply_addrs to grow and there is no way
for it to come back down. So, I'm stuck dropping queries until a
restart of the server.
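The scenario above can be reproduced in a toy model. This is only a sketch under stated assumptions: the struct, the toy_* function names and the field max_states are hypothetical and are not unbound's real code. Only the counter name num_reply_addrs and the "16 times" comparison are taken from the discussion, and the base of that limit (called max_reply_states here) is assumed as well.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model for illustration only; not unbound's real data structures. */
struct toy_mesh {
    size_t num_reply_addrs;   /* clients still waiting for a reply */
    size_t num_states;        /* query states currently held */
    size_t max_states;        /* capacity; beyond this a state is jostled */
    size_t max_reply_states;  /* assumed base of the "16 times" limit */
};

/* Jostle path: the state is deleted and its pending reply is dropped,
 * but the reply counter is left untouched (the behaviour reported). */
static void toy_jostle_delete(struct toy_mesh *m)
{
    assert(m->num_states > 0);
    m->num_states--;
    /* missing here: m->num_reply_addrs--; */
}

/* A new client query arrives; make room by jostling if the mesh is full. */
static void toy_new_client(struct toy_mesh *m)
{
    if (m->num_states >= m->max_states)
        toy_jostle_delete(m);
    m->num_states++;
    m->num_reply_addrs++;
}

/* A reply is sent normally: state and counter both go down. */
static void toy_send_reply(struct toy_mesh *m)
{
    assert(m->num_states > 0 && m->num_reply_addrs > 0);
    m->num_states--;
    m->num_reply_addrs--;
}

/* The "16 times" check, as described in the thread. */
static int toy_over_limit(const struct toy_mesh *m)
{
    return m->num_reply_addrs > m->max_reply_states * 16;
}

/* Outage: 'queries' arrive with no replies, then the connection returns
 * and every state still held is answered. Returns the leaked count. */
static size_t toy_outage(struct toy_mesh *m, size_t queries)
{
    for (size_t i = 0; i < queries; i++)
        toy_new_client(m);
    while (m->num_states > 0)
        toy_send_reply(m);
    return m->num_reply_addrs;
}
```

With room for only two states, an outage of 100 queries jostles 98 of them, and the counter stays at 98 after every remaining state has been answered. Nothing in the model ever brings it back down, which is the stuck condition described above.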
You are correct, and I was wrong. It should be fixed.
Thanks for the report (and persistence).
Fixed in the svn trunk development of unbound.
This code path is triggered by jostled queries. It is not triggered by
ordinary timeouts or servfails.
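
The actual change committed to svn trunk is not shown in this thread. As a rough sketch of what keeping the counter in step could look like, the cleanup routine can decrement the counter once for every reply it drops. The types and the sketch_* names below are hypothetical, and counting each dropped reply in stats_dropped is an assumption taken from the original question, not a confirmed part of the fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not unbound's real structs. */
struct sketch_mesh {
    size_t num_reply_addrs;  /* clients still waiting for a reply */
    size_t stats_dropped;    /* replies dropped without an answer */
};

struct sketch_state {
    size_t pending_replies;  /* reply addresses attached to this state */
};

/* Cleanup that accounts for every reply it drops, so a deleted
 * (for example jostled) state no longer leaks into the counter. */
static void sketch_state_cleanup(struct sketch_mesh *mesh,
                                 struct sketch_state *s)
{
    while (s->pending_replies > 0) {
        /* a real server would release the client's reply slot here */
        assert(mesh->num_reply_addrs > 0);
        mesh->num_reply_addrs--;
        mesh->stats_dropped++;   /* assumption: dropped replies are counted */
        s->pending_replies--;
    }
}
```

Cleaning up a state with three pending replies on a mesh that counts five waiting clients leaves the counter at two, so the "16 times" limit keeps reflecting only clients that are really still waiting.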