understanding outbound-msg-retry feature

Hello All,

I am exploring the outbound-msg-retry feature. Here are my setup details:

Machine#1: runs Unbound and is also the client machine used for dig queries
Machine#2: runs named and holds the records; it is configured as forward-addr: 10.0.0.240 in unbound.conf

Here is my unbound.conf:

        # The number of retries when a non-positive response is received.
        outbound-msg-retry: 5

forward-zone:
        name: "."
        forward-addr: 10.0.0.240
        forward-addr: 192.0.2.73@5355 # forward to port 5355.
        forward-first: no
        forward-tcp-upstream: no
        forward-tls-upstream: no
        forward-no-cache: no
forward-zone:
        name: "example.org"
        forward-host: fwd.example.com

Here is how I tested:
On Machine#1 I ran the command: # dig @127.0.0.1 mx.dnstest.com MX
I expected to see 5 outgoing queries from Machine#1 to Machine#2, since Machine#2 sends SERVFAIL as its response.

Test result:
I see more than 5 outgoing queries (9 in total) on Machine#1.
I cannot reconcile this behavior with the definition of the option below; I expected only 5 queries to Machine#2.

outbound-msg-retry: <number>
            The number of retries Unbound will do in case of a non positive
            response is received. If a forward nameserver is used, this is the
            number of retries per forward nameserver in case of throwaway
            response.

Thanks,
Ashok

Hello Ashok!

> Here is my unbound.conf:
>         # The number of retries when a non-positive response is received.
>         outbound-msg-retry: 5
> forward-zone:
>         name: "."
>         forward-addr: 10.0.0.240

First of all, the option "outbound-msg-retry" must be configured inside a
"server:" clause. So your configuration should look like this:

server:
    outbound-msg-retry: 5
forward-zone:
    name: "."
    forward-addr: 10.0.0.240

I am not sure whether you only left that out of your mail or also out of
the config file.
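
One way to see which clause an option ends up in is to walk the file line by line. The following is only a rough sketch: the function name `clauses_containing` is made up for illustration, and it assumes one "key: value" pair per line, which is how the snippets in this thread are written but not the only layout unbound.conf accepts.

```python
def clauses_containing(conf_text, option):
    """Return the clause name for every occurrence of `option`.

    Assumption: a line such as "server:" with nothing after the colon
    opens a clause, and every other "key: value" line is an attribute
    of the most recently opened clause. None means "outside any clause".
    """
    current = None
    found = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            current = key          # clause header, e.g. "server:"
        elif key == option:
            found.append(current)  # attribute inside the current clause
    return found


posted = """
# The number of retries when a non-positive response is received.
outbound-msg-retry: 5
forward-zone:
    name: "."
    forward-addr: 10.0.0.240
"""

corrected = """
server:
    outbound-msg-retry: 5
forward-zone:
    name: "."
    forward-addr: 10.0.0.240
"""

print(clauses_containing(posted, "outbound-msg-retry"))
print(clauses_containing(corrected, "outbound-msg-retry"))
```

For the configuration as posted, the option is reported outside any clause; for the corrected one it is reported under "server".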

> here is how I tested:
> on machine#1 ran command #dig @127.0.0.1 mx.dnstest.com MX
> My expectation is I should see 5 outgoing queries from Machine#1 to Machine#2
> as Machine#2 send Serve fail as a response
>
> Test Result:
> I see more than 5 outgoing msgs/queries (I see 9 msgs/queries) on Machine#1
> This behavior I am not able to understand with definition. I expect only 5
> msgs to Machine#2

It is expected that you may see more queries than the configured
"outbound-msg-retry" value. Unbound sends probes to your forwarders to
measure the round-trip time. It then uses the round-trip time
distribution to decide when to send the same request to the same
forwarder a second time, in case the UDP packet was dropped.

For your testing you can try setting "infra-cache-min-rtt" (given in
milliseconds) to a high value, equal to or higher than your DNS timeout
value, i.e. somewhere in the range of several seconds, and then check
whether you see fewer outgoing queries.

Another way to test this is to send a lot of queries to your forwarders
to let unbound calculate the round trip distribution before sending your
test query.

I hope my explanation helps, although I am also only guessing at what
might have happened on your systems.

Kind regards
Moritz

Hi Ashok,

What you expect (only 5 queries) does happen if the forwarding address points to a pure resolver that gives back a single answer to each query. You can easily test this with any validating public resolver by querying for bogus.nlnetlabs.nl, for example.
If you list more addresses, it will be 5 queries for each address.
At least, that is what I observe here.

Because you use named as the forward-addr, probably with both resolver and nameserver capabilities configured, more packets may be exchanged between Unbound and named depending on the answers. You can check the queries sent and received with verbosity: 4.

Best regards,
-- George

Hello Moritz, George,

Thanks for the response.

Here is what I did after your comments

Setup:
Client(10.0.0.4) → Unbound 10.0.0.6(cache enabled) → Server(named) 10.0.0.240

The client tested with dig, querying for mx.dnstest.com. This record has a TTL of 31.
I first flooded the setup with 3000 requests and then configured named to return SERVFAIL so that I could observe the retry behavior.
My question is: is it really possible to see exactly 5 queries (outbound-msg-retry: 5) going out of Unbound? And how can this feature be validated?

Here are the tcpdump captures:

CLIENT:
[root@host-10-218-23-24:Active:Standalone] config # dig @10.0.0.6 mx.dnstest.com MX

; <<>> DiG 9.11.36 <<>> @10.0.0.6 mx.dnstest.com MX
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 30540
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;mx.dnstest.com. IN MX

;; Query time: 3112 msec
;; SERVER: 10.0.0.6#53(10.0.0.6)
;; WHEN: Wed Jul 27 06:23:30 UTC 2022
;; MSG SIZE rcvd: 43

[root@host-10-218-23-24:Active:Standalone] config #

UNBOUND:
06:23:48.624763 IP 192.168.232.180.domain > 10.218.23.35.37028: 36064 5/0/1 A 54.171.230.55, A 34.243.160.129, A 54.217.10.153, A 34.254.182.186, A 54.247.62.1 (124)
06:23:48.624790 IP 192.168.232.180.domain > 10.218.23.35.52430: 17066 5/0/1 AAAA 2a05:d018:91c:3200:2846:99fb:81b6:1e11, AAAA 2a05:d018:91c:3200:c887:2f22:290f:a7c, AAAA 2a05:d018:91c:3200:d8b6:37bc:63f9:703c, AAAA 2a05:d018:91c:3200:5e0d:21a9:26ca:90b5, AAAA 2a05:d018:91c:3200:c8f:1a06:a2dd:450f (184)
06:23:48.624820 IP 192.168.232.180.domain > 10.218.23.35.41585: 51456 5/0/1 A 54.247.62.1, A 54.171.230.55, A 34.243.160.129, A 54.217.10.153, A 34.254.182.186 (124)
06:23:48.624825 IP 192.168.232.180.domain > 10.218.23.35.43194: 24738 5/0/1 AAAA 2a05:d018:91c:3200:c887:2f22:290f:a7c, AAAA 2a05:d018:91c:3200:d8b6:37bc:63f9:703c, AAAA 2a05:d018:91c:3200:5e0d:21a9:26ca:90b5, AAAA 2a05:d018:91c:3200:c8f:1a06:a2dd:450f, AAAA 2a05:d018:91c:3200:2846:99fb:81b6:1e11 (184)
06:23:48.625371 IP clientmachine.novalocal.59871 > 10.0.0.240.domain: 35883+ PTR? 35.23.218.10.in-addr.arpa. (43)
06:23:53.630691 IP clientmachine.novalocal.59871 > 10.0.0.240.domain: 35883+ PTR? 35.23.218.10.in-addr.arpa. (43)
06:24:12.173300 IP 10.0.0.4.62651 > clientmachine.novalocal.domain: 30540+ [1au] MX? mx.dnstest.com. (55)
06:24:12.174582 IP clientmachine.novalocal.8897 > 10.0.0.240.domain: 34128+ [1au] MX? mx.dnstest.com. (43)
06:24:12.225062 IP clientmachine.novalocal.57442 > 10.0.0.240.domain: 59847+ [1au] MX? mx.dnstest.com. (43)
06:24:12.275975 IP clientmachine.novalocal.22614 > 10.0.0.240.domain: 33303+ [1au] MX? mx.dnstest.com. (43)
06:24:12.376605 IP clientmachine.novalocal.50495 > 10.0.0.240.domain: 29263+ [1au] MX? mx.dnstest.com. (43)
06:24:12.477535 IP clientmachine.novalocal.41078 > 10.0.0.240.domain: 64941+ [1au] MX? mx.dnstest.com. (43)
06:24:12.678278 IP clientmachine.novalocal.40026 > 10.0.0.240.domain: 22158+ [1au] MX? mx.dnstest.com. (43)
06:24:12.879194 IP clientmachine.novalocal.63789 > 10.0.0.240.domain: 22982+ [1au] MX? mx.dnstest.com. (43)
06:24:13.280116 IP clientmachine.novalocal.4570 > 10.0.0.240.domain: 16690+ [1au] MX? mx.dnstest.com. (43)
06:24:13.681226 IP clientmachine.novalocal.37294 > 10.0.0.240.domain: 1284+ [1au] MX? mx.dnstest.com. (43)
06:24:14.482445 IP clientmachine.novalocal.21467 > 10.0.0.240.domain: 38521+ [1au] MX? mx.dnstest.com. (43)
06:24:15.284316 IP clientmachine.novalocal.domain > 10.0.0.4.62651: 30540 ServFail 0/0/1 (43)

06:24:27.228645 IP 192.168.232.180.domain > 10.218.20.141.55836: 53616 NXDomain 0/1/0 (112)
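
To avoid counting by hand, the outgoing queries in a capture like the one above can be tallied with a short script. This is a minimal sketch: `QUERY`, `CAPTURE` and `count_queries` are names made up for illustration, the regular expression only covers the tcpdump line shapes that appear in this thread, and the excerpt is copied from the Unbound-side capture.

```python
import re
from collections import Counter

# Matches tcpdump query lines such as:
#   06:24:12.174582 IP src.port > dst.port: 34128+ [1au] MX? mx.dnstest.com. (43)
# Response lines have no "TYPE?" token and are therefore skipped.
QUERY = re.compile(r"^(\S+) IP (\S+) > (\S+): \d+\+? (?:\[\w+\] )?(\w+)\? (\S+)")

CAPTURE = """\
06:23:53.630691 IP clientmachine.novalocal.59871 > 10.0.0.240.domain: 35883+ PTR? 35.23.218.10.in-addr.arpa. (43)
06:24:12.173300 IP 10.0.0.4.62651 > clientmachine.novalocal.domain: 30540+ [1au] MX? mx.dnstest.com. (55)
06:24:12.174582 IP clientmachine.novalocal.8897 > 10.0.0.240.domain: 34128+ [1au] MX? mx.dnstest.com. (43)
06:24:12.225062 IP clientmachine.novalocal.57442 > 10.0.0.240.domain: 59847+ [1au] MX? mx.dnstest.com. (43)
06:24:12.275975 IP clientmachine.novalocal.22614 > 10.0.0.240.domain: 33303+ [1au] MX? mx.dnstest.com. (43)
06:24:12.376605 IP clientmachine.novalocal.50495 > 10.0.0.240.domain: 29263+ [1au] MX? mx.dnstest.com. (43)
06:24:12.477535 IP clientmachine.novalocal.41078 > 10.0.0.240.domain: 64941+ [1au] MX? mx.dnstest.com. (43)
06:24:12.678278 IP clientmachine.novalocal.40026 > 10.0.0.240.domain: 22158+ [1au] MX? mx.dnstest.com. (43)
06:24:12.879194 IP clientmachine.novalocal.63789 > 10.0.0.240.domain: 22982+ [1au] MX? mx.dnstest.com. (43)
06:24:13.280116 IP clientmachine.novalocal.4570 > 10.0.0.240.domain: 16690+ [1au] MX? mx.dnstest.com. (43)
06:24:13.681226 IP clientmachine.novalocal.37294 > 10.0.0.240.domain: 1284+ [1au] MX? mx.dnstest.com. (43)
06:24:14.482445 IP clientmachine.novalocal.21467 > 10.0.0.240.domain: 38521+ [1au] MX? mx.dnstest.com. (43)
06:24:15.284316 IP clientmachine.novalocal.domain > 10.0.0.4.62651: 30540 ServFail 0/0/1 (43)
"""


def count_queries(capture, upstream):
    """Count queries sent to `upstream`, grouped by (qname, qtype)."""
    counts = Counter()
    for line in capture.splitlines():
        match = QUERY.match(line.strip())
        if not match:
            continue
        _ts, _src, dst, qtype, qname = match.groups()
        host = dst.rsplit(".", 1)[0]  # strip the trailing port or service name
        if host == upstream:
            counts[(qname, qtype)] += 1
    return counts


for (qname, qtype), n in count_queries(CAPTURE, "10.0.0.240").items():
    print(f"{n:3d}  {qtype:4s} {qname}")
```

On this excerpt the script reports 10 MX queries for mx.dnstest.com. sent to 10.0.0.240; the client's own query to Unbound is not counted because its destination is not the forwarder.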

NAMED:
06:21:40.024902 IP 10.0.0.6.59871 > 10.0.0.240.53: 35883+ PTR? 35.23.218.10.in-addr.arpa. (43) in slot1/tmm0 lis= port=1.2 trunk=
06:21:45.029881 IP 10.0.0.6.59871 > 10.0.0.240.53: 35883+ PTR? 35.23.218.10.in-addr.arpa. (43) in slot1/tmm0 lis= port=1.2 trunk=
06:22:03.573096 IP 10.0.0.6.8897 > 10.0.0.240.53: 34128+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm1 lis= port=1.2 trunk=
06:22:03.623394 IP 10.0.0.6.57442 > 10.0.0.240.53: 59847+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm0 lis= port=1.2 trunk=
06:22:03.674324 IP 10.0.0.6.22614 > 10.0.0.240.53: 33303+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm3 lis= port=1.2 trunk=
06:22:03.774943 IP 10.0.0.6.50495 > 10.0.0.240.53: 29263+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm0 lis= port=1.2 trunk=
06:22:03.875884 IP 10.0.0.6.41078 > 10.0.0.240.53: 64941+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm3 lis= port=1.2 trunk=
06:22:04.076611 IP 10.0.0.6.40026 > 10.0.0.240.53: 22158+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm2 lis= port=1.2 trunk=
06:22:04.277505 IP 10.0.0.6.63789 > 10.0.0.240.53: 22982+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm2 lis= port=1.2 trunk=
06:22:04.678511 IP 10.0.0.6.4570 > 10.0.0.240.53: 16690+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm2 lis= port=1.2 trunk=
06:22:05.079457 IP 10.0.0.6.37294 > 10.0.0.240.53: 1284+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm1 lis= port=1.2 trunk=
06:22:05.880784 IP 10.0.0.6.21467 > 10.0.0.240.53: 38521+ [1au] MX? mx.dnstest.com. (43) in slot1/tmm1 lis= port=1.2 trunk=
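
The spacing between the MX queries can be computed from the timestamps in the named-side capture above. This is a small sketch; the helper name `gaps_ms` is made up for illustration and the timestamps are copied from the ten MX query lines.

```python
from datetime import datetime

# Arrival times of the ten MX queries as seen on the named side.
TIMESTAMPS = [
    "06:22:03.573096", "06:22:03.623394", "06:22:03.674324",
    "06:22:03.774943", "06:22:03.875884", "06:22:04.076611",
    "06:22:04.277505", "06:22:04.678511", "06:22:05.079457",
    "06:22:05.880784",
]


def gaps_ms(timestamps):
    """Return the gap between consecutive timestamps, rounded to whole ms."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [round((b - a).total_seconds() * 1000)
            for a, b in zip(times, times[1:])]


print(gaps_ms(TIMESTAMPS))
# [50, 51, 101, 101, 201, 201, 401, 401, 801]
```

The gaps are roughly 50, 50, 100, 100, 200, 200, 400, 400 and 800 ms, so the wait doubles after every second query. That pattern may point to timeout-driven resends of the kind Moritz describes rather than retries triggered by a received SERVFAIL, but this is only an interpretation of the capture.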

Thanks,
Ashok