Forwarder behavior, infra-cache interaction, and feature ideas

Hello,

We are currently using Unbound (v1.24.2) as a local cache and
forwarder to regional DNS recursors, and I would appreciate some
feedback on a few observations and potential improvements, before
opening a feature request on GitHub.

**Context**

Our typical query path is application -> local Unbound instance listening on 127.0.0.1 -> regional DNS recursor -> Internet.

We use a configuration similar to:

forward-zone:
    name: "."
    forward-addr: X.X.X.X
    forward-addr: X.X.X.Y
    forward-first: no

Depending on the region, we configure between 2 and 6 DNS recursors.
In smaller regions, we may rely on forwarders from another region,
reached over a private VPN.

For information, we use the following other settings:

server:
    qname-minimisation: no
    do-ip6: no

    #
    # Performance tuning
    #
    cache-max-negative-ttl: 60
    outbound-msg-retry: 1
    infra-cache-min-rtt: 1000

    # Serve expired entries before they are refreshed in cache
    serve-expired: yes
    serve-expired-ttl: 300

    # cache size tuning
    msg-cache-size: 128m
    rrset-cache-size: 256m

    auto-trust-anchor-file: "/var/lib/unbound/root.key"

On the client side, applications use a 6s timeout with up to 3 retries
(no retry on SERVFAIL).

**Observations / Issues**

1. Infra cache behavior with forwarders

From my understanding, the infra cache tracks the response time of each
upstream server, and Unbound selects among the servers whose RTT lies
within a 400 ms band of the fastest one. The query timeout is also
computed automatically from the response times observed (source:
NLnet Labs Documentation - Unbound - Unbound Timeout Information). This
makes sense when querying authoritative servers for a zone.

However, in a forwarder setup, RTT variability is often driven by the
queried domain rather than the forwarder itself. As a result,
forwarders may be penalized due to slow domains rather than actual
network latency.
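
To illustrate the concern, here is a minimal Python sketch of a simplified model of band-based selection. It is not Unbound's actual code: the smoothing step is a plain exponentially weighted moving average chosen for illustration, the function names are hypothetical, and the addresses are documentation placeholders. Only the 400 ms band comes from the documentation cited above.

```python
RTT_BAND_MS = 400  # selection band mentioned in the Unbound timeout documentation


def update_rtt(estimate_ms, sample_ms, alpha=0.125):
    """Hypothetical smoothing step (plain EWMA); Unbound's real estimator differs."""
    return (1 - alpha) * estimate_ms + alpha * sample_ms


def candidates(rtt_by_forwarder, band_ms=RTT_BAND_MS):
    """Return the forwarders whose estimated RTT is within band_ms of the fastest."""
    fastest = min(rtt_by_forwarder.values())
    return sorted(
        addr for addr, rtt in rtt_by_forwarder.items() if rtt - fastest <= band_ms
    )


# Two forwarders with the same network latency (placeholder addresses).
rtt = {"192.0.2.1": 20.0, "192.0.2.2": 20.0}

# The second forwarder happens to receive three queries for a slow domain (3 s each).
for _ in range(3):
    rtt["192.0.2.2"] = update_rtt(rtt["192.0.2.2"], 3000.0)

in_band = candidates(rtt)
print(in_band)
```

In this model, the second forwarder drops out of the selection band even though its network latency never changed, which is the effect described above.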

2. Lack of prioritization between forwarders

In our setup, some forwarders are local (same region) and others are
remote (over VPN). Ideally, we would prefer local forwarders and only
fall back to remote ones when needed.

**Ideas / Possible Improvements**

These are exploratory suggestions:
    • Ability to define a fixed timeout for forwarders, bypassing automatic RTT-based adjustments.
    • Option to disable infra-cache-based selection for forwarders (while still detecting unresponsive ones).
    • Optional round-robin strategy instead of latency-based selection.
    • Support for prioritization between forwarders (e.g., prefer local over remote).
    • (Optional) Ability to probe forwarders using a specific domain to assess availability/latency.

Example of configuration (illustrative only):

forward-zone:
  name: "."
  # %10 means a priority of 10
  forward-addr: X.X.X.X%10
  # %20 means a priority of 20: used only if all forwarders with a
  # lower priority value could not answer
  forward-addr: X.X.X.Y%20
  # Timeout, in milliseconds
  timeout: 200
  # Disable the whole infra-cache mechanism
  # (is defining the parameters above enough?)
  infra-cache-disable: true
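
The intended priority semantics could be sketched as follows in Python. This only describes the behavior being proposed, not anything Unbound implements today; the function name, the `is_responsive` callback, and the placeholder addresses are all assumptions made for illustration.

```python
def usable_tier(forwarders, is_responsive):
    """Return the responsive forwarders of the most preferred tier.

    forwarders: list of (address, priority) pairs; a lower number is preferred.
    is_responsive: hypothetical callback reporting whether an address answers.
    """
    for priority in sorted({prio for _, prio in forwarders}):
        tier = [
            addr
            for addr, prio in forwarders
            if prio == priority and is_responsive(addr)
        ]
        if tier:
            return tier  # fall back to the next tier only when this one is empty
    return []


forwarders = [
    ("192.0.2.1", 10),     # local forwarder
    ("192.0.2.2", 10),     # local forwarder
    ("198.51.100.1", 20),  # remote forwarder, over VPN
]

all_up = usable_tier(forwarders, lambda addr: True)
local_down = usable_tier(forwarders, lambda addr: addr.startswith("198.51.100."))
print(all_up, local_down)
```

Remote forwarders are only returned once every local forwarder is considered unresponsive; how responsiveness is detected is left open here.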

**Questions**

Does this interpretation of infra-cache behavior in a forwarder setup
seem accurate?
Are there existing configuration options that already address some of
these needs?
Would these ideas be appropriate for a feature request on GitHub?

Thank you for your feedback.

Hi @couloum,

Your interpretation is accurate, although I would expect the forwarder to cache answers. Slow queries do take longer to resolve, but the RTT estimate is dynamic: it is also updated by all the other queries that are answered from the forwarder's cache or resolve quickly.

There are no existing configuration options per forward/stub-zone.
This feature is part of our internal roadmap (it has been asked for before), and this feedback helps shape it.

Thank you @Yorgos (and sorry for the duplicate post, I did not see this one after the migration).

Should I open a GitHub issue, or is this post enough?

No worries.

Yes, a GitHub issue would be more visible for development purposes.

I have opened this GitHub issue: [FR] configurable forwarder behavior for forward-zone · Issue #1435 · NLnetLabs/unbound

I don’t know if I can do it myself, but this discussion can be closed in favor of the GitHub issue.