Monitoring unbound?

heidnes · November 11, 2025, 3:34pm

Hi,

at work we like monitoring our recursive resolvers. To that end
we use the collectd package, and our backend is (judging by the
name) graphite (can you tell I don't "own" it myself?), while
presentation is via grafana.

A long time ago I found this external plugin for collectd,

https://github.com/falzm/collectd-unbound

which was apparently archived in 2021 and is no longer being developed.
Of course, I have a few patches to it... This one in turn
depends on the external-to-collectd API presented by

https://github.com/collectd/go-collectd

All this wrapped up in NetBSD's pkgsrc's patch + build system.

Now, the sad part is that these are (for now) un-committed pkgsrc
packages (yes, I'm looking to fix that), and the build of them
has apparently fallen by the wayside with recent-ish updates --
the latter now apparently depends on the collectd-dev (Debian?)
package, and collectd itself fails to install one of the header
files it wants to include, ref.

   # collectd.org/plugin/fake
   plugin/fake/shutdown.go:6:11: fatal error: plugin.h: No such file or directory
       6 | // #include "plugin.h"
         |             ^~~~~~~~~~
   compilation terminated.

Now, before I go on to report these issues to the respective
parties (I'll probably do that anyway...), I'd like to ask what
others do in terms of monitoring and visualization of the
monitored values for unbound. Surely something less rickety has
been put together? Bonus points for integration with graphite
and grafana.

Best regards,

- Håvard

Tarko_Tikan · November 11, 2025, 3:38pm

hey,

Now, before I go on to report these issues to the respective
parties (I'll probably do that anyway...), I'd like to ask what
others do in terms of monitoring and visualization of the
monitored values for unbound.

Prometheus and https://github.com/letsencrypt/unbound_exporter

Ideally, unbound would expose its metrics directly via HTTP in Prometheus format. NSD got native support earlier in 2025, so there is some hope, I guess.

Carsten_Strotmann · November 11, 2025, 3:56pm

Hi Havard,

Robert_Blayzor · November 11, 2025, 4:03pm

I've used a combination of NetSNMP snmpd running on the host and Cacti for collecting status (cache hits, misses, queries, etc.).

Basically, snmpd uses a tiny shell one-liner to grab the values from the unbound-control stats output.

Since it's SNMP, you can plug it into anything you like to grab the values.
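
As a rough sketch of the kind of extraction described above (not the actual one-liner, which is not shown in this thread), the `name=value` lines printed by `unbound-control stats_noreset` can be parsed into a dictionary. The helper names `parse_stats` and `read_unbound_stats` and the sample values are made up for illustration.

```python
import subprocess


def parse_stats(text):
    """Parse 'name=value' lines into a dict of floats, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue  # blank line or a line without '='
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # non-numeric value, ignore it
    return stats


def read_unbound_stats():
    """Run unbound-control and parse its output.

    Assumes unbound-control is on PATH and the control channel is configured.
    """
    result = subprocess.run(
        ["unbound-control", "stats_noreset"],
        check=True, capture_output=True, text=True,
    )
    return parse_stats(result.stdout)


# Made-up sample in the same name=value shape, so the sketch runs without unbound.
SAMPLE = """\
total.num.queries=1200
total.num.cachehits=900
total.num.cachemiss=300
total.requestlist.avg=0.75
"""

stats = parse_stats(SAMPLE)
hit_ratio = stats["total.num.cachehits"] / stats["total.num.queries"]
print(f"cache hit ratio: {hit_ratio:.2f}")
```

Whatever then exports the values (snmpd, collectd, or something else) only needs to pick the keys it cares about from the dictionary.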

Phil_Porada · November 11, 2025, 4:04pm

You should check out the unbound_exporter, whose development we took over from Kumina some years ago. We’ve been using it to great effect ever since. https://github.com/letsencrypt/unbound_exporter

Nicolas_Baumgarten · November 11, 2025, 6:50pm

Hi!
we are using collectd and graphite for unbound and other metrics

In the case of unbound we have an old perl wrapper around “unbound-control stats_noreset”
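
For readers without such a wrapper, here is a minimal sketch of the general idea in Python rather than Perl. It bypasses collectd and formats values for Carbon's plaintext protocol (one `<path> <value> <timestamp>` line per metric, TCP port 2003 by default), so it is a simplification, not a description of the setup above. The prefix `dns.resolver1.unbound` and the function names are hypothetical.

```python
import socket


def to_graphite_lines(stats, prefix, timestamp):
    """Format metrics as Carbon plaintext lines: '<path> <value> <timestamp>'."""
    return [
        f"{prefix}.{name} {value} {int(timestamp)}"
        for name, value in sorted(stats.items())
    ]


def send_to_graphite(lines, host, port=2003):
    """Send the lines to a Carbon plaintext listener (not called in this sketch)."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Made-up values; in practice they would come from unbound-control stats_noreset.
stats = {"total.num.queries": 1200, "total.num.cachehits": 900}
lines = to_graphite_lines(stats, "dns.resolver1.unbound", 1762900000)
for line in lines:
    print(line)
```

Unbound's dotted statistic names map directly onto Graphite's dotted metric hierarchy, which is why no renaming is done here.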

heidnes · November 11, 2025, 9:36pm

Hi!
we are using collectd and graphite for unbound and other metrics

In the case of unbound we have an old perl wrapper around
"unbound-control stats_noreset"

OK.

Allow me to follow a tangent, and get up to ride a hobby horse of
mine... Our unbound.conf has

# don't zero stats on read(!)
statistics-cumulative: yes

It is beyond me why anyone would have it any other way.

What if you wanted to have two distinct monitoring setups looking
at the same unbound instance? A counter is a counter!

Maybe I'm too much influenced by dealing with SNMP in other parts
of my work (but I hope not).
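
To illustrate the point about counters, here is a small sketch with two hypothetical monitors sampling the same cumulative counter on different schedules. Each one derives a rate from its own previous sample, so neither disturbs the other. The class name `RateTracker` and all numbers are invented for the example.

```python
class RateTracker:
    """Derive a per-second rate from successive samples of a cumulative counter."""

    def __init__(self):
        self.prev = None  # (timestamp, value) of the last sample this consumer saw

    def update(self, timestamp, value):
        """Return the rate since the previous sample, or None if it is unknown."""
        prev, self.prev = self.prev, (timestamp, value)
        if prev is None:
            return None  # first sample, nothing to compare against
        dt = timestamp - prev[0]
        dv = value - prev[1]
        if dt <= 0 or dv < 0:
            return None  # clock oddity, or the counter restarted (daemon restart)
        return dv / dt


# One cumulative counter growing at 50 queries per second.
counter_at = {0: 0, 10: 500, 20: 1000, 30: 1500, 40: 2000}

monitor_a = RateTracker()  # samples every 10 seconds
monitor_b = RateTracker()  # samples every 20 seconds
rates_a = [monitor_a.update(t, counter_at[t]) for t in (0, 10, 20, 30, 40)]
rates_b = [monitor_b.update(t, counter_at[t]) for t in (0, 20, 40)]
print(rates_a)
print(rates_b)
```

With reset-on-read statistics, each read would zero the counter, so every sample taken by one monitor would remove counts from what the other one sees.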

Regards,

- Håvard

Seth_Van_Buren · November 11, 2025, 10:32pm

We use AWS CloudWatch with some dashboards. Our entire solution is in AWS.

pemensik · November 21, 2025, 11:02am

A similar topic was recently raised for BIND9 as well. If you can document which statistics are in use now, that could be used to create one common format usable by any DNS service. It is silly for each to have a different format that then has to be transformed by some external module into the format common on some monitoring service.

I am not sure unbound should offer it on an HTTP service socket, but it would be great if it could provide the general numbers in a common format. If it has some numbers that differ from other implementations, it could export only those in an implementation-specific extension. But I think the majority of DNS software has similar numbers it wants to watch.

PaulWouters · November 21, 2025, 3:02pm

Do you mean one of these:

unbound-control stats_noreset
unbound-control stats

Paul

pemensik · November 21, 2025, 5:53pm

No. I think that was a question at some conference, DNS-OARC perhaps.

The proposal was: what if bind9, unbound, knot-resolver and pdns-recursor could produce the same format for their statistics, so that prometheus would need only one statistics parser? It might be exported to a different path in the filesystem, and that should be enough. Only the path and content should differ between services; the format should ideally stay compatible. That would require less glue code between the statistics dashboards in use and the DNS service itself.

I think such a common format would be great. I would prefer something JSON-based. I can describe only the bind9 and unbound statistics. Their formats are very different, although quite a lot of the numbers could be similar.

This is the main statistics refactoring issue for bind9:

https://gitlab.isc.org/isc-projects/bind9/-/issues/38

I am not sure where exactly they talked about requirements for a new format, sorry. I think it was mentioned after some talk in some OARC recording, but I do not remember which one.

A similar topic was recently raised for BIND9 as well. If you can document which statistics are in use now, that could be used to create one common format usable by any DNS service. It is silly for each to have a different format that then has to be transformed by some external module into the format common on some monitoring service.

I am not sure unbound should offer it on an HTTP service socket

Do you mean one of these:

unbound-control stats_noreset
unbound-control stats

Paul

Yes. It would be nice if it could serve only a subtree. Hmm, it would be cool if it could answer a CH TXT query for something like answer.num.stats., although that would need some ACL protection.

Then you usually want to put this into some graph. That requires picking specific fields from the output and mapping them to graph lines. Every implementation seems to have a very different statistics output format.

Systemd people like the Varlink format. That could be usable too.

Petr

PaulWouters · November 21, 2025, 6:23pm

No. I think that was a question at some conference, DNS-OARC perhaps.

The proposal was: what if bind9, unbound, knot-resolver and pdns-recursor could produce the same format for their statistics

Ahh, so then perhaps you should write an IETF YANG module for this and then export it to YANG.

Paul

Robert_Edmonds · November 21, 2025, 10:36pm

I don't think BIND, Unbound, Knot Resolver, and PowerDNS Recursor should
generate identical statistics. It would be nice if they used a de facto
standard like Prometheus/OpenMetrics format exposed on an HTTP endpoint
so that their metrics can be scraped and ingested by modern observability
stacks. Currently Unbound requires deploying a third party daemon [0]
alongside Unbound to convert the bespoke "UBCT1" protocol (the protocol
that unbound-control speaks to the Unbound daemon) in order to ingest
Unbound's metrics into a Prometheus-compatible stack.
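
As a minimal sketch of what such a conversion involves (this is not how unbound_exporter works internally, and the resulting metric names are invented), the dotted Unbound statistic names have to be rewritten into names the Prometheus text format accepts and then rendered with `# TYPE` lines. Which statistics count as counters is passed in by the caller and is an assumption of the example.

```python
import re


def prom_name(unbound_name, namespace="unbound"):
    """Map a dotted statistic name onto the characters Prometheus allows."""
    return namespace + "_" + re.sub(r"[^a-zA-Z0-9_]", "_", unbound_name)


def render_exposition(stats, counters):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name in sorted(stats):
        metric = prom_name(name)
        kind = "counter" if name in counters else "gauge"
        lines.append(f"# TYPE {metric} {kind}")
        lines.append(f"{metric} {stats[name]}")
    return "\n".join(lines) + "\n"


# Made-up values in the shape of unbound-control stats output.
stats = {"total.num.queries": 1200, "total.requestlist.avg": 0.75}
text = render_exposition(stats, counters={"total.num.queries"})
print(text, end="")
```

Serving this text from an HTTP endpoint is then all a Prometheus-compatible scraper needs.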

There are some metrics that count the number of times certain kinds of
packets occur (queries/responses by QTYPE/OPCODE/RCODE, by transport,
etc.) where you can arguably find some level of commonality between
different DNS server implementations because they are just counting
objective, externally observable events. If you are restricting your
visibility to just these externally observable properties of DNS
transactions, then perhaps it might be possible to share glue code and
"statistics dashboards" between different DNS server implementations. But
this is a fairly basic level of visibility. DNS server implementations are
going to have diverse internal architectures and implementation details
and some level of visibility (or "observability") into the health of those
internal implementation details is highly desirable.

For instance, I care very much about why Unbound might have dropped a
query from a client. It's not very useful or actionable to have a single
"number of client queries that were dropped" metric in the DNS server that
aggregates every cause together. (All this tells me is that the query got
to the server and I can exclude external possibilities like socket receive
buffer overruns from the possible causes.) You need more fine-grained
metrics that let you track down what mechanism(s) resulted in the query
drops. So Unbound has been getting more fine-grained metrics like [1, 2]
that help explain exactly which implementation-specific mechanisms are
resulting in query drops.

It would be unreasonable to expect every implementation to have the same
metrics like the ones in [1, 2], because these are implementation-specific
details that are going to vary because different implementations take
different approaches to solving various problems. It would also be
unreasonable to just take a union of all such implementation-specific
metrics and add them to a single common format and just have
implementations omit the ones that aren't relevant to them. (Or, even
worse, have different implementations use the same metric names to mean
totally different things, or sort of similar but not really the same
things.)

So my recommendations are basically:

1) Don't innovate on the exposition format. DNS servers exist
in a universe with many other kinds of servers that have had to
deal with broadly similar issues and this is not a greenfield.
Prometheus/OpenTelemetry already exists. If you come up with a bespoke
XML, JSON, protobuf, etc. format, it will have to be converted to
something else in order to import it into modern observability stacks, so
just generate that format directly. (If you disagree, then by all means,
design an additional layer of internal abstraction and build a pluggable
module interface so you can support a multitude of different metrics
exposition formats/transports.)

2) Every vendor should come up with their own naming scheme, organization,
and definition of implementation-specific "health" metrics that fit their
own software most naturally.

3) There may be some value in regularizing across server implementations
the definitions of metrics that count externally visible properties of DNS
transactions (the QTYPEs and RCODEs, etc.). But this kind of effort should
be narrowly scoped to exclude the implementation-specific "health"
metrics.
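
To make the split in recommendations 2 and 3 concrete, here is a hedged sketch that separates Unbound statistics counting externally visible properties (by QTYPE, OPCODE, RCODE) from everything else. The target metric names such as `dns_queries_by_qtype` are invented for illustration and are not part of any existing or proposed standard.

```python
# Unbound name prefix -> (hypothetical common metric name, label name)
COMMON_PREFIXES = {
    "num.query.type.": ("dns_queries_by_qtype", "qtype"),
    "num.query.opcode.": ("dns_queries_by_opcode", "opcode"),
    "num.answer.rcode.": ("dns_answers_by_rcode", "rcode"),
}


def classify(stats):
    """Split stats into (common labeled metrics, implementation-specific metrics)."""
    common, specific = [], {}
    for name, value in stats.items():
        for prefix, (metric, label) in COMMON_PREFIXES.items():
            if name.startswith(prefix):
                # The remainder of the name becomes the label value.
                common.append((metric, {label: name[len(prefix):]}, value))
                break
        else:
            specific[name] = value  # keep the vendor's own name untouched
    return common, specific


# Made-up values using extended-statistics style names.
stats = {
    "num.query.type.A": 700,
    "num.query.type.AAAA": 300,
    "num.answer.rcode.NXDOMAIN": 40,
    "total.requestlist.exceeded": 2,
}
common, specific = classify(stats)
print(common)
print(specific)
```

Only the first group is a candidate for shared dashboards across implementations; the second group stays under the vendor's own naming.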

[0]: https://github.com/letsencrypt/unbound_exporter

[1]: https://github.com/NLnetLabs/unbound/pull/1159

[2]: https://github.com/NLnetLabs/unbound/pull/1374