Issues setting up dnstap logging

Hello,

I am curious if anyone is running a successful dnstap logging setup for
unbound and what tools you are using for solving this.

Currently I am playing around with this on a debian buster machine with
backports enabled so I can install unbound 1.12. My goal is to have binary
dnstap files being created on disk, with logfile rotation over some interval so
you can clean up older files over time.

For my initial attempt I tried using fstrm_capture since it is already
available from the "fstrm-bin" package in buster and also supports logfile
rotation. I have noticed however that fstrm_capture will often crash when
stopping unbound.

Based on the man page example I start the process like so:

sudo -u unbound /usr/bin/fstrm_capture -ddddd -t protobuf:dnstap.Dnstap \
                 -u /var/lib/unbound/dnstap.sock \
                 -w /var/log/dnstap/dnstap-%F-%T.fstrm \
                 -s 3600 --gmtime

This is coupled with the following dnstap-related config for unbound:

dnstap:
    dnstap-enable: yes
    dnstap-socket-path: /var/lib/unbound/dnstap.sock
    dnstap-log-client-query-messages: yes
    dnstap-log-client-response-messages: yes

Given that unbound is not running at the startup of fstrm_capture the following
is logged:

# ./run-fstrm_capture.sh
fstrm_capture: opening Unix socket path /var/lib/unbound/dnstap.sock
fstrm_capture: opened output file /var/log/dnstap/dnstap-2020-12-27-17:05:32.fstrm

When starting unbound the following output is observed:

fstrm_capture: accepted new connection fd 8
fstrm_capture: connection fd 8: reading control frame (42 bytes): "\x00\x00\x00\x00\x00\x00\x00\"\x00\x00\x00\x04\x00\x00\x00\x01\x00\x00\x00\x16protobuf:dnstap.Dnstap"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_READY (4)
fstrm_capture: connection fd 8: CONTENT_TYPE [1/1] (22 bytes): "protobuf:dnstap.Dnstap"
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_ACCEPT (1)
fstrm_capture: connection fd 8: writing frame (42) bytes: "\x00\x00\x00\x00\x00\x00\x00\"\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x16protobuf:dnstap.Dnstap"
fstrm_capture: connection fd 8: reading control frame (42 bytes): "\x00\x00\x00\x00\x00\x00\x00\"\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x16protobuf:dnstap.Dnstap"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_START (2)

Looks good so far, and querying the server for nlnetlabs.nl will result in the
following output:

fstrm_capture: connection fd 8: processing data frame (90 bytes)
fstrm_capture: connection fd 8: processing data frame (94 bytes)

The problem I have seen here is if I stop (or restart) unbound, the
fstrm_capture process will often die, outputting the following before exiting:

fstrm_capture: connection fd 8: reading control frame (12 bytes): "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_STOP (3)
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_FINISH (5)
fstrm_capture: connection fd 8: writing frame (12) bytes: "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x05"

I thought this was always the case, but while collecting output for this email
I did see an instance where it did not die, logging some more and remaining running:

fstrm_capture: connection fd 8: reading control frame (12 bytes): "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_STOP (3)
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_FINISH (5)
fstrm_capture: connection fd 8: writing frame (12) bytes: "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x05"
fstrm_capture: connection fd 8: closing (read 6 frames, 552 bytes)

I was confused by the exiting beaviour at first, it looks like the process just
disappears for no obvious reason but after attaching gdb I saw the following:

fstrm_capture: connection fd 8: reading control frame (12 bytes): "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_STOP (3)
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_FINISH (5)
fstrm_capture: connection fd 8: writing frame (12) bytes: "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x05"

Program received signal SIGPIPE, Broken pipe.
0x00007ffff7c74504 in __GI___writev (fd=8, iov=0x7fffffffe010, iovcnt=1) at ../sysdeps/unix/sysv/linux/writev.c:26
26      ../sysdeps/unix/sysv/linux/writev.c: Permission denied.
```

So it appears that when fstrm\_capture attempts to write the FSTRM\_CONTROL\_FINISH frame, unbound will
often have already gone away, leading to an unhandled SIGPIPE which probably
defaults to terminating the process, based on signal\(7\):
```
Signal     Value     Action   Comment
──────────────────────────────────────────────────────────────────────
[...]
SIGPIPE      13       Term    Broken pipe: write to pipe with no
                                     readers; see pipe(7)
```

My assumption that the signal is unhandled is based on setup\_event\_pool\(\) at
https://github.com/farsightsec/fstrm/blob/master/src/fstrm_capture.c#L1147 having
explicit handlers for SIGHUP and SIGUSR1, but no mention of SIGPIPE\.

Looking at an alternative reader I also tried the golang based dnstap
application from https://github.com/dnstap/golang-dnstap, running it like so:
```
sudo -u unbound /usr/local/bin/dnstap \
                 -u /var/lib/unbound/dnstap.sock \
                 -w /var/log/dnstap/golang-dnstap.fstrm
```

It does indeed also see SIGPIPEs but manages to survive them\. Starting it:
```
# ./run-dnstap.sh
dnstap: opened input socket /var/lib/unbound/dnstap.sock
```

\.\.\. starting unbound:
```
2020/12/27 18:42:12 /var/lib/unbound/dnstap.sock: accepted connection 1
```

\.\.\. stopping unbound:
```
2020/12/27 18:43:08 FrameStreamInput: Read error: write unix /var/lib/unbound/dnstap.sock->@: write: broken pipe
2020/12/27 18:43:08 /var/lib/unbound/dnstap.sock: closed connection 1
```

\.\.\. starting it again:
```
2020/12/27 18:43:40 /var/lib/unbound/dnstap.sock: accepted connection 2
```

\.\.\. and restarting unbound again for the combined experience:
```
2020/12/27 18:44:06 FrameStreamInput: Read error: write unix /var/lib/unbound/dnstap.sock->@: write: broken pipe
2020/12/27 18:44:06 /var/lib/unbound/dnstap.sock: closed connection 2
2020/12/27 18:44:06 /var/lib/unbound/dnstap.sock: accepted connection 3
```

So it appears more resilient, but unfortunately it has no native support for
rotating the output files, and I am not sure of a proper way of doing this while
not missing queries\.

For this reason I am wondering if someone out there have already solved these
issues and have some pointers how to properly set this up\. I guess it is also
possible that the SIGPIPE problem indicates a shortcoming either in
fstrm\_capture or unbounds shutdown handling, but I am not sure who if anyone is
at fault there\.

Regards,
Patrik Lundin

Hello,

I am curious if anyone is running a successful dnstap logging setup for
unbound and what tools you are using for solving this.

Currently I am playing around with this on a debian buster machine with
backports enabled so I can install unbound 1.12. My goal is to have binary
dnstap files being created on disk, with logfile rotation over some interval so
you can clean up older files over time.

For my initial attempt I tried using fstrm_capture since it is already
available from the "fstrm-bin" package in buster and also supports logfile
rotation. I have noticed however that fstrm_capture will often crash when
stopping unbound.

Based on the man page example I start the process like so:

sudo -u unbound /usr/bin/fstrm_capture -ddddd -t protobuf:dnstap.Dnstap \
                 -u /var/lib/unbound/dnstap.sock \
                 -w /var/log/dnstap/dnstap-%F-%T.fstrm \
                 -s 3600 --gmtime

Hi, Patrik:

I have a very similar setup where I run fstrm_capture on a Debian buster
system with unbound 1.12.0 from backports and fstrm-bin/libfstrm0 0.6.0
from unstable. I run fstrm_capture as a service:

    ExecStart=/usr/bin/fstrm_capture --type protobuf:dnstap.Dnstap --unix /var/log/dnstap/.sock --localtime -w /var/log/dnstap/dnstap.%%Y%%m%%d.%%H%%M.%%s

with a monthly cron that rotates the output file:

    @monthly root systemctl kill --kill-who=main --signal=SIGUSR1 fstrm_capture 2>&1

I have not seen fstrm_capture crash with SIGPIPE. However, on another
machine running unbound 1.13.0 I have noticed fstrm_capture logs the
following message when unbound is stopped, but fstrm_capture keeps
running:

    fstrm_capture: connection fd 8: libevent error: Broken pipe (32)

That makes sense to me, because if fstrm_capture is handling and logging
an EPIPE from libevent then the process isn't being killed with SIGPIPE.
But I'm not sure why the behavior differs between your fstrm_capture
instance and mine.

I was confused by the exiting beaviour at first, it looks like the process just
disappears for no obvious reason but after attaching gdb I saw the following:

fstrm_capture: connection fd 8: reading control frame (12 bytes): "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_STOP (3)
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_FINISH (5)
fstrm_capture: connection fd 8: writing frame (12) bytes: "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x05"

Program received signal SIGPIPE, Broken pipe.
0x00007ffff7c74504 in __GI___writev (fd=8, iov=0x7fffffffe010, iovcnt=1) at ../sysdeps/unix/sysv/linux/writev.c:26
26      ../sysdeps/unix/sysv/linux/writev.c: Permission denied.
```

Technically, since unbound is shutting down, it's not a big deal that it
didn't wait to receive the FSTRM_CONTROL_FINISH frame from
fstrm_capture. You might try compiling fstrm 0.6.0 from source from
GitHub - farsightsec/fstrm: Frame Streams implementation in C · GitHub with the attached patch to ignore
SIGPIPE and see if that works for you. If it does I'll submit a pull
request upstream for fstrm.

(attachments)

fstrm_capture-ignore-SIGPIPE.patch (363 Bytes)

Robert Edmonds wrote:

I have not seen fstrm_capture crash with SIGPIPE. However, on another
machine running unbound 1.13.0 I have noticed fstrm_capture logs the
following message when unbound is stopped, but fstrm_capture keeps
running:

    fstrm_capture: connection fd 8: libevent error: Broken pipe (32)

That makes sense to me, because if fstrm_capture is handling and logging
an EPIPE from libevent then the process isn't being killed with SIGPIPE.
But I'm not sure why the behavior differs between your fstrm_capture
instance and mine.

Looking into this a bit more, it's because I'm running fstrm_capture
under systemd, which by default causes SIGPIPE to be ignored:

    IgnoreSIGPIPE=
        Takes a boolean argument. If true, causes SIGPIPE to be ignored
        in the executed process. Defaults to true because SIGPIPE
        generally is useful only in shell pipelines.

Nice catch! I was not aware systemd makes executed processes ignore SIGPIPE by
default. Indeed, I added a systemd service file for fstrm_capture (feedback
appreciated, of course the -d flags would be removed for normal operation)

[Unit]
Description=fstrm_capture service
Wants=unbound.service
Before=unbound.service
RequiresMountsFor=/var/lib/unbound

[Service]
Type=simple
User=unbound
Group=unbound
ExecStart=/usr/bin/fstrm_capture -ddddd -t protobuf:dnstap.Dnstap -u /var/lib/unbound/dnstap.sock -w /var/log/dnstap/dnstap-%%F-%%T.fstrm -s 3600 --gmtime

[Install]
WantedBy=multi-user.target

And now I also see a message that the SIGPIPE is ignored when stopping unbound:

Dec 28 12:16:31 linux-amd64-t01 fstrm_capture[4227]: fstrm_capture: connection fd 8: reading control frame (12 bytes): "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03"
Dec 28 12:16:31 linux-amd64-t01 fstrm_capture[4227]: fstrm_capture: connection fd 8: received FSTRM_CONTROL_STOP (3)
Dec 28 12:16:31 linux-amd64-t01 fstrm_capture[4227]: fstrm_capture: connection fd 8: sending FSTRM_CONTROL_FINISH (5)
Dec 28 12:16:31 linux-amd64-t01 fstrm_capture[4227]: fstrm_capture: connection fd 8: writing frame (12) bytes: "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x05"
Dec 28 12:16:31 linux-amd64-t01 fstrm_capture[4227]: fstrm_capture: connection fd 8: libevent error: Broken pipe (32)
Dec 28 12:16:31 linux-amd64-t01 fstrm_capture[4227]: fstrm_capture: connection fd 8: closing (read 0 frames, 0 bytes)

... with the process still running, so this is a functional workaround for the time being:

unbound   4227  0.0  0.0   4900   888 ?        Ss   12:15   0:00 /usr/bin/fstrm_capture -ddddd -t protobuf:dnstap.Dnstap -u /var/lib/unbound/dnstap.sock -w /var/log/dnstap/dnstap-%F-%T.fstrm -s 3600 --gmtime

I see you have already gone ahead and opened a pull request:
https://github.com/farsightsec/fstrm/pull/65

Thanks for the help :slight_smile:

Regards,
Patrik Lundin