Jun 11 08:53:24 ns03 nsd[12296]: axfr for example1.com. from 109.74.194.10
Jun 11 08:53:24 ns03 kernel: [35762.840704] nsd[12296]: segfault at 560244acb618 ip 00007fefedb81406 sp 00007ffe6c552ac0 error 7 in libssl.so.1.1[7fefedb43000+86000]
Jun 11 08:53:24 ns03 nsd[12246]: server 12296 died unexpectedly, restarting
Jun 11 08:53:24 ns03 nsd[12225]: [2019-06-11 08:53:24.960] nsd[12246]: warning: server 12296 died unexpectedly, restarting
Jun 11 08:53:25 ns03 nsd[12246]: process 12296 terminated with status 139
Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.030] nsd[12246]: warning: process 12296 terminated with status 139
Jun 11 08:53:25 ns03 nsd[12297]: axfr for example2.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.557] nsd[12297]: info: axfr for example2.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.557] nsd[12297]: info: axfr for example3.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12297]: axfr for example3.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12297]: axfr for example4.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.565] nsd[12297]: info: axfr for example4.com. from 207.192.70.10
Jun 11 08:53:25 ns03 kernel: [35763.583172] nsd[12297]: segfault at 560244acb618 ip 00007fefedb81406 sp 00007ffe6c552ac0 error 7 in libssl.so.1.1[7fefedb43000+86000]
Note, *NOT* on every axfr; some seem to work.
Just starting to troubleshoot here ...
Any obvious issues that are already known that might cause this?
Jun 11 08:53:24 ns03 nsd[12296]: axfr for example1.com. from 109.74.194.10
Jun 11 08:53:24 ns03 kernel: [35762.840704] nsd[12296]: segfault at 560244acb618 ip 00007fefedb81406 sp 00007ffe6c552ac0 error 7 in libssl.so.1.1[7fefedb43000+86000]
Jun 11 08:53:24 ns03 nsd[12246]: server 12296 died unexpectedly, restarting
Jun 11 08:53:24 ns03 nsd[12225]: [2019-06-11 08:53:24.960] nsd[12246]: warning: server 12296 died unexpectedly, restarting
Jun 11 08:53:25 ns03 nsd[12246]: process 12296 terminated with status 139
Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.030] nsd[12246]: warning: process 12296 terminated with status 139
Jun 11 08:53:25 ns03 nsd[12297]: axfr for example2.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.557] nsd[12297]: info: axfr for example2.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.557] nsd[12297]: info: axfr for example3.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12297]: axfr for example3.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12297]: axfr for example4.com. from 207.192.70.10
Jun 11 08:53:25 ns03 nsd[12225]: [2019-06-11 08:53:25.565] nsd[12297]: info: axfr for example4.com. from 207.192.70.10
Jun 11 08:53:25 ns03 kernel: [35763.583172] nsd[12297]: segfault at 560244acb618 ip 00007fefedb81406 sp 00007ffe6c552ac0 error 7 in libssl.so.1.1[7fefedb43000+86000]
Note, *NOT* on every axfr; some seem to work.
Just starting to troubleshoot here ...
Any obvious issues that are already known that might cause this?
Hello "PGNet Dev"
now, as you ask:
I see similar messages before I updated from 4.1.27 to 4.2.0
And now, as you mentioned that issue, I also found the segfault message
From what I see something bad must happen *after* AXFR is completed.
- I see no warning/error on the consumer side
- I don't use TLS for AXFR
- it happened also on 4.1.27
- I can't reproduce in an lab environment
- none of my users asked me that they miss something
BTW: There is a draft ¹) "Message Digest for DNS Zones" to prove a transferred zone was received complete
I've added ²) the ldns-zone-digest tool in my ldns instance and can create and verify zone files.
Unfortunately not in this particilar installation :-/
Would be helpful if nsd could check such ZONEMD if available
if that matters: I *do* use TSIG-Keys to secure the AXFR
Here, all the slave AXFRs are configured NOKEY.
So it is not SSL. Then it could be TCP Fast Open (server-side fast
open, from the logs I see this is the AXFR server). Or perhaps the file
descriptor events could be mangled, in the event handling. This patch
may help? Otherwise, perhaps a stack trace of the issue could be useful
to have, or a way to reproduce it.
Jun 12 17:48:57 dnsserver02 kernel: [2448848.071188] show_signal_msg: 7871 callbacks suppressed
Jun 12 17:48:57 dnsserver02 kernel: [2448848.071191] nsd[59588]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.071684] nsd[59589]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.072179] nsd[59590]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.072672] nsd[59591]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.073452] nsd[59592]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.073972] nsd[59593]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.075735] nsd[59594]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.076253] nsd[59595]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.077077] nsd[59596]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
Jun 12 17:48:57 dnsserver02 kernel: [2448848.077576] nsd[59597]: segfault at 4 ip 00007f09e0d92ba3 sp 00007ffe460a0978 error 6 in libev.so.4.0.0[7f09e0d89000+d000]
# gdb /usr/sbin/nsd
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>\.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>\.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/nsd...(no debugging symbols found)...done.
(gdb) attach 32456
Attaching to program: /usr/sbin/nsd, process 32456
Reading symbols from /lib/x86_64-linux-gnu/libssl.so.1.1...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libprotobuf-c.so.1...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libfstrm.so.0...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libcrypto.so.1.1...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libev.so.4...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_compat.so.2...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnsl.so.1...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_nis.so.2...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...(no debugging symbols found)...done.
0x00007ffbed5602e3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) continue
Continuing.
{{{{{ now I start "nsd_control force_transfer" @ the SLAVE-Server ... }}}}
Program received signal SIGSEGV, Segmentation fault.
0x00007ffbee331785 in SSL_shutdown () from /lib/x86_64-linux-gnu/libssl.so.1.1
(gdb) bt #0 0x00007ffbee331785 in SSL_shutdown () from /lib/x86_64-linux-gnu/libssl.so.1.1 #1 0x000055d50cb2ce0e in ?? () #2 0x000055d50cb2dfd8 in ?? () #3 0x00007ffbed81a413 in ev_invoke_pending () from /lib/x86_64-linux-gnu/libev.so.4 #4 0x00007ffbed81da06 in ev_run () from /lib/x86_64-linux-gnu/libev.so.4 #5 0x00007ffbed81ffd9 in event_base_loop () from /lib/x86_64-linux-gnu/libev.so.4 #6 0x000055d50cb30bdd in ?? () #7 0x000055d50cb30fcb in ?? () #8 0x000055d50cb3244c in ?? () #9 0x000055d50cae3ff7 in ?? () #10 0x00007ffbed4972e1 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6 #11 0x000055d50cae45fa in ?? ()
(gdb) quit
A debugging session is active.
Inferior 1 [process 32456] will be detached.
Quit anyway? (y or n) y
Detaching from program: /usr/sbin/nsd, process 32456
(gdb) bt #0 0x00007ffbee334a8a in SSL_free () from /lib/x86_64-linux-gnu/libssl.so.1.1 #1 0x000055d50cb2ce1a in cleanup_tcp_handler (data=0x55d50ec17d80) at server.c:3001 #2 0x000055d50cb2dfd8 in handle_tcp_reading (fd=<optimized out>, event=<optimized out>, arg=0x55d50ec17d80) at server.c:3188 #3 0x00007ffbed81a413 in ev_invoke_pending () from /lib/x86_64-linux-gnu/libev.so.4 #4 0x00007ffbed81da06 in ev_run () from /lib/x86_64-linux-gnu/libev.so.4 #5 0x00007ffbed81ffd9 in event_base_loop () from /lib/x86_64-linux-gnu/libev.so.4 #6 0x000055d50cb30bdd in server_child (nsd=nsd@entry=0x55d50cdbffa0 <nsd>) at server.c:2652 #7 0x000055d50cb30fcb in restart_child_servers (nsd=nsd@entry=0x55d50cdbffa0 <nsd>, region=region@entry=0x55d50ebf9490, netio=netio@entry=0x55d50ec0f7c0, xfrd_sock_p=0x55d50ebca532) at server.c:427 #8 0x000055d50cb3244c in server_main (nsd=0x55d50cdbffa0 <nsd>) at server.c:2200 #9 0x000055d50cae3ff7 in main (argc=<optimized out>, argv=<optimized out>) at nsd.c:1147
(gdb) bt #0 0x00007ffbee334a8a in SSL_free () from /lib/x86_64-linux-gnu/libssl.so.1.1 #1 0x000055d50cb2ce1a in cleanup_tcp_handler (data=0x55d50ec17d80) at server.c:3001 #2 0x000055d50cb2dfd8 in handle_tcp_reading (fd=<optimized out>, event=<optimized out>, arg=0x55d50ec17d80) at server.c:3188 #3 0x00007ffbed81a413 in ev_invoke_pending () from /lib/x86_64-linux-gnu/libev.so.4 #4 0x00007ffbed81da06 in ev_run () from /lib/x86_64-linux-gnu/libev.so.4 #5 0x00007ffbed81ffd9 in event_base_loop () from /lib/x86_64-linux-gnu/libev.so.4 #6 0x000055d50cb30bdd in server_child (nsd=nsd@entry=0x55d50cdbffa0 <nsd>) at server.c:2652 #7 0x000055d50cb30fcb in restart_child_servers (nsd=nsd@entry=0x55d50cdbffa0 <nsd>, region=region@entry=0x55d50ebf9490, netio=netio@entry=0x55d50ec0f7c0, xfrd_sock_p=0x55d50ebca532) at server.c:427 #8 0x000055d50cb3244c in server_main (nsd=0x55d50cdbffa0 <nsd>) at server.c:2200 #9 0x000055d50cae3ff7 in main (argc=<optimized out>, argv=<optimized out>) at nsd.c:1147
(gdb) bt #0 0x00007ffbee334a8a in SSL_free () from
/lib/x86_64-linux-gnu/libssl.so.1.1 #1 0x000055d50cb2ce1a in cleanup_tcp_handler (data=0x55d50ec17d80) at
server.c:3001 #2 0x000055d50cb2dfd8 in handle_tcp_reading (fd=<optimized out>,
event=<optimized out>, arg=0x55d50ec17d80) at server.c:3188 #3 0x00007ffbed81a413 in ev_invoke_pending () from
/lib/x86_64-linux-gnu/libev.so.4 #4 0x00007ffbed81da06 in ev_run () from /lib/x86_64-linux-gnu/libev.so.4 #5 0x00007ffbed81ffd9 in event_base_loop () from
/lib/x86_64-linux-gnu/libev.so.4 #6 0x000055d50cb30bdd in server_child (nsd=nsd@entry=0x55d50cdbffa0
<nsd>) at server.c:2652 #7 0x000055d50cb30fcb in restart_child_servers
(nsd=nsd@entry=0x55d50cdbffa0 <nsd>,
region=region@entry=0x55d50ebf9490, netio=netio@entry=0x55d50ec0f7c0,
xfrd_sock_p=0x55d50ebca532) at server.c:427 #8 0x000055d50cb3244c in server_main (nsd=0x55d50cdbffa0 <nsd>) at
server.c:2200 #9 0x000055d50cae3ff7 in main (argc=<optimized out>, argv=<optimized
>) at nsd.c:1147