For a week now, one machine has had NSD crashing after a few hours of
running, corrupting nsd.db.
The log (verbosity 4) says:
Jan 06 20:31:30 ada nsd[1974]: process 1975 exited with status 9
Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.892] nsd[1974]: error: process 1975 exited with status 9
Jan 06 20:31:30 ada nsd[1974]: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty
Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.909] nsd[1974]: warning: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty
Jan 06 20:31:31 ada nsd[2195]: nsd starting (NSD 4.1.26)
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.418] nsd[2195]: notice: nsd starting (NSD 4.1.26)
Jan 06 20:31:31 ada nsd[2195]: setup SSL certificates
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.421] nsd[2195]: info: setup SSL certificates
Jan 06 20:31:31 ada nsd[2196]: /var/lib/nsd/nsd.db: not cleanly closed 0
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.798] nsd[2196]: warning: /var/lib/nsd/nsd.db: not cleanly closed 0
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.798] nsd[2196]: warning: can not use /var/lib/nsd/nsd.db, will create anew
And then NSD stops. I have to restart it manually, which keeps it
working for a few more hours.
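The log only says that the child "exited with status 9". Assuming NSD logs the raw status returned by waitpid() (the log line itself does not say which kind of number it is), Python's os module can decode it. A raw status of 9 means the child was terminated by signal 9, SIGKILL, which cannot be caught and is the signal the kernel's OOM killer sends. This is a minimal sketch, and the helper name describe_wait_status is hypothetical:

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Interpret a raw wait status, as returned by waitpid() (hypothetical helper)."""
    if os.WIFSIGNALED(status):
        # The low 7 bits hold the number of the signal that killed the child.
        return f"killed by {signal.Signals(os.WTERMSIG(status)).name}"
    if os.WIFEXITED(status):
        # A normal exit stores the exit code in the second byte.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return "stopped or continued"


print(describe_wait_status(9))  # prints: killed by SIGKILL
```

If the 9 were instead an exit code (a raw status of 9 << 8), the same helper would report "exited with code 9", so the reading depends entirely on that assumption.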
This machine ran fine, with the same set of zones, for several years
(yes, of course, the software was upgraded, but another Debian machine
with the same Debian version, the same NSD and almost the same set of
zones has no problem).
Debian "stable" 10.2, Linux kernel 4.19.0, NSD 4.1.26. As I said, a
very similar machine works fine.
% ls -alt /var/lib/nsd
total 552
-rw------- 1 nsd nsd 589824 Jan 6 20:33 nsd.db
-rw-r--r-- 1 nsd nsd 6605 Jan 6 20:31 xfrd.state
drwxr-xr-x 2 nsd nsd 4096 Jan 6 20:31 .
drwxr-xr-x 70 root root 4096 Jan 6 20:18 ..
Deleting everything in /var/lib/nsd and starting from a fresh directory
changes nothing.
This suggests that an incoming XFR is triggering a bug. Have you saved
the contents of the nsd-xfr-1974 directory? If not, perhaps you can save
it the next time it happens. This may help the developers figure out
what causes the crash.
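To keep that evidence the next time it happens, the leftover transfer directory can be copied somewhere safe before restarting NSD or before anything cleans /tmp. This is a minimal Python sketch that only assumes the nsd-xfr-<pid> naming seen in the rmdir warning; the function name save_xfr_dirs and the destination /var/tmp/nsd-xfr-saved are hypothetical choices:

```python
import glob
import os
import shutil
import time


def save_xfr_dirs(tmp_dir: str = "/tmp",
                  dest_root: str = "/var/tmp/nsd-xfr-saved") -> list:
    """Copy every leftover nsd-xfr-<pid> directory into dest_root (hypothetical helper)."""
    saved = []
    stamp = time.strftime("%Y%m%dT%H%M%S")
    for src in sorted(glob.glob(os.path.join(tmp_dir, "nsd-xfr-*"))):
        if not os.path.isdir(src):
            continue  # skip stray files that happen to match the pattern
        dest = os.path.join(dest_root, f"{os.path.basename(src)}-{stamp}")
        # copytree creates dest and any missing parents, copying file contents.
        shutil.copytree(src, dest)
        saved.append(dest)
    return saved
```

Calling save_xfr_dirs() as root (or as the nsd user) right after a crash returns the list of copies it made; the timestamp suffix keeps copies from different crashes apart.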
Also, are there any log entries above this that indicate which zone it
might be?
Note that several newer versions of NSD have been released since
4.1.26, so this bug may already have been fixed. If you can upgrade, you
may want to do that.
Finally, the database mode is no longer recommended. Could you try
running your instance of NSD with:
database: ""
Apparently, it was a lack of memory:
[8374219.385014] Out of memory: Kill process 10677 (nsd) score 66 or sacrifice child
[8374219.385758] Killed process 10678 (nsd) total-vm:37552kB, anon-rss:676kB, file-rss:0kB, shmem-rss:27344kB
[8374219.386779] oom_reaper: reaped process 10678 (nsd), now anon-rss:0kB, file-rss:0kB, shmem-rss:27344kB
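The same conclusion can be checked mechanically by scanning the kernel log (for example the output of dmesg or journalctl -k) for OOM kills of nsd. This is a minimal sketch: the regular expression matches the "Killed process" format shown above, other kernel versions may print additional or different fields, and the helper name oom_kills is hypothetical:

```python
import re

# Matches kernel lines such as
# "Killed process 10678 (nsd) total-vm:37552kB, anon-rss:676kB, file-rss:0kB, shmem-rss:27344kB"
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r" total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon>\d+)kB,"
    r" file-rss:(?P<file>\d+)kB, shmem-rss:(?P<shmem>\d+)kB"
)


def oom_kills(lines, comm="nsd"):
    """Yield (pid, memory-in-kB dict) for each OOM kill of a process named comm."""
    for line in lines:
        m = KILLED_RE.search(line)
        if m and m.group("comm") == comm:
            yield int(m.group("pid")), {
                "total_vm": int(m.group("total_vm")),
                "anon_rss": int(m.group("anon")),
                "file_rss": int(m.group("file")),
                "shmem_rss": int(m.group("shmem")),
            }


log = [
    "[8374219.385014] Out of memory: Kill process 10677 (nsd) score 66 or sacrifice child",
    "[8374219.385758] Killed process 10678 (nsd) total-vm:37552kB, anon-rss:676kB, file-rss:0kB, shmem-rss:27344kB",
    "[8374219.386779] oom_reaper: reaped process 10678 (nsd), now anon-rss:0kB, file-rss:0kB, shmem-rss:27344kB",
]
for pid, mem in oom_kills(log):
    print(pid, mem)
# prints: 10678 {'total_vm': 37552, 'anon_rss': 676, 'file_rss': 0, 'shmem_rss': 27344}
```

Only the "Killed process" line is matched, so the three kernel lines above yield a single kill, PID 10678, with 27344 kB of shared memory resident.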
Running with database: "" is currently under test, with no problem yet
(in any case, I'll add RAM).