Nsd restart failed?

Hi,

I had an nsd server that had failed and was not running. After I
issued "service nsd restart", the logs showed:

Jan 27 18:04:18 ns1 nsd[29792]: ...stale pid file from process 29516
Jan 27 18:04:18 ns1 nsd[29793]: fallback to UDP4, no IPv6: not supported
Jan 27 18:04:18 ns1 nsd[29793]: fallback to TCP4, no IPv6: not supported
Jan 27 18:04:18 ns1 nsd[29793]: xfr: zone XXXXXX.org. not in config.
Jan 27 18:04:18 ns1 nsd[29793]: no zone exists
Jan 27 18:04:18 ns1 nsd[29793]: bad ixfr packet part 0 in /var/lib/nsd/ixfr.db
Jan 27 18:04:18 ns1 nsd[29793]: marked xfr as failed: xfrd: zone XXXXXX.org received update to serial 2011121916 at time 1324285447 from 193.110.157.135 in 1 parts
Jan 27 18:04:18 ns1 nsd[29793]: marked xfr so that next reload can succeed

Running nsdc rebuild did not help. I had to rm /var/lib/nsd/ixfr.db
before nsd would run again.
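
For anyone hitting the same state, here is a minimal sketch of that
workaround which moves the file aside instead of deleting it, so it can
still be attached to a bug report. The path is the one from the log
above; the helper name preserve_ixfr_db and the ".bad.<timestamp>"
suffix are illustrative assumptions, not part of NSD.

```shell
#!/bin/sh
# Move a suspect ixfr.db aside instead of deleting it, so it can be
# sent along with a bug report later. Prints the path of the saved copy.
# preserve_ixfr_db is a hypothetical helper, not something NSD ships.
preserve_ixfr_db() {
    db=${1:-/var/lib/nsd/ixfr.db}           # default path taken from the log
    saved="$db.bad.$(date +%Y%m%d%H%M%S)"   # suffix is an arbitrary choice
    [ -e "$db" ] || { echo "no such file: $db" >&2; return 1; }
    mv -- "$db" "$saved" && printf '%s\n' "$saved"
}

# Usage (as root, with nsd not running):
#   preserve_ixfr_db /var/lib/nsd/ixfr.db && service nsd restart
```

The saved copy stays next to the original location, so nsd no longer
sees it under the name ixfr.db but nothing is lost.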

Note that the server involved (193.110.157.135) had ceased to be,
so any XFR was impossible.

Looking back further in the logs, I think this is what happened.

IXFRs/AXFRs were failing because the primary was no longer running or
no longer allowing transfers. Some state from these failures for those
domains was stored in ixfr.db. The operator (me) logged in and edited
the nsd config to remove those dead zones. The operator (me) then ran
"service nsd restart" without looking any further, and logged out. nsd
failed to start due to various errors like this one:

Jan 15 18:06:50 ns1 nsd[14746]: marked xfr as failed: xfrd: zone xxxxxxxxx.org received update to serial 2011121916 at time 1324285408 from 193.110.157.135 in 2 parts

Paul

Hi Paul,

Do you still have the ixfr.db and the other files required to reproduce
this? If so, could you mail them so we can investigate further? If
not, can you save the files the next time this happens?

Thanks,

Matthijs

I don't. I will keep them next time.

Paul

Hello,

After editing the config file nsd.conf and restarting nsd, I see this in
the log:

[1333972197] nsd[32738]: error: xfr: zone domtest1.ru. not in config.
[1333972197] nsd[32738]: error: no zone exists
[1333972197] nsd[32738]: error: bad ixfr packet part 0 in /var/lib/nsd/ixfr.db
[1333972197] nsd[32738]: error: marked xfr as failed: xfrd: zone domtest1.ru received update to serial 2012040900 at time 1333972146 from 46.30.40.23 in 1 parts
[1333972197] nsd[32738]: error: marked xfr so that next reload can succeed
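
To see which zones nsd is complaining about, the "not in config" lines
can be pulled out of a log like the one above. This is only a sketch:
the helper name zones_not_in_config is made up, and the pattern assumes
the message wording shown in this thread.

```shell
#!/bin/sh
# Print each zone that nsd reported as "not in config", one per line.
# Reads log text on stdin. zones_not_in_config is a hypothetical helper;
# the pattern matches the "xfr: zone NAME not in config." message above.
zones_not_in_config() {
    sed -n 's/.*xfr: zone \([^ ]*\) not in config\..*/\1/p' | sort -u
}

# Usage (the log location is an assumption, adjust to your setup):
#   zones_not_in_config < /path/to/nsd.log
```

Each zone is printed once, however often the message repeats.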

After removing /var/lib/nsd/ixfr.db and running "service nsd restart",
the daemon started working.

I have attached the faulty ixfr.db.

Could you please help with this issue?

(attachments)

ixfr.db (3.64 KB)

Hi,

Have you removed the "domtest1.ru" zone from the config? If you did,
you need to run "nsdc rebuild" first, before you restart the server.
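
A minimal sketch of that order of operations, using only the two
commands quoted in this thread. The wrapper name
restart_after_config_change is an illustrative assumption, and this
thread does not establish that the rebuild clears an ixfr.db that is
already in the bad state (the first report says it did not).

```shell
#!/bin/sh
# Rebuild the zone database first, then restart nsd.
# The restart is skipped if the rebuild fails.
# restart_after_config_change is a hypothetical wrapper, not part of NSD.
restart_after_config_change() {
    nsdc rebuild || { echo "nsdc rebuild failed; not restarting" >&2; return 1; }
    service nsd restart
}

# Usage (as root, after removing a zone from nsd.conf):
#   restart_after_config_change
```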

Best regards,
  Matthijs

Hi,

I have committed a fix for this in the NSD_3_2 branch, rev 3594.
Instead of exiting, the ixfr parts for this zone are ignored.

Best regards,
  Matthijs