NSD 4.5.0rc1 pre-release

Hi,

NSD 4.5.0rc1 pre-release is available
https://nlnetlabs.nl/downloads/nsd/nsd-4.5.0rc1.tar.gz
sha256 2143268818f0f840f9fbb99a9350eaa553ee9d0b3b325851dd14a7b815b0a6e7
pgp https://nlnetlabs.nl/downloads/nsd/nsd-4.5.0rc1.tar.gz.gpg

This release fixes a couple of minor bugs and adds IXFR out
functionality. With this functionality NSD can respond to IXFR queries
and serve IXFR transfers downstream.

It is disabled by default, which means NSD does not store IXFR contents
for zones by default. The response on the wire is different even with
IXFR disabled: because IXFR is now supported, those zones also get a
reply, one stating that no differential data is available.
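
For readers who want to try the feature, here is a minimal sketch of a zone fragment that turns on IXFR storage for one zone. The option names `store-ixfr` and `provide-xfr` come from my reading of nsd.conf(5) for 4.5.0, not from this announcement. The zone name, file names, and address are placeholders. A primary that loads its zones from zone files may also need `create-ixfr`, so please check the man page before relying on this.

```shell
# Sketch only: zone name, paths and address below are hypothetical placeholders.
conf_fragment="${TMPDIR:-/tmp}/nsd-ixfr-example.conf"

cat > "$conf_fragment" <<'EOF'
zone:
    name: "example.com"
    zonefile: "example.com.zone"
    # Allow a downstream server to transfer the zone (placeholder address).
    provide-xfr: 192.0.2.53 NOKEY
    # Keep incremental data so IXFR queries can be answered with differences.
    store-ixfr: yes
EOF

echo "wrote $conf_fragment"
```

Once the fragment is included from nsd.conf, a query such as `dig @<server> example.com -t ixfr=<serial>` from the downstream side should show whether differences or a full zone come back.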

4.5.0

Hi,

NSD 4.5.0rc1 pre-release is available
https://nlnetlabs.nl/downloads/nsd/nsd-4.5.0rc1.tar.gz

Hello Wouter,

this version compiles without problems here and works in my small lab environment.

sha256 2143268818f0f840f9fbb99a9350eaa553ee9d0b3b325851dd14a7b815b0a6e7
pgp https://nlnetlabs.nl/downloads/nsd/nsd-4.5.0rc1.tar.gz.gpg

should end with .asc, not .gpg

This release fixes a couple of minor bugs and adds IXFR out
functionality. With this functionality NSD can respond to IXFR queries
and serve IXFR transfers downstream.

It is default disabled,

makes sense ...

But I would like to use this moment to point to segfaults that I have been seeing for years
([nsd-users] NSD 4.2.0 intermittent segfaults @ libssl ?)
but that have not been discussed further on this list yet.

Today I found a new data point. I usually build NSD with libev, and I see failures.
I would say they are related to process termination.
Today I rebuilt NSD with libevent and the failure goes away!

To reproduce that I wrote two Dockerfiles (attached)

$ docker build -t nsd:libevent -f Dockerfile.libevent .
...
Successfully tagged nsd:libevent
$ docker run --rm -ti nsd:libevent
[2022-05-06 21:18:42.056] nsd[1]: notice: nsd starting (NSD 4.5.0)
[2022-05-06 21:18:42.096] nsd[7]: notice: nsd started (NSD 4.5.0), pid 1
<PRESS CTRL+C>
[2022-05-06 21:18:43.979] nsd[7]: warning: signal received, shutting down...

-> this is ok

Now let's build using libev:
$ docker build -t nsd:libev -f Dockerfile.libev .
...
Successfully tagged nsd:libev
$ docker run --rm -ti nsd:libev
[2022-05-06 21:21:14.724] nsd[1]: notice: nsd starting (NSD 4.5.0)
[2022-05-06 21:21:14.743] nsd[7]: notice: nsd started (NSD 4.5.0), pid 1
<PRESS CTRL+C>
[2022-05-06 21:21:32.879] nsd[7]: warning: server 8 died unexpectedly, restarting
[2022-05-06 21:21:32.879] nsd[7]: warning: signal received, shutting down...
[2022-05-06 21:21:32.884] nsd[9]: error: mode bad value 2, back to service.

-> I think, this should not happen...
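
The manual Ctrl+C check above could be scripted roughly as follows. This is only a sketch: `shutdown_status` and `check_image` are hypothetical helper names, the sleep times are guesses, and the only log text it looks for is the "died unexpectedly" message quoted above. Note that `docker kill` delivers the signal to PID 1 only, whereas Ctrl+C on a terminal reaches the whole foreground process group, so the scripted run may not trigger the same behaviour as the interactive one.

```shell
# Classify an NSD shutdown log read from stdin, using the message
# text quoted in this thread.
shutdown_status() {
    if grep -q 'died unexpectedly'; then
        echo "unclean"
    else
        echo "clean"
    fi
}

# Hypothetical helper: start an image detached, send SIGINT, classify the log.
# Requires docker and the image tags built above.
check_image() {
    image=$1
    cid=$(docker run -d "$image") || return 1
    sleep 2                                   # give nsd time to start
    docker kill --signal=INT "$cid" >/dev/null
    sleep 2                                   # give nsd time to shut down
    docker logs "$cid" 2>&1 | shutdown_status
    docker rm -f "$cid" >/dev/null
}

# Example calls (not run automatically):
#   check_image nsd:libevent
#   check_image nsd:libev
```

The classifier is kept separate from the docker part so that it can also be fed a saved log file, for example `shutdown_status < nsd.log`.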

I mentioned process termination. I see similar segfaults when NSD, running as a slave, receives new data via AXFR.
One server's log fills up with segfault messages, but the data received via AXFR is valid!
The segfaults seem to happen very close to the end of an NSD process.

Andreas

(attachments)

Dockerfile.libevent (856 Bytes)
Dockerfile.libev (850 Bytes)

Hi Andreas,

Thank you for trying the new version.

About the libev troubles.

I managed to run the regression test suite for NSD on libev, with two
different versions of it, but it does not produce that failure.
Everything works, apart from a signal handling test that fails, but
downloading an axfr does not involve signal handling, so that is not the
issue here.

Not sure how to proceed, since I cannot reproduce. I wonder what the
cause is.

Best regards, Wouter

Hi,

NSD 4.5.0 is available
https://nlnetlabs.nl/downloads/nsd/nsd-4.5.0.tar.gz
sha256 5ae7a704ab92c8a49f3c8f3a29565ce194c51a721c29c75ea7d43c13372d79c5
pgp https://nlnetlabs.nl/downloads/nsd/nsd-4.5.0.tar.gz.asc

This release fixes a couple of minor bugs and adds IXFR out
functionality. With this functionality NSD can respond to IXFR queries
and serve IXFR transfers downstream.

It is disabled by default, which means NSD does not store IXFR contents
for zones by default. The response on the wire is different even with
IXFR disabled: because IXFR is now supported, those zones also get a
reply, one stating that no differential data is available.

4.5.0

Wouter Wijngaards:

I managed to run the regression test suite for NSD on libev, with two
different versions of it, but it does not produce that failure.
Everything works, apart from a signal handling test that fails, but
downloading an axfr does not involve signal handling, so that is not the
issue here.

Not sure how to proceed, since I cannot reproduce. I wonder what the
cause is.

Hello Wouter,

thanks for having a look into my trouble. I wonder about your "could not reproduce".
Did you build both containers with the Dockerfiles I sent?

I have two servers receiving AXFR from a BIND master (version unknown)
where I constantly see the segfault messages after AXFR.
I also could not reproduce any segfault on AXFR in a lab. I tried NSD-provider -> AXFR -> NSD-consumer
and even used the same zone. That makes the segfaults something strange to me, too.
They really only happen on my prod servers.
Using the Dockerfiles was the first time I could trigger a similar error,
but yes, the setup is not the same.

I have now switched one server to a version built with libevent instead of libev
and the segfaults no longer happen!

In both cases the error happens very late, when a process terminates.
So maybe it's related.

I plan to replace libev by libevent on /all/ of my numerous NSD instances.
That would solve this issue for me.

Could you describe the differences between libevent and libev?
What was the reason you implemented NSD in a way to use one OR the other library?
What may I expect when switching from libev back to libevent?
More load? Latency? Memory usage? Whatever?

Andreas

Hi Andreas,

Hello Wouter,

thanks for having a look into my trouble. I wonder about your "could not
reproduce".
Did you build both containers with the Dockerfiles I sent?

The build process that I used looks like this: put these lines in the
Docker scripts, after the WORKDIR line,
ADD http://dist.schmorp.de/libev/libev-4.33.tar.gz /tmp/
RUN cd /tmp && tar xzf libev-4.33.tar.gz
# each RUN starts a new shell in WORKDIR, so the cd has to be chained with &&
RUN cd /tmp/libev-4.33 && ./configure --prefix=/tmp/libev-4.33/install && make && make install
... and then for the ./configure line for nsd
RUN ./configure --with-libevent=/tmp/libev-4.33/install

It is different in that it does not use the compatibility header files
and mode, but uses libev straight up, and detects that. I think NSD then
continues to use some libev calls and some libev-libevent-compatibility
calls.

I have two servers receiving AXFR from a BIND master (version unknown)
where I constantly see the segfault messages after AXFR.
I also could not reproduce any segfault on AXFR in a lab. I tried
NSD-provider -> AXFR -> NSD-consumer
and even used the same zone. That makes the segfaults something strange
to me, too.
They really only happen on my prod servers.
Using the Dockerfiles was the first time I could trigger a similar error,
but yes, the setup is not the same.

I have now switched one server to a version built with libevent instead of libev
and the segfaults no longer happen!

In both cases the error happens very late, when a process terminates.
So maybe it's related.

I plan to replace libev by libevent on /all/ of my numerous NSD instances.
That would solve this issue for me.

That is good to hear. The plan is still to have the issues resolved.

Could you describe the differences between libevent and libev?

libev provides a socket event notification API, and libevent has that too,
but libevent also provides other building blocks, like buffers, and
https. NSD uses only the event notification API. libev provides an API
that is similar to libevent's event notification API.

What was the reason you implemented NSD in a way to use one OR the other
library?

No real reason. There could be small speed differences, but mainly they
use the same system back end, so that is not really important. Having
different code allows for a switch when there are bugs, like we have here.

What may I expect when switching from libev back to libevent?
More load? Latency? Memory usage? Whatever?

Nothing really. The libevent build is the default build option for NSD.

For completeness, it is also possible to build NSD without libevent, and
then it uses a built-in event loop. That one is limited to 1024 sockets.

Best regards, Wouter