Slow AXFR propagation to NSD server

Hello,
I have NSD version 4.3.9 (from the official Ubuntu Jammy repository) configured as a slave server with about 400k zones.

I have an issue with delays in AXFR/IXFR processing, which sometimes take more than 10 seconds. Example of receiving an XFR:

2022-08-16_14:29:01 xxxxxxxx nsd[1270460]: info: notify for somedomain. from 192.168.205.10 serial 1658932140
2022-08-16_14:29:07 xxxxxxxx nsd[2867429]: info: xfrd: zone somedomain committed "received update to serial 1658932140 at 2022-08-16T14:29:07 from 192.168.205.10 TSIG verified with key xxxxxxxxx"
2022-08-16_14:29:18 dfo5pub1 nsd[2867432]: info: zone somedomain. received update to serial 1658932140 at 2022-07-27T14:29:07 from 192.168.205.10 TSIG verified with key xxxxxxxxx of 3045 bytes in 4.1e-05 seconds
2022-08-16_14:29:28 dfo5pub1 nsd[2867429]: info: zone somedomain serial 1658825141 is updated to 1658932140

As you can see in this example, there is a 10-second delay between the "zone somedomain. received update" and "is updated" messages.

NSD configuration, server section (we use a bare-metal server with 48 threads: 24 cores plus hyper-threading):

server:
    server-count: 40

    # Anycast addresses on loopback interface
    ip-transparent: yes
    ip-address: enp65s0f0
    ip-address: lo
    verbosity: 9
    database: "/var/lib/nsd/nsd.db"
    reuseport: yes
    zonesdir: "/var/lib/nsd"
    hide-version: yes
    version: "NSD"
    identity: "unidentified server"
    refuse-any: yes

    # Response Rate Limiting
    rrl-size: 50000000
    rrl-ratelimit: 300
    rrl-slip: 10

    # TCP capacity (https://nsd.docs.nlnetlabs.nl/en/latest/running/tuning.html?highlight=performance)
    tcp-count: 1400
    tcp-timeout: 6
    tcp-reject-overflow: yes

I tried disabling the database with database: "", but there was no significant change. I also tried setting up CPU affinity, without success, and I'd like to avoid that complexity anyway.

Is there something wrong in our setup, or have we reached a limitation of the daemon? The server CPU graph shows about 10% system time, which also seems odd to me, and about 1% user time. The bandwidth is less than 5 Mbps.
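To see the user/system split for the NSD processes themselves rather than for the whole server, one option on Linux is to read the utime and stime counters of each process (fields 14 and 15 of /proc/<pid>/stat, in clock ticks). This is a minimal sketch, assuming the processes show up in /proc with the command name `nsd` and that their stat files are readable; the function names are hypothetical helpers, not part of NSD. The counters are cumulative per process and only live processes are counted, so short-lived forks (such as a reload process) are easy to miss; comparing two readings taken a few seconds apart gives a rate.

```python
import os


def cpu_ticks_from_stat(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may contain
    spaces, so the line is split after the last ')' instead of on whitespace.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field n lives at rest[n - 3]:
    # utime is field 14 and stime is field 15 (see proc(5)).
    return int(rest[11]), int(rest[12])


def cpu_seconds_for(name="nsd"):
    """Sum user and system CPU seconds of all live processes whose comm is `name`."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    user = system = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                line = f.read()
        except OSError:
            # The process exited between listdir() and open(), or is not readable.
            continue
        comm = line[line.index("(") + 1 : line.rindex(")")]
        if comm != name:
            continue
        utime, stime = cpu_ticks_from_stat(line)
        user += utime
        system += stime
    return user / ticks_per_second, system / ticks_per_second


if __name__ == "__main__":
    user_s, system_s = cpu_seconds_for("nsd")
    print(f"nsd user: {user_s:.1f} s, system: {system_s:.1f} s")
```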

Can you give me some advice on how to speed the process up?

Thank you in advance.

Zdenek Novy
Active24

Hi Zdenek,

> 2022-08-16_14:29:01 xxxxxxxx nsd[1270460]: info: notify for somedomain. from 192.168.205.10 serial 1658932140
> 2022-08-16_14:29:07 xxxxxxxx nsd[2867429]: info: xfrd: zone somedomain committed "received update to serial 1658932140 at 2022-08-16T14:29:07 from 192.168.205.10 TSIG verified with key xxxxxxxxx"

This is the point at which the XFR has been written to disk, and NSD has initiated a reload.

> 2022-08-16_14:29:18 dfo5pub1 nsd[2867432]: info: zone somedomain. received update to serial 1658932140 at 2022-07-27T14:29:07 from 192.168.205.10 TSIG verified with key xxxxxxxxx of 3045 bytes in 4.1e-05 seconds

It has taken NSD 11 seconds to update its internal memory structures with the new XFR.

> 2022-08-16_14:29:28 dfo5pub1 nsd[2867429]: info: zone somedomain serial 1658825141 is updated to 1658932140

Ten seconds later, the newly forked child processes have noticed the new zone data and are serving it.
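The phases described above can be timed from the log for any zone. This is a minimal sketch in Python, assuming the timestamp prefix used in this thread (`2022-08-16_14:29:01`, which looks like a custom syslog template) and the NSD 4.3.9 message wording quoted above; other versions or log setups may differ. The stage labels are my own names for the phases, and the input is assumed to be already filtered to a single zone and serial.

```python
import re
import sys
from datetime import datetime

# Timestamp prefix used in the logs in this thread, e.g. "2022-08-16_14:29:01".
TS_FORMAT = "%Y-%m-%d_%H:%M:%S"

# One pattern per stage, based on the NSD 4.3.9 messages quoted above,
# listed in the order the stages are expected to happen.
STAGES = [
    ("notify", re.compile(r"info: notify for ")),
    ("xfrd commit", re.compile(r"info: xfrd: zone \S+ committed ")),
    ("reload applied", re.compile(r"info: zone \S+ received update to serial ")),
    ("serving", re.compile(r"info: zone \S+ serial \d+ is updated to ")),
]


def stage_times(lines):
    """Map each stage name to the timestamp of the first line matching it."""
    times = {}
    for line in lines:
        stamp = line.split(" ", 1)[0]
        for name, pattern in STAGES:
            if name not in times and pattern.search(line):
                times[name] = datetime.strptime(stamp, TS_FORMAT)
                break
    return times


def phase_durations(lines):
    """Seconds elapsed between consecutive stages that were both found."""
    times = stage_times(lines)
    result = {}
    for (a, _), (b, _) in zip(STAGES, STAGES[1:]):
        if a in times and b in times:
            result[f"{a} -> {b}"] = (times[b] - times[a]).total_seconds()
    return result


if __name__ == "__main__":
    for phase, seconds in phase_durations(sys.stdin).items():
        print(f"{phase}: {seconds:.0f} s")
```

It could be run as, for example, `grep somedomain /var/log/syslog | python3 xfr_phases.py`, where the log path and script name are placeholders. For the four lines above it reports 6 s from notify to the xfrd commit, 11 s for the reload, and 10 s until the children serve the new serial.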

> As you can see in this example, there is a 10-second delay between the "zone somedomain. received update" and "is updated" messages.

Indeed. This is how NSD works. Since you have a lot of zones, it takes NSD quite some time to update its internal memory structures.

> I tried disabling the database with database: "", but there was no significant change. I also tried setting up CPU affinity, without success, and I'd like to avoid that complexity anyway.

None of these things will do anything to speed up NSD's update of its memory structures.

> Is there something wrong in our setup, or have we reached a limitation of the daemon? The server CPU graph shows about 10% system time, which also seems odd to me, and about 1% user time. The bandwidth is less than 5 Mbps.

> Can you give me some advice on how to speed the process up?

I don't think you can make it any faster.

Regards,
Anand