Nsd: Could not tcp connect to a:a:a:a::1:1: Operation timed out

Hi --

I run both my primary and secondary nameservers in FreeBSD jails, as outlined below:

(jail1/a:a:a:a::1:1) <-----> (jail2/b:b:b:b::1:1)
(jail1/10.10.10.1) <--NAT--> (host 1.2.3.4) <-----> (host 5.6.7.8) <--NAT--> (jail2/10.10.10.1)

jail1 (master): nsd.conf (relevant part)

        ip-address: 10.10.10.1
        ip-address: a:a:a:a::1:1
  [...]
        notify: b:b:b:b::1:1 secret-key
        provide-xfr: b:b:b:b::1:1 secret-key
        outgoing-interface: a:a:a:a::1:1
  [...]

jail2 (slave): nsd.conf (relevant part)

        ip-address: 10.10.10.1
        ip-address: b:b:b:b::1:1
  [...]
        allow-notify: a:a:a:a::1:1 secret-key
        request-xfr: AXFR a:a:a:a::1:1 secret-key
        outgoing-interface: b:b:b:b::1:1
  [...]
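The pairing between the two fragments above can be cross-checked mechanically. The following is only a minimal sketch under my reading of the options: notify and request-xfr should name an address the peer listens on (ip-address), while provide-xfr and allow-notify should name the address the peer sends from (outgoing-interface). The helper names (parse_options, first_address, check_pair) are made up for illustration, addresses are compared as plain strings, and subnets or ranges are not handled.

```python
# Config fragments as posted above (the elided "[...]" parts are left out).
MASTER = """
ip-address: 10.10.10.1
ip-address: a:a:a:a::1:1
notify: b:b:b:b::1:1 secret-key
provide-xfr: b:b:b:b::1:1 secret-key
outgoing-interface: a:a:a:a::1:1
"""

SLAVE = """
ip-address: 10.10.10.1
ip-address: b:b:b:b::1:1
allow-notify: a:a:a:a::1:1 secret-key
request-xfr: AXFR a:a:a:a::1:1 secret-key
outgoing-interface: b:b:b:b::1:1
"""


def parse_options(text):
    """Collect 'key: value' lines into a dict mapping key -> list of values."""
    options = {}
    for line in text.splitlines():
        # Split on the first colon only, so IPv6 addresses stay intact.
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            options.setdefault(key.strip(), []).append(value.strip())
    return options


def first_address(value):
    """Return the address token, skipping request-xfr's optional AXFR/UDP keyword."""
    return [tok for tok in value.split() if tok not in ("AXFR", "UDP")][0]


def check_pair(master_text, slave_text):
    """Return a list of human-readable mismatches (empty if none were found)."""
    master, slave = parse_options(master_text), parse_options(slave_text)
    checks = [
        ("master", master, "notify", slave, "ip-address"),
        ("master", master, "provide-xfr", slave, "outgoing-interface"),
        ("slave", slave, "allow-notify", master, "outgoing-interface"),
        ("slave", slave, "request-xfr", master, "ip-address"),
    ]
    problems = []
    for role, local, key, peer, peer_key in checks:
        for value in local.get(key, []):
            addr = first_address(value)
            if addr not in peer.get(peer_key, []):
                problems.append(f"{role} {key} {addr} not found in peer {peer_key}")
    return problems


print(check_pair(MASTER, SLAVE))
```

An empty list only means the addresses line up textually; it says nothing about keys, routing, or firewalls.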

Both servers are running well, serving all requests as expected, and the master delivers all zones via AXFR at startup without problems. However, I get the following error message (for the IPv6 address only!) in the *slave*'s syslog:

nsd: Could not tcp connect to a:a:a:a::1:1: Operation timed out

tcpdump at the *master* tells me (shortened to the relevant part):

pass in on em0: (flowlabel 0x360ed, hlim 63, next-header TCP (6) payload length: 40) b:b:b:b::1:1.15298 > a:a:a:a::1:1.53: Flags [S], cksum 0xfedd (incorrect -> 0x7df6), seq 1459122906, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 333780857 ecr 0], length 0

So, what's going wrong here:
- Is it my setup regarding nsd?
- Is it a screwed IPv6 routing?
- Or something else?

Any help is highly appreciated.

Thanks and with kind regards,
Michael

Hi Michael,

Your configuration looks good.

On 26-12-12 16:23, Michael Grimm wrote:

> Both servers are running well, serving all requests as expected, and the master is delivering all zones with AXFR at startup perfectly well. But, I get the following error messages (for IPv6 address, only!) in the *slave*'s syslog:
> nsd: Could not tcp connect to a:a:a:a::1:1: Operation timed out
>
> tcpdump at the *master* tells me (shortened to the relevant part):
> pass in on em0: (flowlabel 0x360ed, hlim 63, next-header TCP (6) payload length: 40) b:b:b:b::1:1.15298 > a:a:a:a::1:1.53: Flags [S], cksum 0xfedd (incorrect -> 0x7df6), seq 1459122906, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 333780857 ecr 0], length 0

I suspect the master and the slave are on the same host causing the
checksum error (because checksum calculation is offloaded to the NIC).
Is this true?

Is this the only packet you saw for the handshake? Did you see a
SYN/ACK after this packet, i.e. a packet returning from a:a:a:a::1:1
with "Flags [S.]"?
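If the capture is saved as text, the check described here (a SYN, "Flags [S]", that is never answered by a SYN/ACK, "Flags [S.]") can be done with a few lines of Python. This is only a rough sketch: the function name handshake_flags is made up, and it relies on nothing but the "Flags [...]" notation that tcpdump prints, as in the line quoted above.

```python
import re

# tcpdump prints TCP flags as e.g. "Flags [S]" (SYN) or "Flags [S.]" (SYN/ACK).
FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")


def handshake_flags(lines):
    """Return (syn_seen, synack_seen) for an iterable of tcpdump text lines."""
    syn_seen = synack_seen = False
    for line in lines:
        match = FLAGS_RE.search(line)
        if not match:
            continue  # not a TCP line, or flags not printed
        flags = match.group(1)
        if flags == "S":
            syn_seen = True
        elif flags == "S.":
            synack_seen = True
    return syn_seen, synack_seen


# The packet from the capture quoted above, shortened.
SAMPLE = [
    "pass in on em0: b:b:b:b::1:1.15298 > a:a:a:a::1:1.53: Flags [S], "
    "cksum 0xfedd (incorrect -> 0x7df6), seq 1459122906, win 65535, length 0",
]

syn, synack = handshake_flags(SAMPLE)
print("SYN seen:", syn, "SYN/ACK seen:", synack)
```

A SYN without any SYN/ACK in a capture taken on the master would suggest the reply is never sent or is dropped on the way out.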

Do you see the tcp6 listening socket on the master with "sockstat -l"?

Do you use carp devices for the jails? I have seen some weird IPv6
routing behaviour with those myself.

Regards,

-- Willem

Hi Willem --

> On 26-12-12 16:23, Michael Grimm wrote:
>
> > Both servers are running well, serving all requests as expected, and the master is delivering all zones with AXFR at startup perfectly well. But, I get the following error messages (for IPv6 address, only!) in the *slave*'s syslog:
> > nsd: Could not tcp connect to a:a:a:a::1:1: Operation timed out
> >
> > tcpdump at the *master* tells me (shortened to the relevant part):
> > pass in on em0: (flowlabel 0x360ed, hlim 63, next-header TCP (6) payload length: 40) b:b:b:b::1:1.15298 > a:a:a:a::1:1.53: Flags [S], cksum 0xfedd (incorrect -> 0x7df6), seq 1459122906, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 333780857 ecr 0], length 0
>
> I suspect the master and the slave are on the same host causing the
> checksum error (because checksum calculation is offloaded to the NIC).
> Is this true?

No, both jails are located at distinct servers (within the same datacenter, though).

> Is this the only packet you saw for the handshake? Did you see a
> SYN/ACK after this packet, i.e. a packet returning from a:a:a:a::1:1
> with "Flags [S.]"?

No, but I cannot see those flags in any tcpdump entry from the last 30 minutes, capturing traffic with:

tcpdump -n -e -ttt -s 256 -v -i pflog0 (pf firewall)
tcpdump -n -e -ttt -s 256 -v -i em0 (outside interface)

(I have to admit that I'm not much of an expert with tcpdump.)

BTW, my firewall rules regarding nameserver traffic are as follows:

pass in log on em0 inet6 proto tcp from any to a:a:a:a::1:1 port = domain flags S/SA keep state tag ip6domain
pass in log on em0 inet6 proto udp from any to a:a:a:a::1:1 port = domain keep state tag ip6domain

> Do you see the tcp6 listening socket on the master with "sockstat -l"?

Yes, both servers listen at udp6 and tcp6 addresses.

> Do you use carp devices for the jails? I have seen some weird IPv6
> routing behaviour with those myself.

No, I only use my regular em0 device.

Thanks for confirming my configuration. I now strongly suspect my host/jail setup and/or routing: from distinct servers I can reach every nsd server with a simple "telnet 1.2.3.4 53", but "telnet a:a:a:a::1:1 53" fails miserably.
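The same reachability test can be scripted, which makes it easier to repeat from several hosts. A minimal sketch using only Python's standard socket module; the function name can_connect and the 5-second default timeout are arbitrary choices for illustration, and the addresses in the comment are the placeholders used in this thread.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection accepts IPv4 and IPv6 literals as well as hostnames.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


# Example usage, mirroring the two telnet tests:
#   can_connect("1.2.3.4", 53)        # IPv4 address of the host
#   can_connect("a:a:a:a::1:1", 53)   # IPv6 address of the jail
```

Like telnet, this only shows whether the TCP handshake completes; it does not tell which side drops the packets.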

Further hints are highly welcome.

Thanks and with kind regards,
Michael

Hi, sorry for the noise.

> - Or something else?

JFTR: It had nothing to do with my nsd setup, nor with running the master and slave servers in FreeBSD jails. The cause was my PF firewall's outgoing rule:

Erroneous:

pass out log on $extIF inet6 proto {tcp, udp, icmp6, gre} all modulate state

Working:

pass out log on $extIF inet6 proto {tcp, udp, icmp6, gre} all

I had used that rule for a very long time, but the recent upgrade to 9.1 broke it.

Thanks again and with kind regards,
Michael