Today I added some records to a zone, which made the AXFR too large to fit in one packet. At that point, the zone would no longer transfer from my hidden master to my slaves (everything is running NSD 4.0.1).
Normally, all of the zone transfers are done over IPv6. The transfers did work when I tested them over IPv4, but I can't reliably use IPv4. My kludge was to break the zonefile up into several subzones, making each small enough to AXFR in a single packet.
I'm not sure how to document this other than showing you the "operation timed out: tcp" log entries and zonestatus output that shows the slaves are not getting the zone.
Have others run into this, or is it a known issue? The relevant terms appear to be too common or vague for an effective search engine query.
As you can read in the archives of this list, I raised this issue last
March 13, and I'm still experiencing it.
After upgrading NSD (from 4.0.1 to 4.0.3) and applying
Wouter's suggestion (configuring with --disable-recvmmsg), nothing has changed.
So I have asked VMware for support because, in my scenario, it happens with
VMs inside an ESXi 5.5 cluster.
I'll come back to the list as soon as the case is solved.
FWIW, my affected systems are FreeBSD 9.1, 9.2 and 10.0. I'm using the available pkg packages. I can try installing from ports, but that will need to wait a few days.
Hello,
<may be incorrect>
as far as I have learned, in IPv6 in-transit fragmentation is not allowed anymore.
In v4 an app/host could send a huge packet and routers/gateways were allowed to fragment it as needed.
In v6 routers/gateways are not allowed to fragment a packet; they have to send back an ICMP "Packet Too Big" message,
and the app/host has to resend the data in smaller pieces.
</>
If NSD is emitting packets that are bigger than the IPv6 path MTU to the
slave, then a device along the path will send back an ICMP message
asking the source to send smaller packets. If this ICMP message never
reaches the master, it won't know that it needs to fragment the packets;
it will keep sending packets that are too big, resulting in a timeout.
On the master, run tcpdump, then send out large packets to the slave
(ping6 will do) and see if you're getting back the relevant ICMP
message, and whether the network stack on the master is adapting itself
to such a notification.
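For example, something along these lines should show whether the Packet Too Big messages come back (the interface name and address below are placeholders, and the flags vary a little between systems):

  # on the master, watch for ICMPv6 traffic (Packet Too Big is type 2)
  tcpdump -n -i em0 icmp6

  # from another terminal, ping the slave with a payload well above 1280 bytes
  ping6 -s 1600 2001:db8::53

If the large pings get through and any Packet Too Big messages that appear are acted on (the master starts sending smaller packets), the stack is fine and the problem is somewhere on the path.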
It looks like something mid-path in the master's ISP is breaking PMTUD. I can get large pings between the slaves, but I can only get large pings a few hops into the master's ISP. I was really hoping it was something dumb like having left the fragment rules out of my rulesets.
But NSD uses TCP for zone transfers? I thought PMTU discovery
does not really apply to TCP? So NSD is unable to create a TCP stream
and send more than one packet's worth of data on it? And you
report that NSD logs a timeout when that happens, as if no more
packets are arriving. Some sort of stateful firewall that has a state
problem?
PMTU doesn't apply here because it wasn't an MTU issue; it was a fragmentation issue. With IPv4, fragmentation is effectively TCP's job: the TCP payload is segmented, and each packet on the wire has a TCP header at the top of the IP payload. With IPv6 here, fragmentation was IP's job: the system generated a single large TCP packet, and the IP layer inserted the fragmentation extension header and split the TCP packet itself. The problem with that approach is that the TCP header exists only in the first fragment, and it now sits 8 bytes further down than it would without the frag header.
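Roughly, the on-the-wire layout looks like this (assuming a minimal 20-byte TCP header):

  unfragmented:    [IPv6 hdr 40][TCP hdr 20][payload ...]
  first fragment:  [IPv6 hdr 40][Frag hdr 8][TCP hdr 20][payload, part 1]
  later fragments: [IPv6 hdr 40][Frag hdr 8][payload, part N]   (no TCP header)

so the later fragments carry no TCP header at all, and a check keyed on the TCP port tuple cannot classify them.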
What happens when those fragments reach a stateful/DPI firewall should be obvious. A well-written firewall will see the fragmentation header and perform the state check on the identification field of the frag header instead of the TCP port tuple. In this case, though, the router was running an older version of pf that doesn't handle IPv6 fragments.
Prior to the issue, the zone AXFR in question was only 1151 bytes--small enough for a single packet. When I added several new hosts to the zone with their A, AAAA, TXT, MX and SPF records, the AXFR grew to 1720 bytes--large enough to fragment and not pass this errant router. Once the ISP added "pass inet6 proto ipv6-frag all" to the ruleset for my port, it worked just fine.
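For reference, a minimal pf.conf sketch of the kind of rule that was missing; the interface macro and the DNS rule are illustrative, not the ISP's actual ruleset, while the ipv6-frag line is the one quoted above:

  ext_if = "em0"

  # non-first IPv6 fragments have no TCP header, so they need their own pass rule
  pass inet6 proto ipv6-frag all

  # unfragmented packets and first fragments still match the normal TCP rule
  pass in on $ext_if inet6 proto tcp from any to any port 53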
“setsockopt(TCP_MAXSEG, 1220)” will help if it is a PMTUD or fragmentation issue.
IMHO a DNS responder should always do “setsockopt(TCP_MAXSEG, 1220)”
for all TCP sockets (or implement an option to set the MAXSEG size) to avoid
PMTUD/IP fragmentation, because:
- The PMTUD process may add delay even when it works properly.
- ICMP Packet Too Big messages (for PMTUD) and IP fragments are often dropped by broken firewalls.
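NSD doesn't appear to expose such an option (hence the suggestion above), but as a sketch of what the call amounts to in C (illustrative, not NSD code; whether a MAXSEG set on a listening socket is inherited by accepted connections varies by OS):

  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <stdio.h>

  /* Clamp the TCP MSS so each segment fits in the IPv6 minimum MTU:
   * 1280 - 40 (IPv6 header) - 20 (TCP header) = 1220 bytes of payload.
   * Call this on the socket before connect()/listen(). */
  static int clamp_mss(int sock)
  {
      int mss = 1220;
      if (setsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof mss) == -1) {
          perror("setsockopt(TCP_MAXSEG)");
          return -1;
      }
      return 0;
  }

With the MSS clamped to 1220, even a worst-case IPv6 path (1280-byte MTU) carries each TCP segment without fragmentation, so neither PMTUD nor the fragment extension header comes into play.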