Partial problem resolving kernel-error.de

Hello,

The domain uses huge keys: https://zonemaster.net/test/f8b42c485139ea99
DNSViz also shows warnings: http://dnsviz.net/d/kernel-error.de/dnssec/

Most of my unbound hosts resolve it without problems, except instances on
"cheap hosted virtual machines".
As far as I can tell, all unbound servers are configured identically:

server:
  chroot: /etc/unbound
  minimal-responses: yes
  harden-below-nxdomain: yes
  harden-referral-path: yes
  harden-glue: yes
  outgoing-tcp-mss: 1220
  qname-minimisation: yes
  tcp-mss: 1220
  use-caps-for-id: yes
  val-log-level: 2
  auto-trust-anchor-file: trust/root-rfc5011.anchor
  # do-ip4: yes
  # do-ip6: yes

With "verbosity: 2", the log is flooded with errors when I run "dig @$resolver kernel-error.de. dnskey +dnssec":
2017-05-30 00:03:24.413773500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 5.9.24.235
2017-05-30 00:03:24.419315500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 5.9.24.235
2017-05-30 00:03:24.419584500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2001:310:6000:f::1fc7:1
2017-05-30 00:03:24.424685500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2a01:4f8:150:1095::53
2017-05-30 00:03:24.430201500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 5.9.24.235
2017-05-30 00:03:24.432426500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2001:310:6000:f::1fc7:1
2017-05-30 00:03:24.435559500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2a01:4f8:161:3ec::53
2017-05-30 00:03:24.441102500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 5.9.24.235
2017-05-30 00:03:24.446647500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2a01:4f8:161:3ec::53
2017-05-30 00:03:24.452158500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2a01:4f8:161:3ec::53
2017-05-30 00:03:24.457540500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2a01:4f8:161:3ec::53
2017-05-30 00:03:24.691478500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 203.137.119.119
2017-05-30 00:03:24.698210500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2001:310:6000:f::1fc7:1
2017-05-30 00:03:24.731290500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2001:310:6000:f::1fc7:1
2017-05-30 00:03:24.950555500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 203.137.119.119
2017-05-30 00:03:24.953444500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 203.137.119.119
2017-05-30 00:03:24.992109500 [1496095404] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2001:310:6000:f::1fc7:1
2017-05-30 00:03:25.202152500 [1496095405] unbound[4398:0] error: tcp sendmsg: Broken pipe for 2001:310:6000:f::1fc7:1
2017-05-30 00:03:25.229939500 [1496095405] unbound[4398:0] error: tcp sendmsg: Broken pipe for 203.137.119.119
2017-05-30 00:03:25.253539500 [1496095405] unbound[4398:0] error: tcp sendmsg: Broken pipe for 203.137.119.119
2017-05-30 00:03:25.462916500 [1496095405] unbound[4398:0] error: tcp sendmsg: Broken pipe for 203.137.119.119

Bonus: only my own build of unbound-1.6.2 on the "cheap hosted virtual machines" can't resolve it;
the Debian Jessie distribution's unbound and bind both work on the same "cheap hosted virtual machines" :-/

Ideas?

The owner of kernel-error.de will change the domain's setup soon.
I asked him to freeze the configuration for a few days until I understand why my resolver fails.

Thanks,
Andreas

Hi Andreas,

The failure you see is in the code for TCP FASTOPEN. It was enabled
when you gave the configure option --enable-tfo-client.

The failing call is r = sendmsg(fd, &msg, MSG_FASTOPEN), which performs the
TCP FASTOPEN on the TCP connection. The errno it returns is what you see printed.
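For readers unfamiliar with the call: with TCP Fast Open, the client hands the destination address to sendmsg() together with the first data, so the kernel opens the connection and sends the data in one step, with no prior connect(). Below is a minimal Linux sketch of that pattern, not Unbound's actual code; the helper name `tfo_sendmsg` is hypothetical, and `MSG_NOSIGNAL` is an added assumption so that a broken pipe comes back as an errno instead of a signal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: start a TCP connection and send its first bytes
 * with a single sendmsg() using TCP Fast Open. The destination address
 * goes in msg_name; no connect() is issued beforehand.
 * Returns the number of bytes sent, or -1 with errno set. */
ssize_t tfo_sendmsg(int fd, const struct sockaddr *addr, socklen_t addrlen,
                    const void *buf, size_t len)
{
    assert(addr != NULL);
#ifdef MSG_FASTOPEN
    /* iovec and msghdr take non-const pointers, hence the casts. */
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (void *)addr;
    msg.msg_namelen = addrlen;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    /* MSG_NOSIGNAL: report EPIPE ("Broken pipe") as an error return
     * instead of raising SIGPIPE. */
    return sendmsg(fd, &msg, MSG_FASTOPEN | MSG_NOSIGNAL);
#else
    /* Headers without MSG_FASTOPEN: report "not supported". */
    (void)fd;
    (void)addrlen;
    (void)buf;
    (void)len;
    errno = EOPNOTSUPP;
    return -1;
#endif
}
```

If such a call fails, its errno is what a log line like "tcp sendmsg: Broken pipe" reports.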

That cheap VM has TCP FASTOPEN issues. Do you think MSG_FASTOPEN is
broken in that Linux kernel, or did the hoster break it (e.g. by blocking
it in a firewall)?

Best regards, Wouter

W.C.A. Wijngaards via Unbound-users:

> The failure you see is in the code for TCP FASTOPEN. It was enabled
> when you gave the configure option --enable-tfo-client.

TCP FASTOPEN...
Your explanation matches my observation perfectly :-)

I just tried disabling all IP filtering on the failing host; that changed nothing.
Also, I guess the failing host doesn't support TCP FASTOPEN at all:

# cat /proc/sys/net/ipv4/tcp_fastopen
cat: /proc/sys/net/ipv4/tcp_fastopen: No such file or directory

# uname -r
3.2.0-4-686-pae
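The same check can be made from a program. Here is a minimal sketch, assuming the Linux convention that `/proc/sys/net/ipv4/tcp_fastopen` holds a bitmask in which 0x1 enables Fast Open for outgoing (client) connections and 0x2 for listening (server) sockets. The helper name `tfo_client_enabled` is hypothetical. A missing file, as on the 3.2 kernel above, suggests the kernel predates Fast Open.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the tcp_fastopen sysctl at `path`.
 * Returns 1 if the client bit (0x1) is set, 0 if it is clear, and -1
 * if the file is missing or cannot be parsed. */
int tfo_client_enabled(const char *path)
{
    FILE *f;
    int value;
    int ok;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;          /* e.g. "No such file or directory" */
    ok = (fscanf(f, "%d", &value) == 1);
    fclose(f);
    if (!ok)
        return -1;
    return (value & 0x1) ? 1 : 0;
}
```

Given the `No such file or directory` output above, `tfo_client_enabled("/proc/sys/net/ipv4/tcp_fastopen")` would return -1 on the failing host.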

Even if it is compiled in, can I control TCP FASTOPEN usage at the application level?

I have other hosts with and without kernel support for TCP FASTOPEN.
Both kinds can resolve the domain in question.

On a platform with broken TCP FASTOPEN support (even if it is not supported by the kernel),
I currently can't disable it; I would need another unbound binary, right?

Andreas

A. Schulze via Unbound-users:

> On a platform with broken TCP FASTOPEN support (even if not supported by the kernel)
> I currently can't disable it; I would need another unbound binary, right?

Is there really no option to disable TCP_FASTOPEN usage via configuration?
Clarification is appreciated.

Andreas

Hi Andreas,

A. Schulze via Unbound-users:

> On a platform with broken TCP FASTOPEN support (even if not supported
> by the kernel), I currently can't disable it; I would need another
> unbound binary, right?
>
> Is there really no option to disable TCP_FASTOPEN usage via configuration?
> Clarification is appreciated.

There is only a configure-time option, not a config option. We don't
want it to be a config option; we want it to work all the time. Below
is a patch, but I don't know whether it works; it makes the code fall
through to normal TCP writes when FASTOPEN writes fail.

Best regards, Wouter


Index: util/netevent.c

W.C.A. Wijngaards via Unbound-users:

> There is only a configure-time option, not a config option. We don't
> want it to be a config option; we want it to work all the time.

Sounds reasonable.

> Below is a patch, but I don't know whether it works; it makes the code fall
> through to normal TCP writes when FASTOPEN writes fail.

I'll try the patch and report results...

Andreas

Compiled and installed, but no change.

...
2017-05-31 18:36:16.868823500 [1496248576] unbound[22766:0] error: tcp writev: Broken pipe for 2001:310:6000:f::1fc7:1
2017-05-31 18:36:16.874287500 [1496248576] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:161:3ec::53
2017-05-31 18:36:17.143379500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 203.137.119.119
2017-05-31 18:36:17.405554500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 203.137.119.119
2017-05-31 18:36:17.411117500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:161:3ec::53
2017-05-31 18:36:17.416440500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:150:1095::53
2017-05-31 18:36:17.422345500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:150:1095::53
2017-05-31 18:36:17.422347500 [1496248577] unbound[22766:0] info: Missing DNSKEY RRset in response to DNSKEY query.
2017-05-31 18:36:17.422348500 [1496248577] unbound[22766:0] info: Missing DNSKEY RRset in response to DNSKEY query.
2017-05-31 18:36:17.422349500 [1496248577] unbound[22766:0] info: Missing DNSKEY RRset in response to DNSKEY query.
2017-05-31 18:36:17.422351500 [1496248577] unbound[22766:0] info: Missing DNSKEY RRset in response to DNSKEY query.
2017-05-31 18:36:17.422360500 [1496248577] unbound[22766:0] info: Missing DNSKEY RRset in response to DNSKEY query.
2017-05-31 18:36:17.422361500 [1496248577] unbound[22766:0] info: Missing DNSKEY RRset in response to DNSKEY query.
2017-05-31 18:36:17.422363500 [1496248577] unbound[22766:0] info: Could not establish a chain of trust to keys for kernel-error.com. DNSKEY IN
2017-05-31 18:36:17.422364500 [1496248577] unbound[22766:0] info: validation failure <kernel-error.com. MX IN>: No DNSKEY record for key kernel-error.com. while building chain of trust
2017-05-31 18:36:17.422368500 [1496248577] unbound[22766:0] info: ::1 kernel-error.com. MX IN SERVFAIL 16.122824 0 34
2017-05-31 18:36:17.422369500 [1496248577] unbound[22766:0] info: Could not establish a chain of trust to keys for kernel-error.com. DNSKEY IN
2017-05-31 18:36:17.422376500 [1496248577] unbound[22766:0] info: validation failure <kernel-error.com. MX IN>: No DNSKEY record for key kernel-error.com. while building chain of trust
2017-05-31 18:36:17.422378500 [1496248577] unbound[22766:0] info: ::1 kernel-error.com. MX IN SERVFAIL 16.979449 0 34
2017-05-31 18:36:17.422379500 [1496248577] unbound[22766:0] info: resolving kernel-error.com. DNSKEY IN
2017-05-31 18:36:17.671634500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2001:310:6000:f::1fc7:1
2017-05-31 18:36:17.677029500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:150:1095::53
2017-05-31 18:36:17.682693500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:161:3ec::53
2017-05-31 18:36:17.682694500 [1496248577] unbound[22766:0] info: Capsforid: timeouts, starting fallback
2017-05-31 18:36:17.687988500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:150:1095::53
2017-05-31 18:36:17.693575500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 176.9.109.53
2017-05-31 18:36:17.699173500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 176.9.109.53
2017-05-31 18:36:17.704899500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:161:3ec::53
2017-05-31 18:36:17.710267500 [1496248577] unbound[22766:0] error: tcp writev: Broken pipe for 2a01:4f8:161:3ec::53
...

I also set "use-caps-for-id: no" to rule that out, but (as expected this time) there was no change.
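One possible reading of these writev errors, offered as an assumption rather than a confirmed diagnosis: if the FASTOPEN sendmsg() fails without opening a connection, the socket is still unconnected when the fall-through write runs. On Linux, a write to an unconnected TCP socket fails with EPIPE ("Broken pipe") rather than ENOTCONN. The sketch below reproduces that errno on a fresh, never-connected socket. The helper name `errno_of_unconnected_send` is hypothetical, and Linux is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, never connect it, and try to
 * send one byte. This mimics a failed FASTOPEN sendmsg() followed
 * directly by a plain write. Returns the errno of the failed send
 * (EPIPE on Linux), or 0 if the send unexpectedly succeeded. */
int errno_of_unconnected_send(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int err = 0;

    assert(fd >= 0);
    /* MSG_NOSIGNAL keeps the failure an error return instead of SIGPIPE. */
    if (send(fd, "x", 1, MSG_NOSIGNAL) == -1)
        err = errno;   /* save it before close() can overwrite errno */
    close(fd);
    return err;
}
```

On Linux, passing the result to strerror() should give the same "Broken pipe" text as the writev lines in the log.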

Andreas

Hi Andreas,

It is performing the fallthrough, but writev is also reporting an error.
The patch below attempts to fix that: it should perform a connect() after
the failed fastopen, before calling write. Apply this patch on top of
code that already has the previous patch applied.
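Wouter's approach (try FASTOPEN first, then connect(), then an ordinary write) can be sketched as follows. This illustrates the idea only. It is not the contents of patch_fastopen_connect.diff. The helper names `send_tfo_or_connect` and `loopback_listener` are hypothetical, and a blocking socket is assumed. For brevity it issues the Fast Open attempt with sendto() and MSG_FASTOPEN, the other common way to do this on Linux.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try a TCP Fast Open send first; if it fails,
 * connect() explicitly and then do an ordinary write. Assumes a
 * blocking TCP socket. Returns bytes sent, or -1 with errno set. */
ssize_t send_tfo_or_connect(int fd, const struct sockaddr *addr,
                            socklen_t addrlen, const void *buf, size_t len)
{
    assert(addr != NULL);
#ifdef MSG_FASTOPEN
    /* sendto() with MSG_FASTOPEN connects implicitly and queues the data. */
    ssize_t r = sendto(fd, buf, len, MSG_FASTOPEN | MSG_NOSIGNAL,
                       addr, addrlen);
    if (r != -1)
        return r;
    /* Fast Open failed (e.g. EOPNOTSUPP or EPIPE): fall through. */
#endif
    /* Without this connect(), the socket would still be unconnected and
     * the plain write below would fail with EPIPE on Linux. */
    if (connect(fd, addr, addrlen) == -1)
        return -1;
    return send(fd, buf, len, MSG_NOSIGNAL);
}

/* Hypothetical helper for trying it out: a listening socket on
 * 127.0.0.1 with a kernel-chosen port, whose address is stored in *out.
 * Returns the listening fd, or -1 on error. */
int loopback_listener(struct sockaddr_in *out)
{
    socklen_t outlen = sizeof(*out);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd == -1)
        return -1;
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    out->sin_port = 0; /* 0 = let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)out, sizeof(*out)) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)out, &outlen) == -1) {
        close(lfd);
        return -1;
    }
    return lfd;
}
```

The data should arrive whether or not the kernel honours MSG_FASTOPEN, because a failed Fast Open attempt leads to an explicit connect() before the write. On a kernel without Fast Open, each connection costs one failed system call before falling back. That is a small price compared with never connecting at all.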

Best regards, Wouter

(attachments)

patch_fastopen_connect.diff (1.02 KB)

Yeah, it works!

Attached is my full patch (I had to add an explicit type cast).

With the patch applied and unbound restarted, I ran "dig @::1 kernel-error.de. dnskey +dnssec"
and immediately got the expected response.

Many thanks, Wouter!
Andreas

(attachments)

tfo_fallback.patch (1.5 KB)