Unbound Slow to Prime

It used to be the case that I could flush the unbound cache and then resolve a name immediately. Now there is a long delay before I start to get results. It is especially bad on Fedora 15 running in VirtualBox on OS X Lion, but I'm also seeing it slow down on Fedora 15 at EC2. I updated root.hints, but that didn't seem to help. I can't make much sense of the debugging logging output. A couple weeks ago, flushing cache or restarting, you might notice a light delay on the first lookup, but now it takes a few minutes to get results other than timeouts.

server:
  verbosity: 2
  num-threads: 1
  interface: 127.0.0.1
  do-ip4: yes
  do-ip6: no
  do-udp: yes
  do-tcp: no
  username: "unbound"
  directory: "/etc/unbound"
  log-time-ascii: yes
  pidfile: "/var/run/unbound/unbound.pid"
  root-hints: "/etc/unbound/root.hints"
  harden-glue: yes
  harden-dnssec-stripped: yes
  harden-referral-path: yes
  use-caps-for-id: yes
  unwanted-reply-threshold: 10000000
  prefetch: yes
  prefetch-key: yes
  dlv-anchor-file: "/etc/unbound/dlv.isc.org.key"
  trusted-keys-file: /etc/unbound/root.key
  val-clean-additional: yes
  val-permissive-mode: no
  val-log-level: 1
remote-control:
  control-enable: yes
  server-key-file: "/etc/unbound/unbound_server.key"
  server-cert-file: "/etc/unbound/unbound_server.pem"
  control-key-file: "/etc/unbound/unbound_control.key"
  control-cert-file: "/etc/unbound/unbound_control.pem"

It used to be the case that I could flush the unbound cache and then resolve a name immediately. Now there is a long delay before I start to get results. It is especially bad on Fedora 15 running in VirtualBox on OS X Lion, but I'm also seeing it slow down on Fedora 15 at EC2. I updated root.hints, but that didn't seem to help. I can't make much sense of the debugging logging output. A couple weeks ago, flushing cache or restarting, you might notice a light delay on the first lookup, but now it takes a few minutes to get results other than timeouts.

server:
  verbosity: 2
  num-threads: 1
  interface: 127.0.0.1
  do-ip4: yes
  do-ip6: no
  do-udp: yes
  do-tcp: no

Why do you have do-tcp set to no? That will cause problems on large dnssec replies that get
truncated.

  use-caps-for-id: yes

You can try disabling this, some servers don't do 0x20 properly and it might be causing delays.

Paul

It doesn’t look like you would experience this within VirtualBox or EC2, but I had a very similar issue with Unbound behind a Cisco ASA. The ASA would reject responses > 512 bytes. You may find a wireshark trace worth your time.

https://supportforums.cisco.com/message/3064602

-M

I tried disabling this, but it was working fine before about two weeks ago. Now I'm finding a slow start everywhere. I've created a gist, with the output of the unbound logs, with verbose at 9. It goes from startup to the successful resolution of "dig NS .". The first time I invoke dig, it times out. Then I wait a while and invoke dig again, for a successful resolution.

https://gist.github.com/1131622

There is a lot there. I'm not sure what to look for. If there is anything I can do to make it easier to get feedback from the group, let me know.

Hi Alan,

It used to be the case that I could flush the unbound cache and then resolve a name immediately. Now there is a long delay before I start to get results. It is especially bad on Fedora 15 running in VirtualBox on OS X Lion, but I'm also seeing it slow down on Fedora 15 at EC2. I updated root.hints, but that didn't seem to help. I can't make much sense of the debugging logging output. A couple weeks ago, flushing cache or restarting, you might notice a light delay on the first lookup, but now it takes a few minutes to get results other than timeouts.

server:
  verbosity: 2
  num-threads: 1
  interface: 127.0.0.1
  do-ip4: yes
  do-ip6: no
  do-udp: yes
  do-tcp: no

Why do you have do-tcp set to no? That will cause problems on large dnssec replies that get
truncated.

Yes that would be good to enable.

  use-caps-for-id: yes

You can try disabling this, some servers don't do 0x20 properly and it might be causing delays.

I tried disabling this, but it was working fine before about two weeks ago. Now I'm finding a slow start everywhere. I've created a gist, with the output of the unbound logs, with verbose at 9. It goes from startup to the successful resolution of "dig NS .". The first time I invoke dig, it times out. Then I wait a while and invoke dig again, for a successful resolution.

https://gist.github.com/1131622

There is a lot there. I'm not sure what to look for. If there is anything I can do to make it easier to get feedback from the group, let me know.

What seems to be happening is this:
Aug 8 11:20:37 celje unbound: [6762:0] debug: timeout udp

You have only a DLV key. The root.key file you read with the statement
trusted-keys-file (which is for BIND compatibility). I would expect you
to use auto-trust-anchor-file: "....root.key". (and use unbound-anchor
to make that file in the rc/init script). There are no trust anchors in
your root.key file?

Thus, because you do not have a root key. But a DLV key. And 0x20 and
harden-referral-path. For me that creates a queue of 164 on the first
prime. And it takes a long time to prime for me too. (perhaps
something like an adsl modem is dropping connections because there are
too many?) If I add a normal root trust anchor, the first prime
succeeds (in a second).

So, your combination of options creates a lot of work. You can wait for
that. Or you can add a root trust anchor, which I think you intend to?

Best regards,
   Wouter