OpenBSD 6.4 only using 1 core

I’m running OpenBSD 6.4 with the included Unbound 1.8.1 on an Intel NUC with 2 real cores and 2 virtual (hyperthreaded) ones. Some sysctl output:

  hw.ncpu=4
  hw.ncpufound=4
  hw.ncpuonline=2
  hw.smt=0

“top” shows four cpus with work distributed to only two of them. “ps” and the statistics emitted by Unbound show only one CPU working.
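
As a reading aid for the sysctl values above, here is a minimal sketch that parses such output and picks out the number of CPUs the scheduler will actually use. It assumes that hw.ncpuonline is the right figure to compare against num-threads; the helper names parse_sysctl and schedulable_cpus are hypothetical and not part of any OpenBSD or Unbound tooling.

```python
def parse_sysctl(text):
    """Parse 'name=value' lines, as printed by sysctl, into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a name=value line
        try:
            values[name.strip()] = int(value)
        except ValueError:
            continue  # skip non-numeric values
    return values


def schedulable_cpus(values):
    """Assumption: hw.ncpuonline counts the CPUs that receive work.

    Falls back to hw.ncpu, then to 1, if the key is missing.
    """
    return values.get("hw.ncpuonline", values.get("hw.ncpu", 1))


# Values copied from the message above.
SAMPLE = """
  hw.ncpu=4
  hw.ncpufound=4
  hw.ncpuonline=2
  hw.smt=0
"""

if __name__ == "__main__":
    info = parse_sysctl(SAMPLE)
    print("CPUs found:", info["hw.ncpufound"])
    print("CPUs receiving work:", schedulable_cpus(info))
```

With the values from this machine, the sketch reports two schedulable CPUs, which matches num-threads: 2 in the configuration shown further down.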

ps output:

  87709 ?? Is 0:00.02 unbound -c /var/unbound/etc/unbound.conf
  86298 ?? S 1:49.24 unbound -c /var/unbound/etc/unbound.conf

unbound statistics (every hour):

Feb 18 13:14:02 unbound1 unbound: [87709:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Feb 18 13:14:02 unbound1 unbound: [87709:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: server stats for thread 1: 47870 queries, 18749 answers from cache, 29121 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: server stats for thread 1: requestlist max 46 avg 5.96944 exceeded 0 jostled 0
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: average recursion processing time 0.888912 sec
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: histogram of recursion processing times
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: [25%]=0.0276987 median[50%]=0.0609112 [75%]=0.128225
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: lower(secs) upper(secs) recursions
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.000000 0.000001 2779
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.000008 0.000016 7
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.000064 0.000128 1
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.000128 0.000256 5
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.000256 0.000512 25
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.000512 0.001024 197
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.001024 0.002048 69
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.002048 0.004096 52
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.004096 0.008192 103
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.008192 0.016384 222
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.016384 0.032768 5530
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.032768 0.065536 6483
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.065536 0.131072 6653
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.131072 0.262144 4023
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.262144 0.524288 1786
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 0.524288 1.000000 604
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 1.000000 2.000000 224
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 2.000000 4.000000 72
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 4.000000 8.000000 64
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 8.000000 16.000000 53
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 16.000000 32.000000 52
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 32.000000 64.000000 49
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 64.000000 128.000000 4
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 128.000000 256.000000 30
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 256.000000 512.000000 23
Feb 18 14:14:02 unbound1 unbound: [86298:1] info: 512.000000 1024.000000 6
Feb 18 14:14:02 unbound1 unbound: [87709:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Feb 18 14:14:02 unbound1 unbound: [87709:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0

unbound.conf:

        num-threads: 2

No other tweaks were made to unbound.conf. The version is as distributed with OpenBSD 6.4. Unbound works fine under OpenBSD 6.2 on a CPU with 2 real cores and no virtual ones.

I’m thinking that this issue may have something to do with the way OpenBSD reports available CPUs. I’m going to turn hyperthreading off at the BIOS level at some point (when I can). Does anyone have any experience with this, where “this” is OpenBSD and Unbound with hw.smt=0 and Unbound apparently not using more than one thread?

Thanks for any feedback!

—doug

--
G Douglas Davidson
douglas@readyforgo.com

It appears that the logic in listen_dnsport.c that attempts to use SO_REUSEPORT_LB causes OpenBSD to end up using only a single thread. When I add this option to unbound.conf:

  so-reuseport: no

I get utilization on both threads, but it is very unbalanced. Still, it may be a step in the right direction.

unbound1# ps -ax | grep unbound
30490 ?? Ss 0:02.21 unbound -c /var/unbound/etc/unbound.conf
42777 ?? S 0:00.47 unbound -c /var/unbound/etc/unbound.conf

and

Feb 18 17:04:20 unbound1 unbound: [30490:0] info: server stats for thread 0: 361 queries, 192 answers from cache, 169 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Feb 18 17:05:20 unbound1 unbound: [42777:1] info: server stats for thread 1: 25 queries, 1 answers from cache, 24 recursions, 0 prefetch, 0 rejected by ip ratelimiting
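
To put a number on the imbalance, the "server stats" lines can be tallied per thread. The following is a minimal sketch that assumes the syslog line format shown above; the function names per_thread_queries and shares are hypothetical. It sums every matching line it sees, so feed it lines from a single reporting interval if a per-interval figure is wanted.

```python
import re

# Matches the per-thread summary line that Unbound logs, as seen in this thread.
STATS_RE = re.compile(
    r"server stats for thread (\d+): (\d+) queries, "
    r"(\d+) answers from cache, (\d+) recursions"
)


def per_thread_queries(log_text):
    """Return {thread_number: total queries} over all matching log lines."""
    totals = {}
    for match in STATS_RE.finditer(log_text):
        thread = int(match.group(1))
        totals[thread] = totals.get(thread, 0) + int(match.group(2))
    return totals


def shares(totals):
    """Return each thread's fraction of the total query count."""
    total = sum(totals.values())
    return {t: (q / total if total else 0.0) for t, q in totals.items()}


# The two lines quoted above, verbatim.
SAMPLE_LOG = """\
Feb 18 17:04:20 unbound1 unbound: [30490:0] info: server stats for thread 0: 361 queries, 192 answers from cache, 169 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Feb 18 17:05:20 unbound1 unbound: [42777:1] info: server stats for thread 1: 25 queries, 1 answers from cache, 24 recursions, 0 prefetch, 0 rejected by ip ratelimiting
"""

if __name__ == "__main__":
    totals = per_thread_queries(SAMPLE_LOG)
    for thread, share in sorted(shares(totals).items()):
        print(f"thread {thread}: {totals[thread]} queries ({share:.1%})")
```

For the two lines above, thread 0 handled 361 of 386 queries, roughly 94%.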

I’m now attempting to find an approach specific to OpenBSD that provides better utilization of each real core.

—doug

It appears that prior to Unbound version 1.8.0, the default value for so-reuseport was “no”; since then the default has been “yes”. On OpenBSD, a value of “yes” causes a single thread to process all queries no matter how many threads are defined, so it probably makes sense to make “no” the default there. With a value of “no” all threads are used, but the usage is unbalanced: one thread tends to get far more of the queries. I’m not sure if this is actually bad, but it would be nice to have a more even distribution.

—doug

Hi Doug,

The default for so-reuseport was changed in 1.8.3: from that version on it
is enabled by default for Linux and DragonFly. If you upgrade to a later
version, you should get the more sensible default of off for OpenBSD.
That should distribute queries over the threads as expected.

The _LB stuff is what makes it work on some FreeBSD versions (the newest,
I think). I did not know OpenBSD had it. If it does have it, or starts
to support it, you can then use it with so-reuseport: yes in unbound.conf.

The default was set to on in 1.8.0 because I believed it to be harmless
and to increase performance (not just balance it). It seems to work like
this on Linux, and I think the _LB version works similarly on
FreeBSD.
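
For readers unfamiliar with the socket option under discussion, here is a minimal sketch of what "one listening socket per thread on the same port" looks like at the socket API level. It is written in Python for brevity and is not Unbound's code; the function name open_reuseport_sockets is hypothetical. How the kernel then spreads incoming datagrams over those sockets is up to the operating system, which is the difference between platforms described in this thread.

```python
import socket


def open_reuseport_sockets(count, host="127.0.0.1", port=0):
    """Open `count` UDP sockets bound to the same address and port.

    Every socket sets SO_REUSEPORT before bind(); without it, the second
    bind() to the same port would normally fail with "address in use".
    """
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    sockets = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        # With port=0 the kernel picks a free port for the first socket;
        # later sockets reuse that same port.
        port = sock.getsockname()[1]
        sockets.append(sock)
    return sockets


if __name__ == "__main__":
    socks = open_reuseport_sockets(2)
    try:
        for i, sock in enumerate(socks):
            print(f"socket {i} bound to port {sock.getsockname()[1]}")
    finally:
        for sock in socks:
            sock.close()
```

The sketch only demonstrates that several sockets can share one port; it says nothing about balance. Measuring the distribution would need a query generator and per-socket counters.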

Best regards, Wouter

Appreciate the reply. With so-reuseport: no, things are chugging along nicely, although the load is not particularly evenly distributed. I don’t believe OpenBSD supports the _LB stuff as of now. I’m considering running unbound in a forked setting just as a way to simplify some of the optimization settings without having to recompile the OpenBSD kernel.

Thank you!

—doug

> OpenBSD 6.4 and the included Unbound 1.8.1. Intel NUC with 2 real CPUS and 2 Virtual. Some sysctl stuff:
>
>   hw.ncpu=4
>   hw.ncpufound=4
>   hw.ncpuonline=2
>   hw.smt=0
>
> "top" shows four cpus with work distributed to only two of them.

That's expected, recent OpenBSD no longer schedules to SMT (hyperthreading)
"cpu"s by default, controlled by sysctl hw.smt. We don't trust SMT, plus
with the current state of MP on OpenBSD for many workloads it works out
faster to avoid them anyway.

> Appreciate the reply. With so-reuseport: no, things are chugging along
> nicely, although the load is not particularly evenly distributed.

That's expected.

> I don't believe OpenBSD supports the _LB stuff as of now.

Correct.

>> OpenBSD 6.4 and the included Unbound 1.8.1. Intel NUC with 2 real CPUS and 2 Virtual. Some sysctl stuff:
>>
>>   hw.ncpu=4
>>   hw.ncpufound=4
>>   hw.ncpuonline=2
>>   hw.smt=0
>>
>> "top" shows four cpus with work distributed to only two of them.
>
> That's expected, recent OpenBSD no longer schedules to SMT (hyperthreading)
> "cpu"s by default, controlled by sysctl hw.smt. We don't trust SMT, plus
> with the current state of MP on OpenBSD for many workloads it works out
> faster to avoid them anyway.

I understand this and agree with it. In my mind, though, showing two CPUs that are simply never going to get any work is confusing (I’m easily confused). But, again, I get it. Those cores are there, just not getting work.

I wonder if it makes sense to turn hyperthreading off at the BIOS level simply as a way to keep things clean. So, I’m looking for a recommendation here.

>> Appreciate the reply. With so-reuseport: no, things are chugging along
>> nicely, although the load is not particularly evenly distributed.
>
> That's expected.

Can you share anything regarding how load is distributed among threads? I think the big question is: if OpenBSD does not support _LB, which I believe distributes load more evenly, how do I distribute load more evenly (and do I even need to)? Running forked seems like another possibility. Memory is cheap, and I’m not so concerned about the need for a shared cache. But that decision would be based on what logic causes work in the threaded version to be shunted to a particular thread. It does not seem to be round-robin. I’m just damn curious how that happens!

>> I don't believe OpenBSD supports the _LB stuff as of now.
>
> Correct.

I very much appreciate the reply!

—doug

>>> OpenBSD 6.4 and the included Unbound 1.8.1. Intel NUC with 2 real CPUS and 2 Virtual. Some sysctl stuff:
>>>
>>> hw.ncpu=4
>>> hw.ncpufound=4
>>> hw.ncpuonline=2
>>> hw.smt=0
>>>
>>> "top" shows four cpus with work distributed to only two of them.
>>
>> That's expected, recent OpenBSD no longer schedules to SMT (hyperthreading)
>> "cpu"s by default, controlled by sysctl hw.smt. We don't trust SMT, plus
>> with the current state of MP on OpenBSD for many workloads it works out
>> faster to avoid them anyway.
>
> I understand this and agree with it. In my mind though, showing two
> cpus that are simply never going to get any work is confusing (I’m
> easily confused.) But, again, I get it. Those cores are there, just
> not getting work.
>
> I wonder if it makes sense to turn hyperthreading off at the BIOS level
> simply as a way to keep things clean. So, looking for a recommendation
> here.

IIRC top was changed in -current to skip them.

I prefer disabling in BIOS if I can (it's not always possible - some
BIOS don't give the option, though recently I've noticed some have
started adding it back - e.g. Lenovo have done this for some ThinkPads).
But it shouldn't matter too much which way you do it.

>>> Appreciate the reply. With so-reuseport: no, things are chugging along
>>> nicely, although the load is not particularly evenly distributed.
>>
>> That's expected.
>
> Can you share anything regarding how load is distributed among
> threads? I think the big question is, if OpenBSD does not support _LB,
> which I believe distributes load more evenly, how do I distribute
> load more evenly (and, do I even need to?) Running forked seems like
> another possibility. Memory is cheap. I’m not so concerned about
> the need for a shared cache. But that decision would be based on
> what logic causes work in the threaded version to be shunted to a
> particular thread. It does not seem to be round-robin. I’m just damn
> curious how that happens!

Sorry, I'm not too sure about that - it may be worth asking on one of
the OpenBSD lists about how the load is distributed between sockets
though.

I haven't heard from anyone optimizing unbound for high performance
on OpenBSD, I'd be interested to hear what you find if you do try.
("resperf" is probably useful for testing if you want to try that
- it's part of dnsperf, not packaged in 6.4 but is in -current).

If I was looking into this I would certainly want to test it with
dnsdist forwarding to multiple independent unbound instances (i.e.
each one bound to a different port number) as one of the
possibilities; I suspect that may be the simplest way to get
good balance.
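
As an illustration of the dnsdist idea, the following sketch generates one minimal unbound.conf per instance plus a matching dnsdist configuration. It is untested on OpenBSD and rests on several assumptions: the ports 5301 and 5302, the loopback address, the listen address and the round-robin policy are all made-up example values, and the function names are hypothetical. A real setup would also need a separate pidfile (and control port, if remote control is enabled) for each instance.

```python
def unbound_conf(port, host="127.0.0.1"):
    """Minimal server: clause for one single-threaded unbound instance."""
    lines = [
        "server:",
        f"    interface: {host}",
        f"    port: {port}",
        "    num-threads: 1",
    ]
    return "\n".join(lines) + "\n"


def dnsdist_conf(ports, host="127.0.0.1", listen="0.0.0.0:53"):
    """dnsdist (Lua) config forwarding to one backend per unbound port."""
    lines = [f'setLocal("{listen}")']
    lines += [f'newServer("{host}:{port}")' for port in ports]
    # Assumption: plain round-robin is good enough for equal backends.
    lines.append("setServerPolicy(roundrobin)")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ports = [5301, 5302]  # hypothetical ports, one per real core
    for port in ports:
        print(f"# unbound instance on port {port}")
        print(unbound_conf(port))
    print("# dnsdist")
    print(dnsdist_conf(ports))
```

Each instance keeps its own cache in this layout, which is the trade-off already raised in the thread when forked operation was considered.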

> IIRC top was changed in -current to skip them.
>
> I prefer disabling in BIOS if I can (it's not always possible - some
> BIOS don't give the option, though recently I've noticed some have
> started adding it back - e.g. Lenovo have done this for some ThinkPads).
> But it shouldn't matter too much which way you do it.

Makes sense. I believe I am able to with my particular box.

> If I was looking into this I would certainly want to test it with
> dnsdist forwarding to multiple independent unbound instances (i.e.
> each one bound to a different port number) as one of the
> possibilities; I suspect that may be the simplest way to get
> good balance.

Oh shit, I did not even think of that! Thanks man!