Run-time effects of new SIMD code

Hi NSD developers,

I’ve been following the recent discussion and activity around building NSD 4.10, triggered by a build failure in Homebrew. I see that you added code to detect CPU features and adjust the build based on the type of processor detected.

If one were to build NSD on an x86_64 CPU with the Haswell microarchitecture, and then run this build on another x86_64 CPU that doesn’t have those features, will NSD fail to run, will it crash, or will it silently use sub-optimal code?

Regards,
Anand

Looking at the code, it's using runtime detection based on checking
CPUID results, so it should use whichever parser is the most optimal
for the CPU on the machine where it runs.

Hi Anand, Stuart,

That is indeed correct. At runtime the CPUID instruction is used to
detect which extensions are offered and the parser is chosen based on
that. When compiling on x86_64 with SSE4.2 and AVX2 enabled, the binary
will actually contain three parsers; the most optimal one is used at
runtime.

The optimized versions can be disabled with --disable-westmere (SSE4.2)
and --disable-haswell (AVX2) if desired, by the way.
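For reference, a sketch of what those configure invocations might look like (the flag names are from this thread; check ./configure --help in your tree for the exact spelling):

```shell
./configure                                        # build all parsers the compiler supports
./configure --disable-haswell                      # omit the AVX2 parser
./configure --disable-westmere --disable-haswell   # fallback parser only
```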

Note that "sub-optimal" is still a couple of times faster than 4.9
(for parsing; that does not translate to zone loading in a 1-to-1
fashion yet, but we'll get to that :-)).

Best regards,
Jeroen

Hi Jeroen,

> That is indeed correct. At runtime the CPUID instruction is used to
> detect which extensions are offered and it chooses based on that. When
> compiling on x86_64 with SSE4.2 and AVX2 enabled, the binary will
> actually contain 3 parsers, the most optimal one is used at runtime.

Thanks for this clarification. NSD doesn't log this information when starting, and "nsd -v" also doesn't reveal which parser it has chosen. I think such information would be useful to both users and developers, especially when debugging an issue. Do you think you could add this to the output of "nsd -v" (something like which parsers have been compiled in, and which one is being chosen)?

So a question: as a packager and distributor, do I need to build on a processor that supports both the SSE4.2 and AVX2 instruction sets? And if one or both of these are not available, will the resulting build contain fewer parsers?

> Optimized versions can be disabled with --disable-westmere (SSE4.2) and
> --disable-haswell (AVX2) if desired btw.

Thanks for providing these options. But it would help users to know the effects of enabling or disabling them, so that they can make an informed choice.

Another question: if a user has a binary containing all three parsers, and wants to disable one or more of them at run-time, how is the user supposed to do this?

Actually, I just went to look at doc/README, which has notes on how to compile NSD, but not all the enable/disable options are fully documented, so a user cannot make informed choices about whether to enable/disable certain features in their builds. I think I'll open a separate issue about this.

Regards,
Anand

Hi Anand,

Showing this with "nsd -v" is not implemented right now because the
selection is meant to be transparent to users of the library, but I
think you're right and it'd be good to expose that information.

The kernel is selected when the parse function is called. For NSD it
makes sense to select one on startup and always use that one. There's a
GH issue for that: https://github.com/NLnetLabs/simdzone/issues/116. I
didn't consider printing it before. I have created a GH issue for it:
https://github.com/NLnetLabs/nsd/issues/354.

To select a kernel on startup, you can set the ZONE_KERNEL environment
variable. Possible values: fallback, westmere, haswell. We might add a
command-line argument or configuration option for it.

As always, thanks for your suggestions.

- Jeroen

Thanks for the update Jeroen, and for creating the issues to track the suggestions.

You didn’t answer the parts of my email relating to what happens at compile time, and how the resulting NSD binary is affected depending on what architecture it’s compiled on. If you could expand on those, I’d appreciate it.

Regards,
Anand

Hi Anand,

The build host doesn't need to have the same features as the target.
However, the compiler running on the build host must know how to
generate code for the SSE4.2 and AVX2 instruction sets. For example,
you can compile for x86_64 on ARM (and vice versa). It works the same
way for instruction set extensions.

So, if you use an older compiler that does not know how to output code
for these SIMD instructions, you can't generate a binary that contains
them. It really comes down to the compiler, not the build host
architecture. At least, in theory.
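To illustrate the point (file names are made up; this is not NSD's actual build), per-file -m flags tell the compiler which instruction set to target, and the build host's own CPU never executes that code at compile time:

```shell
# Compile one translation unit three times, targeting different
# instruction sets. Only the compiler's capabilities matter here,
# not whether the build host CPU supports SSE4.2 or AVX2.
printf 'int parse_stub(void) { return 0; }\n' > simd_demo.c
cc -c -O2           simd_demo.c -o simd_demo_fallback.o
cc -c -O2 -msse4.2  simd_demo.c -o simd_demo_westmere.o
cc -c -O2 -mavx2    simd_demo.c -o simd_demo_haswell.o
```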

I think that answers your question? Let me know if you want to discuss
it in more detail.

Best regards,
Jeroen