
The implementation of an extension can vary quite a bit between CPU microarchitectures, even from the same vendor. And the performance of those extensions can vary by an order of magnitude between CPU vendors, sometimes because a vendor will initially implement an extension in microcode if they did not design it themselves. AMD has microcoded many Intel extensions in some of their architectures for binary compatibility but without performance benefits, and many of their native implementations behave differently in ways that matter for optimization. The existence of a feature does not imply you should use it in highly optimized code, since it may be slower than a software version of the same thing. Nor does it imply you should use the feature in the same way on every CPU.

It is quite complicated. The AMD and Intel microarchitectures are different enough in design that some low-level optimizations around many extensions really don't translate well between them. The differences can be big enough that you take them into account at the C++ level too, writing target-specific code.
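To make "target-specific code" concrete, here is a minimal sketch in C of choosing a routine from both the feature bit and the vendor instead of the feature bit alone. It assumes GCC or Clang on x86, where `__builtin_cpu_init`, `__builtin_cpu_supports` and `__builtin_cpu_is` are available. The names `sum_generic`, `sum_unrolled` and `pick_sum` are made up for illustration, and the rule "Intel with AVX2 gets the unrolled variant" is a placeholder, not a measured result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn)(const uint32_t *v, size_t n);

/* Baseline variant: one accumulator, simple loop. */
uint64_t sum_generic(const uint32_t *v, size_t n) {
    assert(v != NULL || n == 0);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Alternative variant: four independent accumulators.
 * Whether this wins depends on the microarchitecture. */
uint64_t sum_unrolled(const uint32_t *v, size_t n) {
    assert(v != NULL || n == 0);
    uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        a0 += v[i];
    return a0 + a1 + a2 + a3;
}

/* Hypothetical selection policy: look at vendor AND feature.
 * The policy itself is an assumption; real code should be driven
 * by benchmarks on each target. */
sum_fn pick_sum(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_is("intel") && __builtin_cpu_supports("avx2"))
        return sum_unrolled;
#endif
    return sum_generic;   /* safe default everywhere else */
}
```

Both variants return the same result, so the selection only affects speed. The useful part is that the decision lives in one place and can be changed per vendor or per microarchitecture once you have numbers.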

Individual CPU vendors are in the best position to provide software optimization for specific implementations of their microarchitectures. Unfortunately, AMD invests relatively little in this and relies on the open source community to fill in these gaps whereas Intel is excellent at providing first-party software optimization support for their CPUs.



You probably overestimate the impact of the uarch differences between Haswell and Zen a bit, although what you said is true in theory. In practice, Zen is quite close to Skylake in terms of general uarch principles and main figures, and Zen 2 even more so (or even better). IIRC, on Zen 1 AVX2 should yield no gain compared to AVX, though.

glibc is not Intel's project and is expected to have a minimum amount of neutrality in how it is maintained. That also includes how patches are accepted, or what modifications are requested before they are integrated.

The dispatching is probably there to implement things like memset and memcpy, which are easy to benchmark, and it is probable that the Haswell version will at least be better than whatever is used right now on an AMD Zen (and even more probable for Zen 2). Optimizing further can come later, if anybody wants to do it.
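As a rough illustration of why these routines are easy to benchmark, here is a small C sketch that times two copy routines on the same buffer and keeps the faster one. Everything in it is hypothetical: `copy_bytes`, `copy_words`, `time_copy` and `pick_copy` are toy names, not glibc code, and `clock()` is a coarse timer, so treat the numbers as indicative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Candidate 1: copy one byte at a time. */
void *copy_bytes(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Candidate 2: copy 8 bytes at a time, then the tail.
 * The small fixed-size memcpy calls avoid alignment and aliasing issues. */
void *copy_words(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);
        memcpy(d + i, &w, sizeof w);
    }
    for (; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* CPU time in seconds for `reps` copies of `size` bytes, or -1.0 on
 * allocation failure. */
double time_copy(copy_fn fn, size_t size, int reps) {
    assert(size > 0 && reps > 0);
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) {
        fn(dst, src, size);
        /* Read the result so the compiler cannot drop the copy. */
        volatile unsigned char sink = dst[size - 1];
        (void)sink;
    }
    clock_t t1 = clock();
    free(src);
    free(dst);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Keep whichever candidate measured faster on this machine. */
copy_fn pick_copy(size_t size, int reps) {
    double tb = time_copy(copy_bytes, size, reps);
    double tw = time_copy(copy_words, size, reps);
    if (tb < 0.0 || tw < 0.0)
        return copy_bytes;   /* could not measure: fall back */
    return (tw < tb) ? copy_words : copy_bytes;
}
```

Something like `pick_copy(1 << 20, 200)` would compare the two on 1 MiB buffers. A real comparison would also sweep sizes and alignments, since the winner can change between small and large copies.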

Therefore, I think this ticket is justified (but I also think it is not a drama that this was not taken care of sooner).


Thanks for the insight, last time I remember writing assembly for speed was probably the early sse days.

Sounds like a great benchmarking rabbit hole to go down though. :-)



