(NB: I am using the "asm goto" feature in the code above. It sometimes interacts poorly with optimization, because gcc does not know that the asm depends on the state of the flags.)

Jeroen van Bemmel, 13 years ago

Comment on the SSE4.2 strlen() implementation: it actually performs much worse than the following SSE2...
FWIW, here is an implementation that uses SSE instructions without reading across MMU page boundaries. String addresses are passed in RSI and RDI.

    ; strcmp2-
    ;
    ; String comparison using pcmpistri instruction
    ; and computing string lengths ahead of time.

    strcmp2 proc
    xmm0Save textequ <[rsp]>
    xmm1Save tex...
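
For readers who want the page-boundary idea in isolation: a 16-byte SSE load is only a problem when it starts so close to the end of a page that it would spill into the next one, which may be unmapped. The following is a minimal C sketch of that check, not the routine above. The helper names `load_crosses_page` and `bytes_to_page_end` are hypothetical, and the 4 KiB page size is an assumption that holds for typical x86-64 setups but is not guaranteed everywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (typical on x86-64, not guaranteed). */
#define PAGE_SIZE_ASSUMED 4096u

/* Hypothetical helper: returns 1 if reading `len` bytes starting at `p`
   would touch the page after the one containing `p`, else 0. */
static int load_crosses_page(const void *p, size_t len)
{
    uintptr_t offset = (uintptr_t)p & (PAGE_SIZE_ASSUMED - 1);
    return offset + len > PAGE_SIZE_ASSUMED;
}

/* Hypothetical helper: number of bytes readable from `p` up to the
   end of its page (between 1 and PAGE_SIZE_ASSUMED). */
static size_t bytes_to_page_end(const void *p)
{
    return PAGE_SIZE_ASSUMED - ((uintptr_t)p & (PAGE_SIZE_ASSUMED - 1));
}
```

A string routine could call `load_crosses_page(s, 16)` before each unaligned 16-byte load and fall back to a byte-by-byte loop for the last `bytes_to_page_end(s)` bytes of the page. Aligning the pointer to 16 bytes first avoids the check altogether, because an aligned 16-byte load never straddles a page.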