glibc has a similar AArch64 version that uses AdvSIMD, and also a version for AArch64 CPUs where vector->GP register...
Well, the only thing that is "broken" is that I had to convert it so that the Intel C++ compiler would accept it (the GetRDTSC() function was originally __fastcall, which is why there is one mistake in the strlen_Traditional() function), so the final version should have one instruction fewer: __...
*/
  uint64_t v = *p | MASK (s_int);

  uint64_t bits;
  while ((bits = __insn_v1cmpeqi (v, 0)) == 0)
    v = *++p;

  return ((const char *) p) + (CFZ (bits) >> 3) - s;
}
libc_hidden_builtin_def (strlen)...
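The loop above depends on target-specific pieces (`__insn_v1cmpeqi`, `MASK`, `CFZ`) whose definitions are not part of the excerpt. As a rough portable sketch of the same shape (read the aligned word containing the start, neutralise the bytes before the string, scan word by word, turn a bit position into a byte index), the version below swaps the byte-compare instruction for the classic subtract-and-mask zero-byte test. The name `strlen_word_sketch` and both macros are hypothetical, not glibc's. It assumes a little-endian target, GCC or Clang for `__builtin_ctzll`, and a platform where reading the rest of an aligned 8-byte word outside the string is harmless; ISO C does not guarantee that last point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper constants for this sketch (not from glibc). */
#define SKETCH_ONES  UINT64_C(0x0101010101010101)
#define SKETCH_HIGHS UINT64_C(0x8080808080808080)

/* Word-at-a-time strlen sketch. Assumes little-endian byte order. */
size_t strlen_word_sketch(const char *s)
{
    uintptr_t s_int = (uintptr_t)s;
    /* Start of the aligned 8-byte word that contains s. */
    const char *p = (const char *)(s_int & ~(uintptr_t)7);
    unsigned skip = (unsigned)(s_int & 7);   /* bytes in that word before s */

    uint64_t v;
    memcpy(&v, p, sizeof v);
    /* Force the bytes before s to 0xFF so they can never look like a
       terminator (they are the low-order bytes on little-endian). */
    if (skip != 0)
        v |= (UINT64_C(1) << (skip * 8)) - 1;

    /* The lowest set bit of this expression marks the first zero byte. */
    uint64_t bits;
    while ((bits = (v - SKETCH_ONES) & ~v & SKETCH_HIGHS) == 0) {
        p += 8;
        memcpy(&v, p, sizeof v);
    }

    /* Bit position divided by 8 is the byte index inside the word. */
    size_t index = (size_t)__builtin_ctzll(bits) >> 3;
    return (size_t)((p + index) - s);
}
```

Calling `strlen_word_sketch("abc")` should agree with `strlen("abc")`. Tools that check object bounds, such as AddressSanitizer, are likely to flag the deliberate over-read.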
But yes, any strcmp/strlen-style function that uses SSE/AVX/etc. without a buffer-size parameter must first read in smaller units until it reaches a 16- or 32-byte-aligned address. After that it can use SSE/AVX safely without having to worry about faulting, even if it reads past ...
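A minimal sketch of that two-phase pattern, assuming an x86 target with SSE2 and GCC or Clang (for `__builtin_ctz`). The function name `strlen_aligned_sketch` is made up for illustration. The aligned loads cannot fault because page sizes are multiples of 16, so a 16-byte-aligned load never straddles a page boundary: if the first byte of the block is readable, the whole block is. On targets without SSE2 the sketch falls back to a plain byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical name; illustrates the technique, not any library's code. */
size_t strlen_aligned_sketch(const char *s)
{
    const char *p = s;
#if defined(__SSE2__)
    /* Phase 1: read single bytes until p is 16-byte aligned. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    /* Phase 2: aligned 16-byte loads. Each load stays inside one page,
       so reading bytes past the terminator cannot fault. */
    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        __m128i chunk = _mm_load_si128((const __m128i *)p);
        /* One mask bit per byte; a set bit means that byte is zero. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz((unsigned)mask);
        p += 16;
    }
#else
    /* Fallback for targets without SSE2: ordinary byte loop. */
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
#endif
}
```

The over-read is safe at the hardware level only. It is still outside what ISO C defines, and bounds-checking tools will usually report it.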