596 cycles for Reverse Array with PSHUFB using 4 xmm

565 CPU cycles seems to be the lowest value reachable on my system. While the other routines show a different result on every run, the version using 4 xmm registers unrolled 4 times tends to give the same value every time.

There are only two days a year when you ...
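For readers who have not seen the technique, here is a minimal sketch in C intrinsics of reversing an array with PSHUFB and 4 xmm registers per loop pass. It is not the routine that was timed above. It assumes byte-sized elements, separate (non-overlapping) source and destination buffers, and a GCC or Clang compiler. The names `reverse_bytes` and `HAVE_PSHUFB` are made up for illustration, and the loop is not unrolled any further.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_PSHUFB 1   /* hypothetical macro: x86 build, SSSE3 assumed present at run time */
#endif

/* Writes src[n-1], src[n-2], ..., src[0] into dst[0..n-1].
   Assumption: dst and src do not overlap. */
#ifdef HAVE_PSHUFB
__attribute__((target("ssse3")))
#endif
static void reverse_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;

#ifdef HAVE_PSHUFB
    /* Shuffle mask: result byte k takes source byte 15 - k. */
    const __m128i rev = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15);

    /* 64 bytes per pass, held in 4 xmm registers. */
    while (n - i >= 64) {
        const uint8_t *s = src + n - i - 64;   /* last 64 unread source bytes */
        __m128i a = _mm_loadu_si128((const __m128i *)(s + 48));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + 32));
        __m128i c = _mm_loadu_si128((const __m128i *)(s + 16));
        __m128i d = _mm_loadu_si128((const __m128i *)(s));

        /* Reverse the bytes inside each register, then store the
           registers in the opposite order from how they were loaded. */
        _mm_storeu_si128((__m128i *)(dst + i),      _mm_shuffle_epi8(a, rev));
        _mm_storeu_si128((__m128i *)(dst + i + 16), _mm_shuffle_epi8(b, rev));
        _mm_storeu_si128((__m128i *)(dst + i + 32), _mm_shuffle_epi8(c, rev));
        _mm_storeu_si128((__m128i *)(dst + i + 48), _mm_shuffle_epi8(d, rev));
        i += 64;
    }
#endif

    /* Scalar loop for the remaining bytes (or for everything on non-x86). */
    for (; i < n; i++)
        dst[i] = src[n - 1 - i];
}
```

For dword elements the same structure should work with a different shuffle mask that keeps the 4 bytes of each element in order. Unaligned loads and stores are used here to keep the sketch simple; an aligned variant may time differently.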