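fastmemcpy (SSE): Generally speaking, there are three levels of cache between the CPU and memory: L1, L2 and L3 (there are also several kinds of TLB cache, which are not covered here). Each level differs from the next by roughly an order of magnitude in speed, and the capacities differ considerably as well (in practice it depends on ... heh). The L1 cache is further split into an instruction cache and a data cache, which serve different purposes. Let's go back and look at the cache structure of the P4 architecture. The IA-32 Intel Architecture Software Developer's ...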
So, from what I saw, using the SSE mnemonics the way I did has to go together with a strategic (code) optimization. Analysing memcpy from msvcrt, I saw that it checks the alignment of the addresses and jumps here and there, looking for the best-aligned addresses that can be copied....
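To make the alignment idea concrete, here is a minimal sketch in C of that kind of strategy. It is not the msvcrt code and not the routine from the post: the name `sse_copy_sketch` is hypothetical, and it assumes SSE2 intrinsics from `<emmintrin.h>` (baseline on x86-64; 32-bit builds may need a flag such as `-msse2`). It copies single bytes until the destination is 16-byte aligned, then moves 16 bytes per iteration with an unaligned load and an aligned store, and finishes the remainder with plain `memcpy`.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_store_si128 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; regions must not overlap. */
void *sse_copy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Head: copy byte by byte until the destination is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 15) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Either nothing is left, or the destination is now aligned. */
    assert(n == 0 || ((uintptr_t)d & 15) == 0);

    /* Body: 16 bytes per iteration. The source may still be misaligned,
       so the load is unaligned; the store can use the aligned form. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_store_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }

    /* Tail: fewer than 16 bytes remain. */
    if (n > 0) {
        memcpy(d, s, n);
    }
    return dst;
}
```

Aligning the destination rather than the source is just one possible choice. A tuned routine would likely also branch on the copy size and on whether source and destination share the same misalignment, which is presumably what all the jumping around in the library version is for.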