Also note that llama.cpp's AVX512 path is only used for the floating-point compute; for quantized models it detects the avx512vnni flag, but there is no corresponding implementation at all, ...
The PR contains fixes to build llama.cpp with enhanced AVX512 flags (AVX512_BF16, AVX512_VNNI, AVX512_VBMI) using clang-cl and the Visual Studio generator. Issue: while trying to build with clang-cl and the Visual Studio generator with AVX512_VBMI, AVX512_VNNI and AVX512...
GCC Version = 12.3. The models were quantized and tested from the meta-llama Llama-2-7B model: https://huggingface.co/meta-llama/Llama-2-7b. The PR was tested on an AMD Granite Ridge 9600X, which supports the following flags by default: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX5...
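For reference, a configure invocation exercising this combination might look like the following. This is a sketch under assumptions: the `-T ClangCL` toolset selection and the generator name are standard CMake/Visual Studio usage, but the exact `LLAMA_AVX512*` option names depend on the llama.cpp revision being built, so check the CMakeLists of your checkout.

```shell
# Configure llama.cpp with clang-cl via the Visual Studio generator,
# enabling the extended AVX512 feature flags discussed in this PR.
# Option names are assumed from llama.cpp's CMake options; verify locally.
cmake -B build -G "Visual Studio 17 2022" -T ClangCL \
      -DLLAMA_AVX512=ON \
      -DLLAMA_AVX512_VBMI=ON \
      -DLLAMA_AVX512_VNNI=ON

cmake --build build --config Release
```

On hardware like the 9600X above, all three sub-features are reported by CPUID, so a build configured this way can be validated end-to-end on that machine.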