K-quantization (aka K-Quants) splits the model's weights into "super-blocks", which are further subdivided into smaller sub-blocks. Each sub-block has its own scale and minimum, and these scales and minimums are themselves quantized to a limited number of bits, typically 8, 6, or 4 depending on the specific quantization method, e.g. Q2_K, Q4_K, or Q5_K. These scales and minimums help the model stay accurate even at reduced precision.
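As a rough illustration of this two-level scheme, here is a minimal C sketch of a Q2_K-style super-block of 256 weights split into 16 sub-blocks of 16 weights each. The field names loosely follow ggml's `block_q2_K`, but the exact bit packing and the use of `float` (rather than fp16) for the super-block constants are simplifying assumptions here, not the authoritative llama.cpp layout.

```c
#include <stdint.h>

/* Simplified Q2_K-style super-block: 256 weights, 16 sub-blocks of 16.
 * Each sub-block stores a 4-bit scale (low nibble) and a 4-bit minimum
 * (high nibble); both are rescaled by the super-block constants d and dmin.
 * Illustrative layout only; real ggml packs the 2-bit quants differently. */
#define QK_K 256

typedef struct {
    uint8_t scales[QK_K / 16]; /* packed 4-bit sub-block scales and minimums */
    uint8_t qs[QK_K / 4];      /* 2-bit quantized weights, four per byte     */
    float   d;                 /* super-block scale applied to sub-block scales   */
    float   dmin;              /* super-block scale applied to sub-block minimums */
} block_q2K_sketch;

/* Reconstruct the 256 weights of one super-block: w = d*scale*q - dmin*min. */
static void dequantize_q2K_sketch(const block_q2K_sketch *b, float out[QK_K]) {
    for (int sub = 0; sub < QK_K / 16; ++sub) {
        const float sc = b->d    * (float)(b->scales[sub] & 0x0F);
        const float mn = b->dmin * (float)(b->scales[sub] >> 4);
        for (int i = 0; i < 16; ++i) {
            const int idx = sub * 16 + i;
            const int q   = (b->qs[idx / 4] >> (2 * (idx % 4))) & 0x03;
            out[idx] = sc * (float)q - mn;
        }
    }
}
```

Storing only a 2-bit code per weight plus a few bits of amortized scale/minimum overhead per sub-block is what keeps the Q2_K files in the tables below noticeably smaller than the higher-bit variants.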
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| Qwen2.5.1-Coder-7B-Instruct-Q2_K_L.gguf | Q2_K_L | 3.55GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| Qwen2.5.1-Coder-7B-Instruct-Q3_K_S.gguf | Q3_K_S | 3.49GB | false | Low quality, not recommended. |
| Qwen2.5.1-Coder-7B-Instruct-IQ3_XS.gguf | IQ3_XS | 3.35GB | false | Lower quality, new method with decent performance, slightly... |
```c
GGML_FTYPE_MOSTLY_Q2_K = 10, // except 1d tensors
GGML_FTYPE_MOSTLY_Q3_K = 11, // except 1d tensors
GGML_FTYPE_MOSTLY_Q4_K = 12, // except 1d tensors
GGML_FTYPE_MOSTLY_Q5_K = 13, // except 1d tensors
GGML_FTYPE_MOSTLY_Q6_K = 14, // except 1d tensors
GGML_F...
```
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| QwQ-32B-Preview-Q2_K_L.gguf | Q2_K_L | 13.07GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| QwQ-32B-Preview-Q2_K.gguf | Q2_K | 12.31GB | false | Very low quality but surprisingly usable. |
| QwQ-32B-Preview-IQ2_M.gguf | IQ2_M | 11.26GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
| QwQ-32B... | | | | |
```c
case GGML_TYPE_Q2_K: result = quantize_q2_K(src + start, (char *) dst + start_row * row_size, nrows, n_per_row, imatrix); break;
case GGML_TYPE_Q3_K: result = quantize_q3_K(src + start, (char *) dst + start_row * row_size, nrows, n_per_row, imatrix); break;
...
```
$-\frac{Q}{2} < e < \frac{Q}{2}$, whereas for truncation it is $0 \le e < Q$. It is obvious that rounding produces a less biased representation of the analog values. The average error is given by

$$\bar{e} = \frac{1}{Q}\int_{-Q/2}^{Q/2} e \, de = 0,$$

which means that on average half the values are rounded up and half rounded down. The...
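For contrast, the same calculation for truncation, assuming as above that the error is uniformly distributed over $0 \le e < Q$, gives a nonzero mean, which is the bias being referred to:

$$\bar{e}_{\text{trunc}} = \frac{1}{Q}\int_{0}^{Q} e \, de = \frac{Q}{2}$$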
| **Qwen2.5-Coder-32B-Instruct-GGUF-Q2_K** | 92.7 | 84.8 | 87.3 | 74.3 | 47.6 | 23.0 | 28.5 |

### Multiple Programming Languages

| | Python | Java | C++ | C# | TS | JS | PHP | Bash | Avg. |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---...
Recall that the average noise power for uniform quantization is $Q^2/12$. The addition of rectangular dither will double this average noise power and the addition of triangular dither will triple it. However, if we look at the frequency spectrum of the dithered and quantized signal of the example...
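As a quick check of these factors, here is a short derivation assuming the quantization error $e$ is uniform on $(-Q/2, Q/2)$ and statistically independent of the added dither:

$$P_q = \frac{1}{Q}\int_{-Q/2}^{Q/2} e^2 \, de = \frac{Q^2}{12}$$

Rectangular dither spanning one quantization step is itself uniform with variance $Q^2/12$, so the independent contributions add to $Q^2/12 + Q^2/12 = Q^2/6$ (double the noise power). Triangular dither is the sum of two independent rectangular components, contributing $2 \cdot Q^2/12$ and giving a total of $Q^2/12 + Q^2/6 = Q^2/4$ (triple).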