q3_k_s: Uses Q3_K for all tensors
q4_0: Original quant method, 4-bit
q4_1: Higher accuracy than q4_0 but not as high as q5_0; however, it has quicker inference than q5 models
q4_k_m: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K (see the sketch after this list)
q4_k_s: ...
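These "mixed" k-quant file types are really a per-tensor mapping: one file-type label expands into a quant type chosen tensor by tensor. A minimal C sketch of that idea for q4_k_m follows; the enum, function name, and the "first half of the layers" rule are illustrative assumptions, not llama.cpp's actual heuristic (which also weighs model size and architecture).

    #include <string.h>

    /* Hypothetical sketch only: names and the layer-split rule are stand-ins
     * for llama.cpp's real per-layer heuristic. */
    enum quant_type { QUANT_Q4_K, QUANT_Q6_K };

    static enum quant_type pick_type_q4_k_m(const char *tensor_name,
                                            int layer_index, int n_layers) {
        int is_wv = strstr(tensor_name, "attention.wv") != NULL;
        int is_w2 = strstr(tensor_name, "feed_forward.w2") != NULL;

        /* q4_k_m: promote roughly half of the attention.wv and
         * feed_forward.w2 tensors to Q6_K; everything else stays Q4_K. */
        if ((is_wv || is_w2) && layer_index < n_layers / 2) {
            return QUANT_Q6_K;
        }
        return QUANT_Q4_K;
    }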
	filetypeQ6_K
	filetypeIQ2_XXS
	filetypeIQ2_XS
	filetypeQ2_K_S
	filetypeQ3_K_XS
	filetypeIQ3_XXS

	filetypeUnknown
)

func ParseFileType(s string) (filetype, error) {
	switch s {
	case "F32":...
llama 8B Q6_K            5.53 GiB    7.24 B   ROCm   99   0   pp 4096   2484.03 ± 1.21
llama 8B Q6_K            5.53 GiB    7.24 B   ROCm   99   0   tg 128      84.62 ± 0.02
llama ?B Q4_K - Small   17.59 GiB   33.34 B   ROCm   99   1   pp 4096    374.89 ± 0.36
llama ?B Q4_K - Small   17.59 GiB   33.34 B   ROCm   99   1   tg 128      26.58 ...
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 293
llm_load_vocab: token to piece cache size = 0.9338 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2moe
...
LLaMA v2 7B mostly Q6_K            5.15 GiB   6.74 B   CUDA   999   1   tg 128    92.56 ± 0.03
LLaMA v2 7B mostly Q5_K - Medium   4.45 GiB   6.74 B   CUDA   999   1   tg 128   102.45 ± 0.01
LLaMA v2 7B mostly Q5_K - Small    4.33 GiB   6.74 B   CUDA   999   1   tg 128   104.18 ± 0.01
LLaMA v2 7B mostly Q4_K - Medi...
        const uint8_t * restrict q6 = x[i].ql;
        const uint8_t * restrict qh = x[i].qh;
@@ -8704,7 +8704,7 @@ void ggml_vec_dot_q6_K_q8_K(int n, float * restrict s, size_t bs, const void * r
    for (int i = 0; i < nb; ++i) {
        const float d_all = (float)x[i]...
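For orientation, the fields the kernel indexes above (x[i].ql, x[i].qh, x[i].d) come from the Q6_K super-block layout. The sketch below mirrors ggml's block_q6_K assuming the usual QK_K = 256 super-block size; the struct name and the fp16-as-uint16_t field are stand-ins, not the library's exact declaration.

    #include <stdint.h>

    #define QK_K 256  /* elements per super-block (assumed default) */

    typedef struct {
        uint8_t  ql[QK_K / 2];       /* lower 4 bits of each 6-bit quant    */
        uint8_t  qh[QK_K / 4];       /* upper 2 bits of each 6-bit quant    */
        int8_t   scales[QK_K / 16];  /* 8-bit scale per 16-element group    */
        uint16_t d;                  /* super-block scale, fp16 bit pattern */
    } block_q6_K_sketch;

    /* Conceptually, each element is reconstructed as:
     *   q = ((ql nibble) | (matching 2 bits of qh << 4)) - 32;  // signed 6-bit value
     *   y = fp16_to_fp32(d) * scales[group] * q;
     */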
    WeightType.Q2_K,
    WeightType.Q3_K,
    WeightType.Q4_K,
    WeightType.Q5_K,
    WeightType.Q6_K,
}

IMATRIX_QUANT_TYPES = {
    WeightType.IQ1_M,
    WeightType.IQ1_S,
    WeightType.IQ2_XXS,
    WeightType.IQ2_XS,
    WeightType.IQ2_S,
    WeightType.IQ3_XXS,
    ...