| Parameter | Description | Value Type | Example Usage |
| --- | --- | --- | --- |
| num_ctx | ... (Default: 2048) | int | num_ctx 4096 |
| num_gqa | The number of GQA groups in the transformer layer. Required for some models, for example it is 8 for llama2:70b | int | num_gqa 8 |
| num_gpu | The number of GPUs to use. On macOS it defaults to 1 to enable metal... | | |
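Read together, these values are set in an Ollama Modelfile with `PARAMETER` lines, following the syntax in the Example Usage column. A minimal sketch (the base model tag and the `num_gpu` value are illustrative; `num_gqa 8` matches the llama2:70b example in the table):

```
FROM llama2:70b
# Context window of 4096 tokens (the default is 2048).
PARAMETER num_ctx 4096
# llama2:70b uses 8 GQA groups, per the table above.
PARAMETER num_gqa 8
# Illustrative: 1 enables Metal on macOS, per the table above.
PARAMETER num_gpu 1
```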
```
llm_load_print_meta: arch        = llama
llm_load_print_meta: vocab type  = BPE
llm_load_print_meta: n_vocab     = 128256
llm_load_print_meta: n_merges    = 280147
llm_load_print_meta: n_ctx_train = 1048576
llm_load_print_meta: n_embd      = 4096
llm_load_print_meta: n_head      = 32
...
```
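As a quick sanity check on these numbers, a small sketch deriving the per-head dimension and the GQA ratio from the metadata above (`n_head_kv` falls outside the visible part of the log, so the value 8 here is an assumption, not taken from the output):

```cpp
#include <cstdio>

int main() {
    const int n_embd    = 4096;  // from llm_load_print_meta above
    const int n_head    = 32;    // from llm_load_print_meta above
    const int n_head_kv = 8;     // assumed; truncated out of the log snippet

    // Per-head dimension: each attention head operates on n_embd / n_head values.
    std::printf("head dim  = %d\n", n_embd / n_head);     // 4096 / 32 = 128
    // GQA ratio: number of query heads sharing each key/value head.
    std::printf("gqa ratio = %d\n", n_head / n_head_kv);  // 32 / 8 = 4
    return 0;
}
```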
```
Device# show platform hardware qfp active feature firewall memory
==FW memory info==
Chunk-Pool    Allocated  Total_Free  Init-Num  Low_Wat
---
scb                   0       16384     16384     4096
hostdb                0        5120      5120     1024
ICMP Error            0         256       256      128
teardown              0         160       160       80
ha retry              0        2048      2048      512
dst pool              0        5120      5120 ...
```
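One practical wrinkle when scripting against this output: pool names can contain spaces ("ICMP Error", "ha retry", "dst pool"), so the four numeric columns are best peeled off from the right edge of each row. A hedged C++ sketch of that parsing approach (the struct and function names are my own, not any Cisco API):

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// One row of the firewall memory table.
struct PoolRow {
    std::string name;
    long allocated, total_free, init_num, low_wat;
};

// Split on whitespace, take the last four tokens as counters,
// and rejoin everything before them as the (possibly multi-word) pool name.
bool parse_pool_row(const std::string &line, PoolRow &out) {
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string t; in >> t; ) tok.push_back(t);
    if (tok.size() < 5) return false;  // need name + 4 counters
    const std::size_t n = tok.size();
    out.low_wat    = std::stol(tok[n - 1]);
    out.init_num   = std::stol(tok[n - 2]);
    out.total_free = std::stol(tok[n - 3]);
    out.allocated  = std::stol(tok[n - 4]);
    out.name.clear();
    for (std::size_t i = 0; i + 4 < n; ++i)
        out.name += (i ? " " : "") + tok[i];
    return true;
}

int main() {
    PoolRow r;
    if (parse_pool_row("ICMP Error 0 256 256 128", r))
        std::printf("%s: free %ld of %ld\n", r.name.c_str(), r.total_free, r.init_num);
    return 0;
}
```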
```cpp
                      promptCtx.repeat_penalty,
                      promptCtx.n_last_batch_tokens - 1);
}
```

gpt4all-backend/llmodel.h (1 change: 1 addition, 0 deletions):

```diff
@@ -66,6 +66,7 @@ class LLModel {
     int32_t n_predict = 200;
     int32_t top_k = 40;
     float ...
```
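For context, a repeat penalty like the `promptCtx.repeat_penalty` referenced in the fragment is conventionally applied as in this sketch: logits of tokens seen in the recent window are divided by the penalty when positive and multiplied by it when negative. The function name and signature below are illustrative, not gpt4all's actual API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Penalize tokens that appeared among the last n_last generated tokens,
// discouraging immediate repetition (the classic CTRL-style penalty).
void apply_repeat_penalty(std::vector<float> &logits,
                          const std::vector<int32_t> &last_tokens,
                          std::size_t n_last,
                          float penalty) {
    const std::size_t start =
        last_tokens.size() > n_last ? last_tokens.size() - n_last : 0;
    for (std::size_t i = start; i < last_tokens.size(); ++i) {
        const int32_t tok = last_tokens[i];
        if (tok < 0 || static_cast<std::size_t>(tok) >= logits.size())
            continue;  // skip ids outside the vocabulary
        if (logits[tok] > 0.0f)
            logits[tok] /= penalty;  // shrink positive logits
        else
            logits[tok] *= penalty;  // push negative logits further down
    }
}
```

With `penalty > 1.0`, recently emitted tokens become less likely on the next sampling step; a value of 1.0 leaves the logits unchanged.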