Maximum line length in Python files was set to a generous 125 chars to minimize the number of changes needed in scripts and general annoyance. The "txt" prompts directory is excluded from the checks, as it may contain oddly formatted files and strings for a good reason. Signed-off-by...
import multiprocessing
import time

# Example worker: this is where you would put your own processing and report back.
def worker_func(input_queue, output_queue):
    while True:
        task = input_queue.get()
        if task["command"] == "exit":
            break
        elif task["command"] == "input":
            result = task["data"]  #whatever...
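The queue-driven worker loop above is cut off before it reports a result, so here is a minimal, self-contained sketch of the same pattern. The names `echo_worker` and `run_demo`, the `"status"` field, and the upper-casing step are assumptions made for illustration and are not part of the original snippet. Only the `"command"`/`"data"` task layout and the `"exit"` sentinel come from it.

```python
import multiprocessing


def echo_worker(input_queue, output_queue):
    """Hypothetical worker: handle tasks until an "exit" command arrives."""
    while True:
        task = input_queue.get()  # blocks until a task is available
        if task["command"] == "exit":
            break
        elif task["command"] == "input":
            # Placeholder processing (an assumption): upper-case the payload.
            output_queue.put({"status": "done", "data": task["data"].upper()})


def run_demo(items):
    """Start one worker, send it each item, and collect the results in order."""
    input_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(
        target=echo_worker, args=(input_queue, output_queue)
    )
    proc.start()
    results = []
    for item in items:
        input_queue.put({"command": "input", "data": item})
        # Wait for the matching reply; the timeout avoids hanging forever
        # if the worker has died.
        results.append(output_queue.get(timeout=10)["data"])
    input_queue.put({"command": "exit"})  # ask the worker to shut down
    proc.join(timeout=10)
    return results


if __name__ == "__main__":
    print(run_demo(["hello", "world"]))
```

The `if __name__ == "__main__":` guard matters on platforms where child processes are started by re-importing the main module. Sending one task and waiting for its reply keeps the sketch simple; a real worker pool would usually queue many tasks before collecting results.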
| Option | Legal values | Default | Description |
|--------|--------------|---------|-------------|
| LLAMA_CUDA_KQUANTS_ITER | 1 or 2 | 2 | Number of values processed per iteration and per CUDA thread for Q2_K and Q6_K quantization formats. Setting this value to 1 can improve performance for slow GPUs. |
| LLAMA_CUDA_PEER_MAX_BATCH_SIZE | Positive integer | 128 | Maximum batch size for which to enable... |
`main` is now `llama-cli`, `server` is `llama-server`, etc. (https://github.com/ggerganov/llama.cpp/pull/7809)

### Recent API changes

- [2024 Jun 26] The source code and CMake build scripts have been restructured https://github.com/ggerganov/llama.cpp/pull/8006
- [2024...