The llama.cpp web server is a lightweight, OpenAI-API-compatible HTTP server that can serve local models and connect them easily to existing clients (a client sketch follows the bindings list below).

Bindings:
- Python: abetlen/llama-cpp-python
- Go: go-skynet/go-llama.cpp
- Node.js: withcatai/node-llama-cpp
- JS/TS (llama.cpp server…)
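As a minimal sketch of what "OpenAI API compatible" means in practice, the following C program POSTs a chat completion request with libcurl. The port (8080) is the server's usual local default and the model name is a placeholder; adjust both for your setup.

```c
/* Minimal sketch: call the OpenAI-compatible chat endpoint with libcurl.
 * Assumes a llama.cpp server running locally on port 8080. Build with -lcurl. */
#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    const char *body =
        "{\"model\": \"local\","
        " \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}";

    struct curl_slist *headers = curl_slist_append(NULL, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

    /* The JSON chat completion response is written to stdout by default. */
    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return res == CURLE_OK ? 0 : 1;
}
```

Because the endpoint follows the OpenAI wire format, any existing OpenAI client (including the bindings listed above) can point at the same URL without code changes.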
API documentation is available at /api and at https://lite.koboldai.net/koboldcpp_api (a request sketch follows the list below).

Supported GGML models (includes backward compatibility for older versions/legacy GGML models, though some newer features might be unavailable for them):
- All up-to-date GGUF models are supported (Mistral/Mixtral/QWEN/Gemma …)
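For a feel of the API shape, here is a compact sketch of a generate request in the KoboldAI style. The endpoint path, port, and JSON fields are assumptions drawn from the API docs linked above; verify them against /api on your instance.

```c
/* Sketch: minimal request against koboldcpp's KoboldAI-style API.
 * Endpoint path, port (5001), and fields are assumptions based on the
 * API documentation referenced above. Build with -lcurl. */
#include <curl/curl.h>

int main(void) {
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    struct curl_slist *hdrs = curl_slist_append(NULL, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:5001/api/v1/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS,
        "{\"prompt\": \"Once upon a time\", \"max_length\": 80}");

    CURLcode res = curl_easy_perform(curl);  /* response JSON goes to stdout */

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return res == CURLE_OK ? 0 : 1;
}
```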
```diff
 (void * buffer);
+GGML_API void ggml_backend_cuda_log_set_callback(ggml_log_callback log_callback, void * user_data);

 #ifdef __cplusplus
 }
 #endif
diff --git a/ggml-cuda/common.cuh b/ggml-cuda/common.cuh
index 5957a61b11606..e2c3110b0123c 100644
--- a/ggml-cuda/common…
```
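The setter added above lets callers redirect CUDA backend log output. A minimal sketch of wiring it up follows; the header name and the (level, text, user_data) callback shape follow ggml's logging convention, but treat both as assumptions against your checkout.

```c
/* Sketch: forward CUDA backend log messages to stderr using the setter
 * declared in the diff above. Header name is an assumption. */
#include <stdio.h>
#include "ggml-cuda.h"

static void cuda_logger(enum ggml_log_level level, const char * text, void * user_data) {
    (void) level;
    (void) user_data;
    fputs(text, stderr);  /* forward every message unchanged */
}

int main(void) {
    /* Install the callback before any CUDA backend is initialized. */
    ggml_backend_cuda_log_set_callback(cuda_logger, /*user_data=*/NULL);
    /* ... create the backend and run as usual ... */
    return 0;
}
```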
printf(" --interactive-specials allow special tokens in user text, in interactive mode\n"); printf(" --interactive-first run in interactive mode and wait for input right away\n"); printf(" -cnv, --conversation run in conversation mode (does not print special tokens and suffix/prefix)\n...