Supported models: ChatGLM3-6b, ChatGLM4-9b, GLMEdge-1.5b, GLMEdge-4b, GLM-4-0414, SmolLM, EXAONE-3.0-7.8B-Instruct, FalconMamba models, Jais, Bielik-11B-v2.3, RWKV-6, QRWKV-6, GigaChat-20B-A3B, Trillion-7B-preview, Ling models. Multimodal: LLaVA 1.5 models, LLaVA 1.6 models, BakLLaVA, Obsidian, ShareGPT4V, MobileVLM 1.7B/3B models, Yi-VL, Mini CPM, Moo...
First change into the OpenCL-Headers directory and pull the CL library from the device into it:
# In OpenCL-Headers
adb pull /system/vendor/lib...
LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.
A pure-C++ LLM acceleration library for all platforms, callable from Python; ChatGLM-6B-class models reach 10000+ tokens/s on a single GPU; supports GLM, LLaMA, ...
python convert.py models/Llama-2-7b-chat/
# resulting GGUF file
ls -al models/Llama-2-7b-chat...
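The convert step above emits a file in the GGUF format. As a rough illustration of what the converter writes, the sketch below builds and parses a minimal GGUF header (magic `GGUF`, then a little-endian uint32 version, uint64 tensor count, and uint64 metadata-KV count); the field values are made up for the example, and this covers only the fixed header, not the full spec.

```python
import struct

def write_gguf_header(version: int, n_tensors: int, n_kv: int) -> bytes:
    # GGUF files begin with the 4-byte magic "GGUF", followed by a
    # little-endian uint32 version, uint64 tensor count, and
    # uint64 metadata key-value count.
    return b"GGUF" + struct.pack("<I", version) + struct.pack("<QQ", n_tensors, n_kv)

def read_gguf_header(blob: bytes):
    # Validate the magic, then unpack the three fixed header fields.
    assert blob[:4] == b"GGUF", "not a GGUF file"
    version, = struct.unpack_from("<I", blob, 4)
    n_tensors, n_kv = struct.unpack_from("<QQ", blob, 8)
    return version, n_tensors, n_kv

# Hypothetical counts, just to round-trip the header.
header = write_gguf_header(version=3, n_tensors=291, n_kv=24)
print(read_gguf_header(header))  # (3, 291, 24)
```

Running `read_gguf_header` on the first 24 bytes of a real converted file is a quick sanity check that the conversion produced a valid GGUF.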
llama-server -m model.gguf --port 8080
# Basic web UI can be accessed via browser: http://localhost:8080
# Chat completion endpoint: http://localhost:8080/v1/chat/completions

Supports multiple users and parallel decoding:
# up to 4 concurrent requests, each with 4096 max context
llama-ser...
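The /v1/chat/completions endpoint is OpenAI-compatible, so any plain HTTP client can drive it. A minimal sketch using only the Python standard library; the port matches the command above, while the `"model"` label and `max_tokens` value are arbitrary assumptions (a single-model server answers regardless of the model name):

```python
import json
import urllib.request

def build_chat_request(messages, max_tokens=128):
    # OpenAI-compatible chat-completions payload.
    return {"model": "model.gguf", "messages": messages, "max_tokens": max_tokens}

def chat(prompt: str, url: str = "http://localhost:8080/v1/chat/completions") -> str:
    payload = build_chat_request([{"role": "user", "content": prompt}])
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The reply text is in the first choice's message content.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires a running llama-server on localhost:8080.
    print(chat("Say hello in one sentence."))
```

Because the server speaks the same wire format as the OpenAI API, existing OpenAI client libraries can also be pointed at it by overriding the base URL.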