Fixes LostRuins#1390. The logic for the combination of V100s and GGML_CUDA_FORCE_MMQ seems to be wrong on master. By default, when compiling without GGML_CUDA_FORCE_MMQ, the MMQ kernels should only …
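A minimal standalone sketch of the selection logic in question, assuming the intended default is to take the tensor-core/cuBLAS path on GPUs that have tensor cores (such as V100) unless the build forces MMQ. The helpers `gpu_has_tensor_cores()` and `use_mmq()` are hypothetical stand-ins that only illustrate the shape of the check, not the actual ggml-cuda code:

```cpp
// Sketch only: not the real ggml-cuda kernel selection, just the intended shape of it.
#include <cstdio>

// Assumption for illustration: compute capability 7.0+ (e.g. V100) has tensor cores.
static bool gpu_has_tensor_cores(int compute_capability_major) {
    return compute_capability_major >= 7;
}

// Intended default: use the quantized MMQ kernels only when the build explicitly
// forces them (GGML_CUDA_FORCE_MMQ defined), otherwise prefer the tensor-core/cuBLAS
// path on GPUs that have tensor cores.
static bool use_mmq(int compute_capability_major) {
#ifdef GGML_CUDA_FORCE_MMQ
    (void) compute_capability_major;
    return true;                        // compile-time override: always MMQ
#else
    return !gpu_has_tensor_cores(compute_capability_major);
#endif
}

int main() {
    const int cc_major = 7;             // e.g. a V100
    printf("use MMQ on cc %d.x: %s\n", cc_major, use_mmq(cc_major) ? "yes" : "no");
    return 0;
}
```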
From a related Ollama issue report (OS: Linux): older versions print `ggml_cuda_init: GGML_CUDA_FORCE_MMQ: YES`, while new versions (after 0.1.31) print `ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no`. "I tried to force this via environment variables, but it did not help. Is there a way to configure this via Ollama?"
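GGML_CUDA_FORCE_MMQ is a compile-time define rather than a runtime setting, which is presumably why environment variables have no effect here. A minimal standalone sketch (not the actual ggml source) of that distinction:

```cpp
// Sketch of why an environment variable cannot flip this setting: the YES/no line
// reported at init time depends only on how the binary was compiled.
#include <cstdio>
#include <cstdlib>

int main() {
    // Runtime environment variables are readable here...
    const char * env = std::getenv("GGML_CUDA_FORCE_MMQ");
    printf("env GGML_CUDA_FORCE_MMQ: %s\n", env ? env : "(unset)");

    // ...but the reported value is fixed at build time by the preprocessor define.
#ifdef GGML_CUDA_FORCE_MMQ
    printf("ggml_cuda_init: GGML_CUDA_FORCE_MMQ: YES\n");
#else
    printf("ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no\n");
#endif
    return 0;
}
```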
Commits that referenced this pull request:
- Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag (ggml-org#…) — 8591377
- arthw pushed a commit to arthw/llama.cpp that referenced this pull request (Jun 30, 2024): Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag (ggml-org#…) — b1776ff
- MagnusS0 pushed a commi...