In CUDA: use MMQ instead of cuBLAS by default #8075, MMQ was enabled by default on GPUs with int8 tensor core support, and a short description of the LLAMA_CUDA_FORCE_MMQ option was added to the README. As it currently stands though, the message makes it seem like MMQ will not be used unless...
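As a quick way to see which side of that default a given card falls on, here is a standalone sketch (not llama.cpp code; the CC 7.5 threshold for int8 tensor cores and the Volta-only cuBLAS fallback are assumptions based on the description above) that queries the device's compute capability with the CUDA runtime API:

```cpp
// Minimal sketch, not part of llama.cpp: report which matmul path the post-#8075
// default would plausibly pick for device 0, based on compute capability alone.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    const int cc = prop.major * 10 + prop.minor;  // e.g. V100 -> 70, Turing -> 75

    const char * path;
    if (cc >= 75) {
        path = "MMQ (int8 tensor core kernels)";                       // assumption: Turing and newer
    } else if (cc >= 70) {
        path = "FP16 cuBLAS unless built with LLAMA_CUDA_FORCE_MMQ";   // Volta/V100: FP16 tensor cores only
    } else {
        path = "MMQ (non-tensor-core kernels)";                        // assumption: pre-Volta GPUs
    }
    printf("compute capability %d.%d -> default: %s\n", prop.major, prop.minor, path);
    return 0;
}
```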
opened #12098 JohannesGaessler:cuda-fix-v100-force-mmq
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no. I tried to force this via environment variables, but it did not help. Is there a way to configure this via Ollama? OS: Linux. GPU: Nvidia. CPU: Intel. Ollama version: 0.1.31 -> 0.3.10
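The environment variable has no effect because GGML_CUDA_FORCE_MMQ is a compile-time define rather than a runtime setting; the ggml_cuda_init line only reports what the binary was built with. A simplified sketch of that kind of initialization logging (the exact llama.cpp source may differ):

```cpp
// Sketch of compile-time reporting: whether MMQ is forced is decided when ggml is built,
// so exporting GGML_CUDA_FORCE_MMQ=1 at runtime cannot change what this prints.
#include <cstdio>

static void report_force_mmq(void) {
#ifdef GGML_CUDA_FORCE_MMQ
    printf("ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes\n");  // built with -DGGML_CUDA_FORCE_MMQ
#else
    printf("ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no\n");   // default build
#endif
}

int main() {
    report_force_mmq();
    return 0;
}
```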
Fixes LostRuins#1390. The logic for the combination of V100s and GGML_CUDA_FORCE_MMQ seems to be wrong on master. By default, when compiling without GGML_CUDA_FORCE_MMQ, the MMQ kernels should only...
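The intent of the default, as described above, can be summarized by a small predicate. This is only an illustrative sketch under the assumptions that int8 tensor core MMQ kernels exist for CC 7.5 and newer and that V100 (CC 7.0) is the FP16-tensor-core-only case; it is not the actual llama.cpp code touched by the PR:

```cpp
// Illustrative predicate (assumed thresholds, not the real llama.cpp implementation).
static bool prefer_mmq(int compute_capability) {
#ifdef GGML_CUDA_FORCE_MMQ
    (void) compute_capability;
    return true;                                     // forced: use MMQ even on V100
#else
    const bool int8_mma = compute_capability >= 75;  // assumption: Turing and newer
    const bool fp16_mma = compute_capability >= 70;  // Volta (V100) and newer
    // Default: MMQ wherever an int8 tensor core or pre-tensor-core path exists;
    // FP16 cuBLAS only on FP16-tensor-core-only GPUs such as V100.
    return int8_mma || !fp16_mma;
#endif
}
```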