ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
build: 4794 (06c2b1...
In CUDA: use MMQ instead of cuBLAS by default (#8075), MMQ was enabled by default on GPUs with int8 tensor core support. A short description of LLAMA_CUDA_FORCE_MMQ was added to the README. As it currently stands, though, the message makes it seem like MMQ will not be used unless...
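Forcing the kernel choice is done at build time rather than at run time. A minimal sketch, assuming a recent llama.cpp checkout in which the option is spelled GGML_CUDA_FORCE_MMQ (earlier trees used the LLAMA_CUDA_FORCE_MMQ name mentioned in the README):

# Build with CUDA and force the MMQ kernels over cuBLAS
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON
cmake --build build --config Release -j

# The forced choice is then reported at startup:
#   ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes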
The following key startup log lines show that the model is executing on the GPU:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: Tesla V100S-PCIE-32GB, compute capability 7.0, VMM: yes
llm_load_tensors: ggml ctx size = 0.30 MiB
llm_load_t...
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | ...
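The table header above is llama-bench's standard markdown output; a run along these lines (the model path is illustrative) produces it:

# Benchmark with all layers offloaded to the GPU; results are printed as a markdown table
./llama-bench -m models/llama-2-7b-chat/llama-2-7B-chat-F32.gguf -ngl 99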
./llama-cli -m models/llama-2-7b-chat/llama-2-7B-chat-F32.gguf -p "I believe the meaning of life is" -n 512 --n-gpu-layers 100
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
  Device 0: Tesla ...
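With four devices detected, layer placement can be steered from the same command line. A minimal sketch, assuming a recent llama-cli build with the --split-mode and --main-gpu options:

# Default behavior: split layers across all visible GPUs
./llama-cli -m models/llama-2-7b-chat/llama-2-7B-chat-F32.gguf -p "I believe the meaning of life is" -n 512 --n-gpu-layers 100 --split-mode layer

# Pin the whole model to a single device instead
CUDA_VISIBLE_DEVICES=0 ./llama-cli -m models/llama-2-7b-chat/llama-2-7B-chat-F32.gguf -p "I believe the meaning of life is" -n 512 --n-gpu-layers 100 --split-mode none --main-gpu 0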
llm_load_print_meta: max token length = 256
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size = 0.14 MiB ...
[INFO] Socket address: 0.0.0.0:8080
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2060, compute capability 7.5, VMM: yes
[INFO] Wasi-nn-ggml plugin: b2636 (commit 5dc9dd...
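Once the socket address is reported, the server can be queried over HTTP. A sketch, assuming this is LlamaEdge's llama-api-server with its OpenAI-compatible chat endpoint (the endpoint path and payload shape are assumptions, not taken from the log above):

# Send a single chat request to the running server (endpoint path is an assumption)
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'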
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 6.1, VMM: yes
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llam...
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: Tesla V100-SXM2-32GB, compute capability 7.0, VMM: yes
load_backend: loaded CUDA backend from /home/aistudio/ollama/lib/ollama/cuda_v11/libggml-cuda...