```cmake
  set(GGML_CUDA_GRAPHS_DEFAULT ON)
endif()
```

Package.swift (1 addition & 1 deletion):

```diff
@@ -88,5 +88,5 @@ let package = Package(
             linkerSettings: linkerSettings
         )
     ],
-    cxxLanguageStandard: .cxx11
+    cxxLanguageStandard: .cxx17
```
This PR adds CUDA FlashAttention kernels that do not use tensor cores and are optimized for large batch sizes. On my P40, enabling FlashAttention is now consistently faster:

model | backend | ngl | n_b...
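A rough way to reproduce this kind of comparison is to toggle the `-fa` switch in `llama-bench` (a sketch; the model path, layer count, and batch size are placeholders, not the settings behind the numbers above):

```sh
# Benchmark decode speed with FlashAttention disabled, then enabled.
./llama-bench -m models/7b-q4_0.gguf -ngl 99 -b 2048 -fa 0
./llama-bench -m models/7b-q4_0.gguf -ngl 99 -b 2048 -fa 1
```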
```sh
cmake -B build \
    -DGGML_CUDA=ON \
    -DGGML_VULKAN=1 \
    -DCMAKE_INSTALL_PREFIX='/usr/local' \
    -DGGML_ALL_WARNINGS=OFF \
    -DGGML_ALL_WARNINGS_3RD_PARTY=OFF \
    -DBUILD_SHARED_LIBS=ON \
    -DGGML_STATIC=OFF \
    -DGGML_LTO=ON \
    -DGGML_RPC=ON \
    -DLLAMA_CURL=ON \
    -DGGML_CUDA=ON ...
```
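After a configure step like this, the build itself is the standard CMake invocation (not specific to this thread):

```sh
# Compile everything that was just configured, using all available cores.
cmake --build build --config Release -j "$(nproc)"
```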
Member slaren commented Feb 11, 2025:

> If we set `const int min_batch_size = 999999;` in `ggml_backend_cuda_device_offload_op`, can we also use `-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=...` for OpenBLAS or MKL, or will it not allow this at the same time as `-DLLAMA_CUDA=ON`?

Yes, but ...
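For reference, a combined CUDA + BLAS configuration would look roughly like this (a sketch using the current `GGML_*` option spellings; the question above uses the older `LLAMA_CUDA` name, and `OpenBLAS` is just one example vendor value):

```sh
# Configure with the CUDA backend plus a BLAS backend in one build.
cmake -B build -DGGML_CUDA=ON -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j
```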
I don't have any crashes on CUDA, so selecting CUDA instead of Vulkan at runtime would prevent the crashes in Vulkan on Nvidia. It wouldn't actually fix the Vulkan crash, though; it would just be a workaround.

Member slaren commented Dec 2, 2024 ...
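If runtime backend selection is the preferred workaround, a sketch of what it could look like (this assumes a build with both backends compiled in and the `--list-devices`/`--device` flags that landed in llama.cpp around this time; their availability is an assumption, not something confirmed in the thread):

```sh
# List the devices this binary can see, then pin work to the first CUDA device
# (device names such as CUDA0 are examples and depend on the build).
./llama-cli --list-devices
./llama-cli -m models/model.gguf --device CUDA0 -ngl 99
```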
```cmake
    ggml_add_cpu_backend_variant(sapphirerapids AVX F16C AVX2 FMA AVX512 AVX512_VBMI AVX512_VNNI AVX512_BF16 AMX_TILE AMX_INT8)
  endif()
else ()
  ggml_add_cpu_backend_variant_impl("")
endif()

ggml_add_backend(BLAS)
ggml_add_backend(CANN)
ggml_add_backend(CUDA)
```
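This variant machinery is what the dynamically loadable CPU backends build on; a hedged configure sketch (assuming the `GGML_BACKEND_DL` and `GGML_CPU_ALL_VARIANTS` options in current ggml, which may not exist in older trees):

```sh
# Build every CPU variant as a loadable backend; the best match for the
# host CPU is selected at runtime.
cmake -B build -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON
cmake --build build --config Release -j
```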
Cannot build with VS 2022 (admin dev prompt) for CUDA anymore, and I think it is this change. I have all the permissions, and the directory and subdirectories grant full-control permissions to all users. It was compiling up until right after the Daisyui server revamp ~2 weeks ago. The dll export fails (...) ggml...
```sh
rm -vrf ggml/src/ggml-cuda/*.o
rm -vrf ggml/src/ggml-cuda/template-instances/*.o
rm -rvf libllava.a llama-baby-llama llama-batched llama-batched-bench llama-bench llama-benchmark-matmult llama-cli llama-convert-llama2c-to-ggml llama-embedding llama-eval-callback llama-export-lora llama-finetune llama-gb...
```
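These lines belong to the Makefile clean target; a typical scrub-and-rebuild after switching backends might look like this (a sketch assuming a 2024-era Makefile where the CUDA switch is `GGML_CUDA=1`; older trees spelled it `LLAMA_CUBLAS=1`):

```sh
# Remove stale objects (including the CUDA template instances above),
# then rebuild the CLI with the CUDA backend enabled.
make clean
GGML_CUDA=1 make -j llama-cli
```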
```
nvml.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\libnvvp\nvml.dll
C:\Users\zrway\AppData\Local\Programs\Ollama\nvml.dll
C:\Program Files\Common Files\Oracle\Java\javapath\nvml.dll
C:\Windows\system32\nvml.dll
C:\Windows\nvml.dll
C:\Windows\System32\Wbem\nvml.dll
C:\...
```
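The shape of this listing matches a Windows PATH search such as `where nvml.dll` (an inference from the format, not something stated in the thread); the point is that several copies of `nvml.dll`, including the one shipped with Ollama, shadow one another on this machine.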
```diff
@@ -40,6 +43,14 @@ if [ ! -z ${GG_BUILD_CUDA} ]; then
     CMAKE_EXTRA="${CMAKE_EXTRA} -DLLAMA_CUBLAS=1"
 fi

+if [ ! -z ${GG_BUILD_SYCL} ]; then
+    if [ -z ${ONEAPI_ROOT} ]; then
+        echo "Not detected ONEAPI_ROOT, please ...
```
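For context, the local CI harness is normally driven as below; the SYCL path expects the oneAPI environment to be sourced first, which is exactly what the `ONEAPI_ROOT` check added here guards (the `setvars.sh` path assumes a default oneAPI install):

```sh
# Load the oneAPI toolchain, then run the local CI with the SYCL build enabled.
source /opt/intel/oneapi/setvars.sh
GG_BUILD_SYCL=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
```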