ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XT, compute capability 11.0, VMM: no
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: offloading 0 repeating layer...
const auto & cuda_info = ggml_cuda_info();
std::unordered_map<std::string, size_t> count_by_name;
for (int dev_idx = 0; dev_idx < cuda_info.device_count; dev_idx++) {
    const auto & device = cuda_info.devices[dev_idx];
    std::string name(device.name);
    size_t n_idx = ++...
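The snippet above is truncated, but it appears to keep a running count per device name, which would let several identical devices be told apart. A minimal self-contained sketch of that idea follows. The `DeviceInfo` struct, the `label_devices` helper, and the `name #index` label format are all hypothetical and are not taken from ggml.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for per-device info; not the ggml struct.
struct DeviceInfo {
    std::string name;
};

// Returns one label per device: the device name followed by a 1-based
// running index among the devices that share that name.
// The " #N" suffix format is an assumption made for illustration.
std::vector<std::string> label_devices(const std::vector<DeviceInfo> & devices) {
    std::unordered_map<std::string, std::size_t> count_by_name;
    std::vector<std::string> labels;
    labels.reserve(devices.size());
    for (const auto & device : devices) {
        // operator[] value-initializes a missing counter to 0 before the increment.
        const std::size_t n_idx = ++count_by_name[device.name];
        labels.push_back(device.name + " #" + std::to_string(n_idx));
    }
    return labels;
}
```

With three devices that all report the same name, this would yield labels ending in `#1`, `#2`, and `#3`, while a device with a unique name gets `#1`.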
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
  Device 0: GRID A100D-16C, compute capability 8.0, VMM: no
  Device 1: GRID A100D-16C, compute capability 8.0, VMM: no
  Device 2: GRID A100D-16C, compute ca...
Follow-up to #8151, which added ternary types (CPU-only at first). This PR implements CUDA kernels for TQ2_0 (mmvq, tile loading for mmq and mma, and dequant-based cuBLAS). (Although there wa...
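To make the idea of a ternary type concrete, here is a hedged sketch of decoding packed 2-bit ternary weights on the CPU. It assumes four elements per byte, a stored code q in {0, 1, 2} that decodes to (q - 1) times a per-block scale, and a simple consecutive element ordering. The helper name `unpack_ternary` is hypothetical, and the real TQ2_0 block layout in ggml may order elements differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not ggml code: decode n packed 2-bit ternary values.
// Assumed layout: element i lives in byte i/4 at bit offset 2*(i%4).
std::vector<float> unpack_ternary(const std::vector<std::uint8_t> & packed,
                                  std::size_t n, float scale) {
    assert(n <= packed.size() * 4); // each byte holds four 2-bit codes
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Extract the 2-bit stored code; valid codes are 0, 1 and 2.
        const int q = (packed[i / 4] >> (2 * (i % 4))) & 3;
        // Shift the code to the ternary range {-1, 0, +1} and apply the scale.
        out[i] = static_cast<float>(q - 1) * scale;
    }
    return out;
}
```

A GPU kernel would do the same bit extraction per thread; the sketch only illustrates the decoding arithmetic, not the tiling or dot-product paths the PR describes.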
ggml/src/ggml-cuda/convert.cu (2 changes: 1 addition & 1 deletion)
@@ -287,7 +287,7 @@ static __global__ void dequantize_block_tq2_0(const void * __restrict__ vx, dst_
    const int64_t n = tid/32; // 0 or 1
    const...
ggml/src/ggml-cuda/common.cuh (7 changes: 7 additions & 0 deletions)
@@ -440,6 +440,13 @@ struct ggml_cuda_type_traits<GGML_TYPE_Q6_K> {
    static constexpr int qi = QI6_K;
};
template<>
struct ggml_cuda_type_traits...
Sometimes it hits GGML_ASSERT: ggml-cuda.cu:7759: ggml_is_contiguous(src0) for me. I cannot debug it since I run this on RunPod, but the seed is included in the run command, so it should be easy to reproduce:
$ ./main -m models/stabilityai-stablelm-3b-4e1t-Q8_0.gguf -p "The best mus...
ggml-cuda : use i and j instead of i0 and i in vec_dot_tq2_0_q8_1 fbddb26
Steps I took to fix the error: ggml-cuda requires the language dialect "CUDA17". This was on Ubuntu 20.04 ...5.15.0-1074-oracle. Installed older CUDA + display driver version 535. Uninstall NVIDIA completely and uninstall CUDA. Restart. Download the 4GB CUDA + driver file: wget https://dev...