#if __CUDA_ARCH__ >= GGML_CUDA_CC_VOLTA
#ifdef GGML_CUDA_FORCE_MMQ
    return MMQ_DP4A_MAX_BATCH_SIZE;
#else // GGML_CUDA_FORCE_MMQ
    return 128;
#endif // GGML_CUDA_FORCE_MMQ
#else // __CUDA_ARCH__ >= GGML_CUDA_CC_VOLTA
    return MMQ_DP4A_MAX_BATCH_SIZE;
#endif // __CUDA_ARCH__ >= GGML_CUDA_CC_VOLTA
...
std::initializer_list<uint32_t> warptile_mmq_l = { 128, 128, 128, 32, device->subgroup_size * 2, 64, 2, 4, 4, device->subgroup_...

// Emulate behavior of CUDA_VISIBLE_DEVICES for Vulkan
char * devices_env = getenv("GGML_VK_VISIBLE_DEVICES");
...
What is the issue? For reference: #3938. The issue might actually be the result of disabling the following mode:
Older versions (0.1.31): ggml_cuda_init: GGML_CUDA_FORCE_MMQ: YES
New versions (after 0.1.31): ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ...
Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: ye...
ggml-cuda.cu
#ifdef GGML_CUDA_FORCE_MMQ
#define MUL_MAT_SRC1_COL_STRIDE 128
#else
// with tensor cores, we copy the entire hidden state to the devices in one go
#define MUL_MAT_SRC1_COL_STRIDE

ggerganov commented Oct 28, 2023: The reason to do it like this is because on the main device...
This PR refactors and optimizes the IQ MMVQ CUDA code. Notably, as part of these changes I'm changing some values in ggml-common.h. The "qr" values are meant to represent how many low-bit data value...
Name and Version
$ ./llama-server
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
version: 0 (unk...