[Kernel] Update vllm-flash-attn version (#10736) · vllm-project/vllm@9a8bff0
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
vllm_flash_attn/__init__.py (2 changes: 1 addition & 1 deletion):

@@ -1,6 +1,6 @@
 __version__ = "2.5.6"
-from flash_attn.flash_attn_interface import (
+from vllm_flash_attn.flash_attn_interface import (
...
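For orientation, a minimal caller-side sketch of what this fix means: the renamed package now re-exports its own vendored interface module instead of the upstream flash_attn one. The symbol name below (flash_attn_varlen_func) is an assumption for illustration and is not taken from this diff.

# Hypothetical caller-side view of the renamed package: kernels are pulled
# from the vendored vllm_flash_attn interface, not from upstream flash_attn.
from vllm_flash_attn.flash_attn_interface import flash_attn_varlen_func  # assumed symbol

# Typical varlen call shape (arguments elided; shown only as a comment):
# out = flash_attn_varlen_func(q, k, v, cu_seqlens_q, cu_seqlens_k,
#                              max_seqlen_q, max_seqlen_k, causal=True)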
RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist \
    --mount=type=cache,target=/root/.cache/pip \
    pip install dist/*.whl --verbose
RUN --mount=type=bind,from=flash-attn-builder,src=/usr/src/flash-attention-v2,target=/usr/src/flash-attention-v2 \
    --mo...
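A hedged sanity check, separate from the Dockerfile above, that could be run inside the resulting image to confirm the wheel installed cleanly; the module names probed are assumptions and depend on the vLLM version (newer builds vendor the kernels inside the vllm package rather than as a standalone vllm_flash_attn distribution).

# Hypothetical post-install smoke test for the built image.
import importlib.util

for mod in ("vllm", "vllm_flash_attn"):
    spec = importlib.util.find_spec(mod)
    if spec is None:
        print(f"{mod}: not installed")
    else:
        print(f"{mod}: found at {spec.origin}")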
Files renamed without changes:

..._attn/flash_blocksparse_attn_interface.py → ..._attn/flash_blocksparse_attn_interface.py
flash_attn/fused_softmax.py → vllm_flash_attn/fused_softmax.py
flash_attn/layers/__init__.py → vllm_flash_attn/layers/__init_...
vllm-flash-attn is compiled together with vLLM rather than installed as a separate package, so you do not need to install vllm-flash-attn on its own. It appears, though, that the CMake files in the vllm-flash-attn repository were not subsequently kept in sync with vLLM, ...
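A rough way to confirm which copy is in use, assuming a vLLM build that vendors the kernels under the vllm package (consistent with the vllm/vllm_flash_attn/ .gitignore entry below); the exact import path may differ by version.

# Hypothetical check: with the bundled build, the flash-attention module
# should resolve from inside the installed vllm package directory rather
# than from a separately installed flash-attn / vllm-flash-attn wheel.
import vllm.vllm_flash_attn as vfa

print(vfa.__file__)  # expected: .../site-packages/vllm/vllm_flash_attn/__init__.py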
if(VLLM_GPU_LANG STREQUAL "CUDA" OR VLLM_GPU_LANG STREQUAL "HIP")
  message(STATUS "Enabling C extension.")
  add_dependencies(default _C)

#
# Build vLLM flash attention from source
#
# IMPORTANT: This has to be the last thing we do, because vllm-flash-attn uses the same macros...
.gitignore:

vllm/vllm_flash_attn/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

@@ -12,6 +15,8 @@
__pycache__/

# Distribution / packaging
.Python
build/
cmake-build-*/
CMakeUserPresets.json
develop-eggs/
dist/
downloads/

CMa... (98 changes: 73 additions & 25 deletions)
Commits:

Enable when vllm_flash_attn (da50678)
Merge branch 'main' into flash-attention-decode (6d5b4ec)
Add vllm-flash-attn as dependency (37cb5a9)

WoosukKwon added 2 commits on May 13, 2024 at 17:54:

yapf (1be2eb3)
Use fp32 in ref attn softmax (d544611)
...