A high-throughput and memory-efficient inference and serving engine for LLMs - [Kernel] Update vllm-flash-attn version (#10736) · vllm-project/vllm@9a8bff0
2 changes: 1 addition & 1 deletion in vllm_flash_attn/__init__.py

```diff
@@ -1,6 +1,6 @@
 __version__ = "2.5.6"
-from flash_attn.flash_attn_interface import (
+from vllm_flash_attn.flash_attn_interface import (
 ...
```
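For context, a minimal sketch of the kind of import this rename affects. The truncated diff does not show which symbols are imported, so the function names below (flash_attn_varlen_func, flash_attn_with_kvcache) are assumptions drawn from the upstream flash-attn interface, and the fallback branch is purely illustrative:

```python
# Sketch only: illustrates the import-path change shown in the diff above.
# The listed function names are assumed from the upstream flash-attn
# interface; the truncated diff does not show the actual import list.
try:
    # After the change: import from the vendored vllm_flash_attn package.
    from vllm_flash_attn.flash_attn_interface import (
        flash_attn_varlen_func,
        flash_attn_with_kvcache,
    )
except ImportError:
    # Hypothetical fallback to the upstream flash-attn package for
    # environments where the vendored package is not installed.
    from flash_attn.flash_attn_interface import (
        flash_attn_varlen_func,
        flash_attn_with_kvcache,
    )
```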
The best iPhone LLM app right now just got a big update | The latest update is substantial and it finally feels polished: it adds a number of tunable parameters, including flash_attn, batch, ubatch, and cache_type_kv. Updated four hours ago, fresh off the press; paired with the just-released SmallThinker (based on Qwen2.5 3B) it should work well, and my iPhone can run it without draining too much battery. #LLM (large language models) The only question left is whether it can...
```dockerfile
RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist \
    --mount=type=cache,target=/root/.cache/pip \
    pip install dist/*.whl --verbose

RUN --mount=type=bind,from=flash-attn-builder,src=/usr/src/flash-attention-v2,target=/usr/src/flash-attention-v2 \
    --mo...
```
- ..._attn/flash_blocksparse_attn_interface.py → ..._attn/flash_blocksparse_attn_interface.py (file renamed without changes)
- flash_attn/fused_softmax.py → vllm_flash_attn/fused_softmax.py (file renamed without changes)
- flash_attn/layers/__init__.py → vllm_flash_attn/layers/__init_...
```diff
+        PATCH_COMMAND ${patch_vllm_flash_attn}
+        UPDATE_DISCONNECTED 1
   )
 else()
   FetchContent_Declare(
@@ -585,6 +589,8 @@ else()
         GIT_PROGRESS TRUE

         # Don't share the vllm-flash-attn build between build types
         BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
+        PATCH_COMMAND ${patch_vll...
```
vllm-flash-attn is compiled together with vllm rather than as a separate project, so you do not need to install vllm-flash-attn separately. Evidently, the CMake files in the vllm-flash-attn repository were not updated in step with vllm, ...
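As a quick way to confirm that the bundled copy actually ships inside a vLLM install, a small sketch follows. The module path vllm.vllm_flash_attn used here is an assumption (consistent with the vllm/vllm_flash_attn/ entry in the .gitignore change below), not something stated in the comment above:

```python
# Sketch: check that the vllm-flash-attn package bundled into a vLLM wheel is
# importable, so no separate flash-attn install is required. The module path
# "vllm.vllm_flash_attn" is an assumption based on the vendored directory name.
import importlib.util


def has_bundled_flash_attn() -> bool:
    """Return True if the vendored vllm.vllm_flash_attn package can be found."""
    return importlib.util.find_spec("vllm.vllm_flash_attn") is not None


if __name__ == "__main__":
    print("bundled vllm-flash-attn present:", has_bundled_flash_attn())
```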
.gitignore

```diff
+vllm/vllm_flash_attn/
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -12,6 +15,8 @@ __pycache__/
 # Distribution / packaging
 .Python
 build/
+cmake-build-*/
+CMakeUserPresets.json
 develop-eggs/
 dist/
 downloads/
```

98 changes: 73 additions & 25 deletions in CMa...
- Enable when vllm_flash_attn (da50678)
- Merge branch 'main' into flash-attention-decode (6d5b4ec)
- Add vllm-flash-attn as dependency (37cb5a9)

WoosukKwon added 2 commits May 13, 2024 17:54
- yapf (1be2eb3)
- Use fp32 in ref attn softmax (d544611)
- ...
This PR switches vLLM to the pre-built vllm-flash-attn wheel instead of the original flash-attn package.

[Misc] Use vllm-flash-attn instead of flash-attn (de121f5)
WoosukKwon requested a review from LiuXiaoxuanPKU May 8, 2024 16:01
LiuXiaoxuanPKU approved these changes May 8, 2024 ...
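To exercise the bundled kernels end to end, a hedged smoke-test sketch: it forces vLLM's FlashAttention backend via the VLLM_ATTENTION_BACKEND environment variable, which is a vLLM convention rather than something introduced by this PR, and the model name is only an illustrative choice.

```python
# Sketch: run vLLM with its FlashAttention backend, which is served by the
# bundled vllm-flash-attn wheel. The VLLM_ATTENTION_BACKEND variable and the
# model name are illustrative choices, not part of the PR itself.
import os

# Must be set before vLLM is imported so the backend selection takes effect.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any small model is enough for a smoke test
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=16),
)
print(outputs[0].outputs[0].text)
```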