setup.py:

 this_dir = os.path.dirname(os.path.abspath(__file__))

-PACKAGE_NAME = "flash_attn"
+PACKAGE_NAME = "vllm_flash_attn"

 BASE_WHEEL_URL = (
     "https://github.com/Dao-AILab/flash-attention/relea
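The diff above renames the distribution so the fork installs as vllm_flash_attn, while BASE_WHEEL_URL still points at upstream release assets. For context, flash-attention's setup.py uses these two values to try downloading a pre-built wheel before falling back to compiling the CUDA extensions; the sketch below only illustrates that pattern with simplified names (the real wheel filename also encodes CUDA, torch, and C++ ABI tags, and the exact URL format is not reproduced here).

```python
# Illustrative sketch only, not the actual setup.py logic: build a pre-built
# wheel URL from PACKAGE_NAME and a release base URL, then download it so the
# install can skip source compilation.
import os
import urllib.request

PACKAGE_NAME = "vllm_flash_attn"
# Assumed/illustrative format; the real BASE_WHEEL_URL and filename scheme
# include more tags (CUDA version, torch version, ABI flag, platform).
BASE_WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/{tag}/{wheel}"
)

def prebuilt_wheel_name(version: str, python_tag: str, platform_tag: str) -> str:
    """Simplified wheel filename for this package version and interpreter."""
    return f"{PACKAGE_NAME}-{version}-{python_tag}-{python_tag}-{platform_tag}.whl"

def download_prebuilt_wheel(version: str, python_tag: str, platform_tag: str,
                            dest_dir: str = ".") -> str:
    """Fetch the pre-built wheel instead of compiling from source."""
    wheel = prebuilt_wheel_name(version, python_tag, platform_tag)
    url = BASE_WHEEL_URL.format(tag=f"v{version}", wheel=wheel)
    dest = os.path.join(dest_dir, wheel)
    urllib.request.urlretrieve(url, dest)
    return dest
```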
Dao-AILab/flash-attention: Fast and memory-efficient exact attention.
Your current environment: as the title says, vllm-flash-attn pins torch==2.4.0, but vllm 0.6.5 requires torch==2.5.1.

How you are installing vllm:

$ uv pip install vllm==0.6.5 vllm-flash-attn
  × No solution found when resolving dependencies:
  ╰─▶ ...
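One quick way to confirm the conflicting pins is to read the torch requirement declared in each distribution's metadata. A minimal sketch, assuming the distributions are already installed in the environment being inspected (names taken from the report above):

```python
# Minimal sketch: print the torch requirement declared by each installed
# distribution so conflicting pins (e.g. torch==2.4.0 vs torch==2.5.1) are visible.
from importlib.metadata import PackageNotFoundError, requires, version

def torch_pins(dist: str) -> list[str]:
    """Return the torch-related requirement strings declared by `dist`."""
    try:
        return [r for r in (requires(dist) or []) if r.lower().startswith("torch")]
    except PackageNotFoundError:
        return [f"{dist} is not installed"]

for dist in ("vllm", "vllm-flash-attn"):
    print(f"{dist} declares: {torch_pins(dist)}")

try:
    print(f"installed torch: {version('torch')}")
except PackageNotFoundError:
    print("torch is not installed")
```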
This issue should already have been fixed in the spec-decoding tests. CPU flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge ...
GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
GIT_TAG 5259c586c403a4e4d8bf69973c159b40cc346fb9
GIT_TAG d886f88165702b3c7e7744502772cd98b06be9e1
GIT_PROGRESS TRUE
# Don't share the vllm-flash-attn build between build types
BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash...
packages/llm/vllm/build.sh: 2 changes (2 additions, 0 deletions)

@@ -29,6 +29,8 @@
git clone --recursive --depth=1 https://github.com/vllm-project/vllm /opt/vllm
cd /opt/vllm
# apply patches: Remove switching to ...
WoosukKwon commented May 8, 2024: This PR is to use the pre-built vllm-flash-attn wheel instead of the original flash-attn.
[Misc] Use vllm-flash-attn instead of flash-attn (de121f5)
WoosukKwon requested a review from LiuXiaoxuanPKU on May 8, 2024, 16:01 ...
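Since the vllm-flash-attn wheel is a fork that keeps the same Python entry points, the swap described in the PR is essentially an import-level change. A hedged illustration of such a fallback, not a copy of vLLM's actual code (the import sites inside vLLM may look different):

```python
# Hedged illustration: prefer the pre-built vllm_flash_attn wheel and fall back
# to the upstream flash_attn package, which exposes the same kernel entry point.
try:
    from vllm_flash_attn import flash_attn_varlen_func  # pre-built wheel
except ImportError:
    from flash_attn import flash_attn_varlen_func  # original upstream package
```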
Diff for: vllm/attention/backends/rocm_flash_attn.py (1 file changed, +5 -0 lines)

@@ -23,6 +23,11 @@
 _PARTITION_SIZE_ROCM = 512
 _GPU_ARCH = ...