Regarding the problem you are seeing, "cannot use flashattention-2 backend because the flash_attn package is not found", I will go through the hints one by one:

Confirm that the flash_attn package is correctly installed: first, check whether flash_attn is present in your environment by running the following command:

```bash
pip show flash_attn
```

If the command reports that it is not ...
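As a minimal sketch, the same check can be done from the Python interpreter that will actually run the model, which is what the backend check cares about (standard-library calls only; the version attribute is read defensively since not every build exposes it):

```python
# Minimal sketch: verify that flash_attn is importable by this interpreter.
import importlib.util

spec = importlib.util.find_spec("flash_attn")
if spec is None:
    print("flash_attn is NOT importable in this environment.")
else:
    import flash_attn
    version = getattr(flash_attn, "__version__", "unknown")
    print(f"flash_attn {version} found at {spec.origin}")
```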
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--use_fast', action='store_true',
                    help='Set use_fast=True while loading the tokenizer.')
parser.add_argument('--use_flash_attention_2', action='store_true',
                    help='Set use_flash_attention_2=True while loading the model.')
...
```
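A hedged sketch of how a flag like `--use_flash_attention_2` is typically consumed when loading the model with Transformers, building on the `parser` above. The checkpoint name is only a placeholder, and newer Transformers releases prefer `attn_implementation="flash_attention_2"` over the boolean kwarg:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

args = parser.parse_args()
model_name = "codellama/CodeLlama-13b-Instruct-hf"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=args.use_fast)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # Raises an error if the flash_attn package is not installed.
    use_flash_attention_2=args.use_flash_attention_2,
)
```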
Your current environment

Driver Version: 545.23.08
CUDA Version: 12.3
Python 3.9
vllm 0.4.2
flash_attn 2.4.2–2.5.8 (I have tried various versions of flash_attn)
torch 2.3

🐛 Describe the bug

Cannot use FlashAttention-2 backend because the ...
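If no flash_attn build works for that CUDA/torch combination, one hedged workaround sketch is to pin vLLM to a non-FlashAttention backend via the `VLLM_ATTENTION_BACKEND` environment variable (a real vLLM setting, though the accepted backend names vary by release); the model below is just a small placeholder, not the one from the issue:

```python
import os

# Force the XFormers backend so the missing flash_attn package is not required.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```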
Reminder

I have read the README and searched the existing issues.

Reproduction

```bash
deepspeed --include localhost:0,1,2,3 --master_port 29504 src/train_bash.py \
    --stage sft \
    --use_unsloth \
    --model_name_or_path CodeLlama-13b-Instruct-hf/ \
    ...
```
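Before launching that command, a quick pre-flight check like the sketch below (an assumption-level helper, not part of the reproduction) confirms the GPUs requested via `--include` are visible and that flash_attn imports cleanly, since a failed import is what triggers the backend error in this thread:

```python
import torch

print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())  # should cover localhost:0,1,2,3

try:
    import flash_attn
    print("flash_attn:", getattr(flash_attn, "__version__", "unknown"))
except ImportError as exc:
    print("flash_attn import failed:", exc)
```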
My attention_mask is a dynamic mask matrix for a prefix decoder, similar to UniLM and GLM. How should this type of attention_mask be applied with FlashAttention?

tridao commented Apr 18, 2024: That kind of mask is not currently supported. ...
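Because flash_attn does not accept arbitrary masks, a hedged fallback sketch is to express the UniLM/GLM-style prefix mask as an explicit boolean mask for PyTorch's `scaled_dot_product_attention` instead (`prefix_lm_mask` is a name invented here for illustration):

```python
import torch
import torch.nn.functional as F

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """True = may attend. Bidirectional over the prefix, causal afterwards."""
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    mask[:, :prefix_len] = True  # every position sees the whole prefix
    return mask

q = k = v = torch.randn(1, 8, 16, 64)            # (batch, heads, seq_len, head_dim)
mask = prefix_lm_mask(seq_len=16, prefix_len=4)  # broadcasts over batch and heads
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```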