Suggested handling when Flash Attention is not available

When the system prints the warning "warning: flash attention is not available, use_flash_attn is set to false", it means the current environment or configuration cannot use Flash Attention. In that case, the following checks are suggested:

Check the environment configuration: confirm whether your GPU model supports Flash Attention. In general, Flash Attention needs a sufficiently recent NVIDIA GPU (FlashAttention-2 targets compute capability 8.0 or higher, i.e. Ampere/Ada/Hopper cards), together with a matching CUDA toolkit and PyTorch build and a flash-attn package installed for that combination.
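As a quick way to apply that check in code, the sketch below probes the GPU and the `flash_attn` package before deciding whether to enable the feature. It is only an illustration: the helper name `flash_attention_available` and the compute-capability threshold of 8.0 are assumptions made here, not something printed by the warning itself.

```python
import importlib.util
import torch

def flash_attention_available() -> bool:
    """Rough check: CUDA GPU with compute capability >= 8.0 and flash_attn installed."""
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    if major < 8:  # FlashAttention-2 targets Ampere/Ada/Hopper (sm_80+)
        return False
    # the flash-attn package must also be importable in this environment
    return importlib.util.find_spec("flash_attn") is not None

use_flash_attn = flash_attention_available()
print(f"use_flash_attn = {use_flash_attn}")
```

When the check fails, attention still works: the model simply falls back to PyTorch's built-in scaled-dot-product attention, which is what the diffusers-style processor snippet below does.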
import torch.nn.functional as F

# `attn`, `query`, `key`, `value` and `batch_size` come from the surrounding
# attention-processor __call__; query/key/value have shape (batch, heads, seq_len, head_dim).
inner_dim = key.shape[-1]
head_dim = inner_dim // attn.heads

# TODO: add support for attn.scale when we move to Torch 2.1
hidden_states = F.scaled_dot_product_attention(
    query, key, value, dropout_p=0.0, is_causal=False
)

# merge the heads back: (batch, heads, seq_len, head_dim) -> (batch, seq_len, heads * head_dim)
hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
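If the flash-attn package cannot be installed at all, PyTorch's SDPA dispatcher can still be steered explicitly so that `F.scaled_dot_product_attention` avoids the flash kernel and uses its memory-efficient or math backends. A minimal sketch, assuming PyTorch 2.0/2.1 and a CUDA device; the tensor shapes here are purely illustrative:

```python
import torch
import torch.nn.functional as F

# illustrative tensors: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Disable the flash kernel and let SDPA fall back to the memory-efficient
# or math implementations (context-manager API from PyTorch 2.0/2.1).
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)

print(out.shape)  # torch.Size([1, 8, 128, 64])
```

The result is numerically the same attention output; only the kernel used to compute it changes, so this is a safe fallback when the warning appears.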