# add_zero_attn in MultiheadAttention breaks causality
## 🐛 Bug

`add_zero_attn=True` in `MultiheadAttention` is ignoring the mask during `backward()`.

## To Reproduce

Steps to reproduce the behavior:

```python
import torch
import numpy as np

embedding_dim = 8
batch_size = 1
num_heads = 2
seq_len = 4
net = torc...
```
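The reproduction snippet above is cut off. As a rough illustration of the kind of check the report describes, here is a minimal self-contained sketch: the `MultiheadAttention` construction, the causal `attn_mask`, and the gradient-leak test below are assumptions filled in around the visible hyperparameters, not the reporter's exact code.

```python
import torch

# Hyperparameters visible in the truncated report; everything below them is an assumed sketch.
embedding_dim = 8
batch_size = 1
num_heads = 2
seq_len = 4

# Assumed construction: the issue is about add_zero_attn=True.
net = torch.nn.MultiheadAttention(embedding_dim, num_heads, add_zero_attn=True)

# Causal mask: True marks positions a query may NOT attend to (strictly upper triangle).
attn_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

# Self-attention over a random sequence, shape (seq_len, batch, embed_dim).
x = torch.randn(seq_len, batch_size, embedding_dim, requires_grad=True)
out, _ = net(x, x, x, attn_mask=attn_mask)

# Under a causal mask the output at position 0 depends only on x[0], so the
# gradient flowing back into x[1:] must be exactly zero. A non-zero value here
# would mean the mask is being ignored in the backward pass.
out[0].sum().backward()
print("max grad from future positions:", x.grad[1:].abs().max().item())
```

Per the report, the mask is ignored during `backward()` when `add_zero_attn=True`, so a check along these lines would print a non-zero gradient leaking from future positions instead of the expected `0.0`.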