torch.bmm fails when multiplying a batch of sparse matrices...
🐛 Describe the bug

When I try to multiply a batch of sparse matrices of shape (B, N, N) by a dense tensor of shape (B, N, 1) using torch.bmm, the following error is thrown:

RuntimeError: CUDA error: misaligned address

However, no error i...
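For readers who want to try the shapes described above, here is a minimal sketch of the operation, not the reporter's original script. The values of `B` and `N`, the density, the helper `make_batch`, and the use of sparse COO layout via `to_sparse()` are all assumptions made for illustration. It runs on CPU by default; set `device = "cuda"` to exercise the CUDA path the report concerns. Whether the misaligned-address error appears may depend on details that the truncated report does not show.

```python
import torch


def make_batch(B, N, density=0.3, seed=0):
    """Hypothetical helper: a (B, N, N) dense batch with most entries zeroed."""
    gen = torch.Generator().manual_seed(seed)
    values = torch.rand(B, N, N, generator=gen)
    mask = torch.rand(B, N, N, generator=gen) < density
    return values * mask


# Assumed sizes; the report only states the shapes (B, N, N) and (B, N, 1).
B, N = 4, 8
device = "cpu"  # assumption: switch to "cuda" to test the reported path

a_dense = make_batch(B, N).to(device)
a_sparse = a_dense.to_sparse()           # sparse COO batch of shape (B, N, N)
x = torch.rand(B, N, 1, device=device)   # dense right-hand side of shape (B, N, 1)

out_sparse = torch.bmm(a_sparse, x)      # sparse @ dense, the call from the report
out_dense = torch.bmm(a_dense, x)        # dense reference for comparison

print(out_sparse.shape)
print(torch.allclose(out_sparse, out_dense, atol=1e-5))
```

Comparing against the dense result is one way to tell a wrong answer apart from a crash when narrowing down which inputs trigger the failure.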