attention re-use in lookup vit should use pre-softmax attention matrix
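Below is a minimal sketch of why the pre-softmax matrix is the one to cache, assuming a simplified bidirectional cross-attention between a small set of compressed tokens and the full set of lookup tokens (the module and parameter names here are illustrative, not the repo's actual API). Re-using the post-softmax attention matrix for the reverse direction would be wrong: after transposing it, the rows no longer sum to one over the new normalisation axis. Caching the pre-softmax similarity logits instead lets each direction apply its own softmax.

```python
# Hypothetical sketch of attention re-use with pre-softmax logits.
# Names (BidirectionalLookupAttention, lookup_to_v, ...) are illustrative only.

import torch
from torch import nn


class BidirectionalLookupAttention(nn.Module):
    def __init__(self, dim, dim_head=64, heads=8):
        super().__init__()
        self.heads = heads
        self.scale = dim_head ** -0.5
        inner = dim_head * heads

        # compressed tokens query the lookup tokens
        self.to_q = nn.Linear(dim, inner, bias=False)
        self.to_kv = nn.Linear(dim, inner * 2, bias=False)
        self.to_out_compressed = nn.Linear(inner, dim, bias=False)

        # separate value / output projections for the reverse (lookup <- compressed) path
        self.lookup_to_v = nn.Linear(dim, inner, bias=False)
        self.to_out_lookup = nn.Linear(inner, dim, bias=False)

    def forward(self, compressed, lookup):
        b, h = compressed.shape[0], self.heads

        def split_heads(t):
            return t.reshape(b, -1, h, t.shape[-1] // h).transpose(1, 2)

        def merge_heads(t):
            return t.transpose(1, 2).reshape(b, t.shape[2], -1)

        q = split_heads(self.to_q(compressed))
        k, v = map(split_heads, self.to_kv(lookup).chunk(2, dim=-1))

        # pre-softmax similarity matrix - this is what gets cached and re-used
        sim = (q @ k.transpose(-2, -1)) * self.scale   # (b, h, n_compressed, n_lookup)

        # direction 1: compressed tokens gather from lookup tokens
        attn = sim.softmax(dim=-1)
        compressed_out = merge_heads(attn @ v)

        # direction 2: re-use the same pre-softmax logits, transpose, then
        # softmax over the compressed-token dimension
        reverse_attn = sim.transpose(-2, -1).softmax(dim=-1)  # (b, h, n_lookup, n_compressed)
        lookup_out = merge_heads(reverse_attn @ split_heads(self.lookup_to_v(compressed)))

        return self.to_out_compressed(compressed_out), self.to_out_lookup(lookup_out)


if __name__ == '__main__':
    attn = BidirectionalLookupAttention(dim=256)
    compressed = torch.randn(2, 16, 256)   # a few compressed tokens
    lookup = torch.randn(2, 196, 256)      # many lookup (patch) tokens
    c_out, l_out = attn(compressed, lookup)
    print(c_out.shape, l_out.shape)        # (2, 16, 256) and (2, 196, 256)
```

The design point is only that `sim` (the scaled dot products) is computed once and shared; each direction then normalises it along its own axis, which is what "use pre-softmax attention matrix" refers to.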