...Add cross-attention layers for the Encoder-Decoder setting...
Since Reformer does not yet have a massively pretrained bidirectional encoder-only model either, the focus will most likely shift to the Longformer encoder-decoder framework afterwards. Longformer currently has more traction.
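As a reference for the cross-attention task above, here is a minimal single-head sketch of the mechanism to be added: queries come from the decoder states while keys and values come from the encoder outputs. This is an illustrative NumPy implementation with hypothetical shapes and randomly initialized projection weights, not the actual Reformer or Longformer code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, Wq, Wk, Wv):
    """Single-head cross-attention: queries from the decoder,
    keys/values from the encoder outputs."""
    q = decoder_states @ Wq                    # (tgt_len, d)
    k = encoder_states @ Wk                    # (src_len, d)
    v = encoder_states @ Wv                    # (src_len, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (tgt_len, src_len)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v                         # (tgt_len, d)

rng = np.random.default_rng(0)
d = 8
dec = rng.standard_normal((5, d))   # 5 target positions (hypothetical)
enc = rng.standard_normal((7, d))   # 7 source positions (hypothetical)
out = cross_attention(dec, enc, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)  # → (5, 8)
```

In a real encoder-decoder model this block sits between each decoder layer's self-attention and feed-forward sublayers, with multiple heads and learned projections.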