Now, BigBird proposes two ways of allowing long-term attention dependencies while staying computationally efficient.Global tokens: Introduce some tokens which will attend to every token and which are attended by every token. Eg: "HuggingFace is building nice libraries for easy NLP". Now, let's ...