Pretrained language models can generate contextualized embeddings for language input that enhance modeling of the language data (Mikolov et al., 2013; Sherin, 2013; Taher Pilehvar & Camacho-Collados, 2020). Essentially, words are mapped to positions in a high-dimensional vector space, called a distrib...
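To make the geometric intuition concrete, here is a minimal sketch using made-up 3-dimensional vectors; the word vectors below are hypothetical toy values, whereas real contextualized embeddings (e.g. from a pretrained transformer) typically have hundreds of dimensions and vary with sentence context. Semantic similarity between words is then measured geometrically, for instance with cosine similarity:

```python
import numpy as np

# Hypothetical toy embeddings: each word is a point in vector space.
# (Real embeddings are learned by a pretrained model, not hand-set.)
embeddings = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.82, 0.15]),
    "apple": np.array([0.10, 0.20, 0.95]),
}

def cosine(a, b):
    """Cosine similarity: the angle-based closeness of two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words with related meanings sit closer together in the space.
sim_related = cosine(embeddings["king"], embeddings["queen"])
sim_unrelated = cosine(embeddings["king"], embeddings["apple"])
print(sim_related > sim_unrelated)  # semantically related pair is closer
```

In a distributional space like this, "closeness" of vectors stands in for relatedness of meaning, which is what allows downstream models to exploit the embeddings.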