the unsupervised pre-training step of these models incurs prohibitive computational costs. Current methods for accelerating pre-training either rely on massive parallelism with advanced hardware or are not applicable to language models. In this work,...