Processing large structures in speech and other complex sound sequences is challenging because individual sound elements arrive rapidly (~4–5 syllables per second for speech21 and ~1–4 beats per second for music22) and long-distance dependencies span seconds23,24. The dual...