WavLM: speech pre-training for full-stack tasks
VALL-E: a neural codec language model for TTS
LayoutLM/LayoutLMv2/LayoutLMv3: multimodal (text + layout/format + image) Document Foundation Model for Document AI (e.g. scanned documents, PDF, etc.)
...