Language Modeling. Specifically, VaLM builds on a novel latent text-image alignment method via an image retrieval module to fetch corresponding images given a textual context. With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling ...
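The two stages described above, retrieving images for a textual context and then attending over text and visual knowledge together, can be illustrated with a toy sketch. This is not the repository's code: the function names (`retrieve_top_k`, `fuse`), the cosine-similarity ranking, and the single-query attention are all assumptions made for illustration, and the small hand-written vectors stand in for embeddings that a real system would obtain from learned encoders.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def retrieve_top_k(query, image_keys, k):
    """Hypothetical retrieval step: indices of the k image keys
    with the highest cosine similarity to the text query."""
    q = normalize(query)
    scores = [dot(q, normalize(key)) for key in image_keys]
    ranked = sorted(range(len(image_keys)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(query, text_states, image_states):
    """Hypothetical fusion step: one attention query over the
    concatenation of text states and retrieved image states."""
    memory = text_states + image_states
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, m) / scale for m in memory])
    dim = len(memory[0])
    return [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]

# Toy 2-dimensional embeddings (made up for this example).
image_keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
context = [0.9, 0.1]

top = retrieve_top_k(context, image_keys, k=2)
fused = fuse(context, [[0.5, 0.5]], [image_keys[i] for i in top])
```

Because the attention weights come from a softmax, `fused` is a convex combination of the text state and the retrieved image states, which is one simple way to picture a visually-augmented context.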
Official implementation of our paper "Visually-Augmented Language Modeling". Please cite our paper if you find this repository helpful in your research:

@article{valm,
  title={Visually-augmented language modeling},
  author={Wang, Weizhi and Dong, Li and Cheng, Hao and Song, Haoyu and Liu, Xiaodo...