^[3] Bottom-up and Top-down Attention for Image Captioning and Visual Question Answering: https://arxiv.org/pdf/1707.07998.pdf
^[4] VL-BERT: Pre-training of Generic Visual-Linguistic Representations: https://arxiv.org/pdf/1908.08530v4.pdf
^[5] ViLBERT: Pretraining Task-Agnostic Visiolinguistic...
VL-BERT: Pre-training of Generic Visual-Linguistic Representations (paper notes by Arthur Wong). This paper introduces a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as its backbone and extends it to take both visual and linguistic embedding features as...
linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset, together with text-only corpus. Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and...
Background: This is work from Microsoft Research Asia that extends the text-only BERT to the visual-linguistic setting; from pre-training through fine-tuning, it can serve multiple downstream tasks. Abstract: The authors propose VL-BERT, a pre-trainable model for learning generic representations for visual-linguistic tasks. VL-BERT uses a Transformer as its backbone and can accept both visual (V) and linguistic (L) features as input simultaneously. The pre-training tasks use data including the visual-language dataset Conceptual Cap...
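The input scheme the abstract describes (a single sequence in which every element is the elementwise sum of a token or visual-feature embedding, a segment embedding, and a position embedding) can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's implementation: all dimensions, the random initialization, and the function name `vl_bert_input` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only (not the paper's actual values).
HIDDEN, VOCAB, MAX_POS, ROI_DIM = 16, 100, 32, 2048

word_emb = rng.normal(size=(VOCAB, HIDDEN))        # token embedding table
segment_emb = rng.normal(size=(2, HIDDEN))         # segment 0 = text, 1 = visual
position_emb = rng.normal(size=(MAX_POS, HIDDEN))  # learned position embeddings
visual_proj = rng.normal(size=(ROI_DIM, HIDDEN))   # projects region features to HIDDEN

def vl_bert_input(token_ids, region_feats):
    """Build summed input embeddings for one text+image sequence.

    Each sequence element = (token or projected visual-feature embedding)
    + segment embedding + position embedding, which is the additive input
    scheme the VL-BERT abstract describes.
    """
    text = word_emb[token_ids] + segment_emb[0]          # (n_text, HIDDEN)
    vis = region_feats @ visual_proj + segment_emb[1]    # (n_regions, HIDDEN)
    seq = np.concatenate([text, vis], axis=0)
    return seq + position_emb[: len(seq)]                # add positions

tokens = np.array([1, 5, 9])             # e.g. [CLS], a word, [SEP] (toy ids)
regions = rng.normal(size=(4, ROI_DIM))  # stand-in for 4 detected region features
inputs = vl_bert_input(tokens, regions)
print(inputs.shape)  # (7, 16): 3 text elements + 4 visual elements
```

The resulting `(sequence_length, hidden)` matrix is what a Transformer encoder would consume; in the paper the visual features come from a region detector rather than the random stand-ins used here.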
VL-BERT: PRE-TRAINING OF GENERIC VISUAL-LINGUISTIC REPRESENTATIONS
2022-03-30 20:35:13
Paper: https://openreview.net/forum?id=SygXPaEYvH
Code: https://github.com/jackroos/VL-BERT...
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning Antoine Yang†* Arsha Nagrani§ Paul Hongsuck Seo§ Antoine Miech♯ Jordi Pont-Tuset§ Ivan Laptev† Josef Sivic¶ Cordelia Schmid§ §Google Research †Inria Paris ...
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Paper: https://arxiv.org/abs/1908.08530
Abstract: The authors design a new pre-trainable generic representation for visual-linguistic tasks, named VL-BERT. VL-BERT takes the simple yet effective Transformer model as its backbone and extends it so that visual and linguistic embedding fea...
Li, G., Duan, N., Fang, Y., Jiang, D., Zhou, M.: Unicoder-VL: a universal encoder for vision and language by cross-modal pre-training, arXiv preprint arXiv:1908.06066 (2019) Su, W., et al.: VL-BERT: pre-training of generic visual-linguistic representations, arXiv preprint ar...
(2019). ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (pp. 1–11). Red Hook: Curran Associates. Chen, Y.-C., Li, L., Yu, L.,...