Author affiliation: MIT CSAIL, James Glass's group. Highlight: this paper proposes a complementary fine-grained representation for multimodal learning, namely local representations. The local representations are discretized and mapped into a space shared across modalities. The main benefit of the discretized representations is better interpretability. …
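The note above does not quote the paper's formulation, but the mechanism it describes, snapping each local feature to the nearest entry of a codebook shared by all modalities, can be sketched roughly as below. `SharedCodebook`, the sizes, and the use of a straight-through estimator are illustrative assumptions in this sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SharedCodebook(nn.Module):
    """Illustrative sketch: discretize local features from any modality
    by snapping them to the nearest entry of one shared codebook."""
    def __init__(self, codebook_size=1024, dim=256):
        super().__init__()
        self.codes = nn.Embedding(codebook_size, dim)

    def forward(self, local_feats):            # (batch, num_patches, dim)
        flat = local_feats.reshape(-1, local_feats.size(-1))
        # squared L2 distance from each local feature to every codeword
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codes.weight.t()
             + self.codes.weight.pow(2).sum(1))
        ids = d.argmin(dim=1)                  # one discrete id per local feature
        quantized = self.codes(ids).view_as(local_feats)
        # straight-through estimator so gradients still reach the encoder
        quantized = local_feats + (quantized - local_feats).detach()
        return quantized, ids.view(local_feats.shape[:-1])

# both modalities share one codebook, so the discrete ids are comparable
codebook = SharedCodebook()
img_q, img_ids = codebook(torch.randn(2, 49, 256))   # image patch features
aud_q, aud_ids = codebook(torch.randn(2, 80, 256))   # audio frame features
```

Because the ids are symbols from a shared vocabulary, one can inspect which image patches and audio frames land on the same codeword, which is where the interpretability benefit comes from.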
- (ICCV'19 Oral) VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
- (NeurIPS'20) COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
- (ICCV'21) TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
- (Arxiv'22) VideoCoC...
Keywords: cross-modal retrieval, modality gap, generative adversarial network. In this paper, we propose a semi-supervised common-representation learning method, the GAN-based Asymmetric Transfer Network (GATN), for cross-modal retrieval. GATN uses an asymmetric pipeline to guarantee semantic consistency and ...
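The abstract is cut off, so GATN's details are not recoverable here. The sketch below shows one plausible reading of "GAN-based asymmetric transfer": a frozen source-modality encoder defines the common space, a target-modality encoder is trained adversarially to map into it, and a discriminator tries to tell the two apart. Every name, shape, and loss here is a hypothetical stand-in, not GATN's actual design.

```python
import torch
import torch.nn as nn

# Hypothetical asymmetric adversarial transfer: the image branch is a
# fixed "teacher" space; the text branch learns to fool a discriminator.
img_enc = nn.Linear(2048, 256).eval()          # frozen, asymmetric side
for p in img_enc.parameters():
    p.requires_grad_(False)
txt_enc = nn.Sequential(nn.Linear(768, 512), nn.ReLU(), nn.Linear(512, 256))
disc = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(txt_enc.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

img_feat, txt_feat = torch.randn(32, 2048), torch.randn(32, 768)
z_img, z_txt = img_enc(img_feat), txt_enc(txt_feat)

# discriminator step: image embeddings labeled 1, text embeddings labeled 0
d_loss = (bce(disc(z_img), torch.ones(32, 1))
          + bce(disc(z_txt.detach()), torch.zeros(32, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# generator step: push text embeddings to be indistinguishable from image ones
g_loss = bce(disc(txt_enc(txt_feat)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```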
Moreover, we propose softened targets based on intra-modal self-similarity and inter-modal cross-consistency for the cross-modal representation learning process to... (Y. Chen, T. He, J. Fu, et al., 2024)
Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modalit...
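The softened-targets sentence above is truncated, but "softened targets from intra-modal self-similarity" suggests replacing the one-hot labels of a contrastive loss with a distribution derived from within-modality similarities. A minimal sketch under that assumption follows; the function name and both temperatures are made up for illustration.

```python
import torch
import torch.nn.functional as F

def softened_contrastive_loss(z_img, z_txt, tau=0.07, tau_t=0.1):
    """Hedged sketch: soft targets from intra-modal self-similarity
    replace the usual one-hot contrastive targets."""
    z_img, z_txt = F.normalize(z_img, dim=-1), F.normalize(z_txt, dim=-1)
    logits = z_img @ z_txt.t() / tau                 # cross-modal logits
    with torch.no_grad():
        # intra-modal self-similarity of the image view defines the targets
        self_sim = z_img @ z_img.t() / tau_t
        targets = self_sim.softmax(dim=-1)           # softened, not one-hot
    return F.kl_div(logits.log_softmax(dim=-1), targets,
                    reduction="batchmean")

loss = softened_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```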
@inproceedings{...,
  title={Video-text as game players: Hierarchical banzhaf interaction for cross-modal representation learning},
  author={Jin, Peng and Huang, Jinfa and Xiong, Pengfei and Tian, Shangxuan and Liu, Chang and Ji, Xiangyang and Yuan, Li and Chen, Jie},
  booktitle={Proceedings of the IEEE/CVF Conference...}
}
GitHub topics: representation-learning, 3d-point-clouds, self-supervised-learning, cross-modal-learning (updated Jul 1, 2024; Python)
GitHub: mako443/Text2Pos-CVPR2022 (40 stars): code, dataset and models for the authors' CVPR 2022 publication "Text2Pos". Topics: nlp, computer-vision, localization, deep-learning...
The first pretraining stage comprises four tasks: self-supervised masked language modeling, two lexicon-bottlenecked masked language modeling tasks, and in-batch lexicon-contrastive learning. The overall structure of the first pretraining stage is shown in the figure below. Self-supervised masked language modeling: the standard MLM task, which masks out a subset of tokens and then predicts them (see the corruption sketch below); it mainly serves to train the text...
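The masking recipe is not spelled out in the fragment above; below is the generic BERT-style corruption routine (the usual 15% rate with an 80/10/10 split) as a minimal sketch. The `mask_tokens` helper, the rates, and the token ids are assumptions, not necessarily this paper's exact setup.

```python
import torch

def mask_tokens(input_ids, mask_id, vocab_size, p=0.15):
    """Generic BERT-style MLM corruption: pick ~15% of positions,
    replace 80% of those with [MASK], 10% with a random token,
    and leave 10% unchanged."""
    labels = input_ids.clone()
    picked = torch.rand_like(input_ids, dtype=torch.float) < p
    labels[~picked] = -100                       # ignored by cross-entropy
    r = torch.rand_like(input_ids, dtype=torch.float)
    input_ids = input_ids.clone()
    input_ids[picked & (r < 0.8)] = mask_id      # 80% -> [MASK]
    rand = torch.randint_like(input_ids, vocab_size)
    swap = picked & (r >= 0.8) & (r < 0.9)       # 10% -> random token
    input_ids[swap] = rand[swap]                 # remaining 10% untouched
    return input_ids, labels

ids = torch.randint(5, 30000, (2, 16))
corrupted, labels = mask_tokens(ids, mask_id=103, vocab_size=30000)
```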
Inter-modal correlation is important for supplementing each modality's representation learning with information from the other. We use cross-attention to capture this cross-modal correlation. As shown in Figure 2, the inputs to the cross-attention are the stacked features of image ...
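As a concrete illustration of the mechanism just described, here is a minimal PyTorch cross-attention sketch: one modality supplies the queries, the other supplies keys and values, so each modality's tokens are enriched by the other. The dimensions, token counts, and the reuse of a single attention module for both directions are illustrative choices, not the paper's configuration (Figure 2 is not reproduced here).

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

img_tokens = torch.randn(4, 49, 256)   # e.g., image patch features
txt_tokens = torch.randn(4, 32, 256)   # e.g., text token features

# text attends to image: queries from text, keys/values from image
txt_enriched, _ = attn(query=txt_tokens, key=img_tokens, value=img_tokens)
# and symmetrically, image attends to text
img_enriched, _ = attn(query=img_tokens, key=txt_tokens, value=txt_tokens)
```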
Zero-Shot Learning: zero-shot learning is a special case of transfer learning. In zero-shot learning, the training classes and the test classes are disjoint, and learning proceeds by transferring knowledge between them, so that a model trained on the training classes can correctly recognize the class labels of test-class inputs. More generally, if a model is trained only on training-class samples, yet at test time can recognize inputs drawn from...
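A toy illustration of this setting: the model never sees the test classes during training; instead, each class (seen or unseen) is described by a semantic vector such as an attribute or text embedding, and prediction reduces to nearest-neighbor search in that semantic space. All vectors below are random stand-ins, not real embeddings.

```python
import torch
import torch.nn.functional as F

class_names = ["zebra", "whale", "otter"]               # unseen at training time
class_vecs = F.normalize(torch.randn(3, 256), dim=-1)   # semantic descriptions
sample = F.normalize(torch.randn(1, 256), dim=-1)       # embedded test input

scores = sample @ class_vecs.t()                        # cosine similarity
pred = class_names[scores.argmax(dim=-1).item()]
print(pred)
```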
In terms of research progress, the Matching Function Learning branch pays close attention to fine-grained attention and cross features. This article discusses only the Representation Learning branch; as the figure above shows, there is a substantial body of work along this line (not limited to the two-tower cross-modal setting). Researchers all seek a method that turns a signal into an embedding that perfectly "represents" the signal itself, and these ...
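For concreteness, here is a minimal two-tower sketch of the Representation Learning branch described above: one encoder per modality, with matching reduced to a similarity between the two embeddings. The architectures and dimensions are placeholders, not any particular paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# one encoder ("tower") per modality; interaction happens only at the
# final similarity score, which is what makes retrieval indexable
img_tower = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 256))
txt_tower = nn.Sequential(nn.Linear(768, 512), nn.ReLU(), nn.Linear(512, 256))

def score(img_feat, txt_feat):
    zi = F.normalize(img_tower(img_feat), dim=-1)
    zt = F.normalize(txt_tower(txt_feat), dim=-1)
    return zi @ zt.t()        # (num_images, num_texts) similarity matrix

sims = score(torch.randn(4, 2048), torch.randn(4, 768))
```

The design trade-off relative to Matching Function Learning is exactly the one the paragraph notes: the two towers never exchange fine-grained attention or cross features, so embeddings can be precomputed and indexed, at some cost in matching accuracy.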