Recently, zero-shot cross-modal hashing has gained significant popularity due to its ability to retrieve emerging concepts in multimedia data. Although existing approaches have shown impressive results, the following limitations remain to be addressed: (1) Labels ...
Reading notes on the paper Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval.
Zero-Shot Learning of Class Semantics via Temporal Attention Paper link: [https://arxiv.org/abs/1809.00116] Overview: This paper studies how to use the dynamic information in videos for ZSL. The authors propose a model based on a temporal attention mechanism that can learn the semantic information of classes. Learning Semantic Models for Cross-Modal Zero-Shot Sketch Data Retrieval...
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a challenging cross-modal retrieval task. In prior art, retrieval is conducted by sorting the distances between the query sketch and each image in the gallery. However, the domain gap and the zero-shot setting make neural networks hard...
Paper tables with annotated results for Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR). Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations...
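Learning Semantic Models for Cross-Modal Zero-Shot Sketch Data Retrieval Paper link: [https://www.sciencedirect.com/science/article/abs/pii/S0031320318303701] Overview: This paper studies how to perform cross-modal zero-shot learning, particularly for the task of sketch data retrieval. The papers above are only the tip of the iceberg in the zero-shot learning field; which paper to choose depends on your...
The CLIP (Contrastive Language-Image Pre-training) model can be used for zero-shot classification because of its distinctive training method and architecture. A detailed explanation follows: Large-scale dataset: CLIP models are typically trained on large-scale datasets containing hundreds of millions to billions of image-text pairs. This lets the model learn rich visual and linguistic information, so that it can recognize the basic characteristics of categories it has never seen. Multimodal...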
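The zero-shot step described above can be sketched as follows: each candidate label is written as a text prompt, the image and the prompts are embedded into a shared space, and the prompt with the highest cosine similarity wins. This is a minimal sketch under stated assumptions, not CLIP's actual code: the three-dimensional vectors stand in for the outputs of real image and text encoders, and the function names and the `scale` value are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, text_embs, labels, scale=100.0):
    """Return the best-matching label and the probability of every label.

    `scale` is an assumed temperature that sharpens the softmax.
    """
    img = normalize(image_emb)
    logits = [
        scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in text_embs
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs

# Toy embeddings (assumed values); in practice they come from trained encoders.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a zebra"]
text_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.1, 1.0]]
image_emb = [0.2, 0.1, 0.9]

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted)
```

No classifier is trained for the candidate labels here; adding a new category only requires embedding one more prompt, which is what makes the procedure zero-shot.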
Many approaches in generalized zero-shot learning (GZSL) rely on cross-modal mapping between the image feature space and the class embedding space, which achieves knowledge transfer from seen to unseen classes. However, these two spaces are completely different spaces and their manifolds are inconsiste...
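The generic cross-modal mapping idea mentioned above can be illustrated with a short sketch: an image feature is projected into the class embedding space, and the nearest class embedding, seen or unseen, gives the prediction. This is an illustration of the baseline being discussed, not the method of the quoted paper. The matrix `W`, the class embeddings, and the feature vector are hypothetical toy values; in practice `W` would be learned from seen-class data only.

```python
def project(W, x):
    """Apply a linear map W (list of rows) to an image feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def predict_class(W, image_feature, class_embeddings):
    """Project the feature, then return the name of the nearest class embedding."""
    projected = project(W, image_feature)
    return min(
        class_embeddings,
        key=lambda name: squared_distance(projected, class_embeddings[name]),
    )

# Assumed toy setup: "horse" and "tiger" are seen classes, "zebra" is unseen.
class_embeddings = {
    "horse": [1.0, 0.0],
    "tiger": [0.0, 1.0],
    "zebra": [0.7, 0.7],
}
W = [[0.8, 0.0, 0.0],
     [0.0, 0.8, 0.0]]          # hypothetical mapping: 3-D feature -> 2-D embedding
image_feature = [1.0, 1.0, 0.3]  # hypothetical feature of a zebra image

prediction = predict_class(W, image_feature, class_embeddings)
print(prediction)
```

Because the search runs over seen and unseen class embeddings together, this matches the generalized setting; any mismatch between the two spaces shows up directly as projected points landing near the wrong class embedding.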
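Zero-shot learning through cross-modal transfer[J]. Advances in neural information processing systems, 2013, 26. 1. Overall summary: This is an early paper on multimodal learning. The work as a whole addresses image recognition, whereas other approaches only recognize image classes seen during training. One example is telling dogs from cats: after the model has been trained, when it is given another image of a dog, even if this image...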