Unlike the traditional tactile Internet, which mainly focuses on haptic signals, this work proposes a collaborative communications mechanism that exploits the temporal, spatial, and semantic relevance of cross-modal signals. On the one hand, we design a content-driven scheduling scheme to guarantee high-quality ...
Source: Semantic Scholar. Authors: M Kim, G Kim, SW Lee, JW Ha. Abstract: Language model pre-training has shown promising results in various downstream tasks. In this context, we introduce a cross-modal pre-trained language model, called Speech-Text BERT (ST-BERT), to tackle end-to...
To this end, we decompose the model into a series of memory-based reasoning steps, each performed by a Graph-based Read, Update, and Control (GRUC) module that conducts parallel reasoning over both visual and semantic information. By stacking the modules multiple times, our model performs ...
Among them, the real-valued representation-based method is adopted to improve semantic relevance and accuracy, while the binary representation-based learning method is used to improve the efficiency of image-text cross-modal retrieval and reduce storage space. In addition, the common...
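The trade-off described in this snippet can be illustrated with a minimal sketch. It is not the method of the cited work: the embeddings, the item names (`img_a`, `img_b`, `img_c`), and the sign-threshold binarization are all assumptions chosen for illustration. The sketch ranks the same hypothetical image embeddings against a text query twice, once by cosine similarity on real-valued vectors and once by Hamming distance on binary codes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def binarize(v):
    """Sign-threshold a real-valued vector into a tuple of bits (assumed scheme)."""
    return tuple(1 if x > 0 else 0 for x in v)

def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical embeddings, assumed to already live in a shared text-image space.
text_query = [0.9, -0.2, 0.4, -0.7]
image_db = {
    "img_a": [0.8, -0.1, 0.5, -0.6],
    "img_b": [-0.7, 0.3, -0.2, 0.9],
    "img_c": [0.1, 0.6, -0.4, -0.3],
}

# Real-valued retrieval: higher cosine similarity ranks first.
real_rank = sorted(image_db, key=lambda k: cosine(text_query, image_db[k]), reverse=True)

# Binary retrieval: smaller Hamming distance ranks first; each item needs only 4 bits.
q_bits = binarize(text_query)
hash_rank = sorted(image_db, key=lambda k: hamming(q_bits, binarize(image_db[k])))

print(real_rank)
print(hash_rank)
```

On this toy data both rankings agree. In general, binary codes give up some ranking precision in exchange for compact storage and cheap bitwise comparison.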
Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval; Deep Semantic-Alignment Hashing for Unsupervised Cross-Modal Retrieval; Cross-modal retrieval: Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive ...
In this paper, we tackle the problem of RGB-D Semantic Segmentation. The key challenges in solving this problem lie in 1) how to extract features from depth sensor data and 2) how to effectively fuse the features extracted from the two modalities. For th
Visual semantic embedding networks or cross-modal cross-attention networks are usually adopted for image-text retrieval. Existing works have confirmed that both visual semantic embedding networks and cross-modal cross-attention networks can ... Z Zeng, J Cao, G Jiang, ... - 《Proceedings of International...
Y Peng, J Qi, Y Yuan - 《ACM Transactions on Multimedia Computing Communications & Applications》 Cited by: 8. Published: 2017. Deep Semantic Mapping for Cross-Modal Retrieval. Cross-modal mapping plays an essential role in multimedia information retrieval systems. However, most existing work paid much ...
We argue that cross-modal retrieval may help bridge the semantic gap between an entity and its depictions, and is foremost complementary with mono-modal retrieval. We provide empirical evidence through experiments with a multimodal dual encoder, namely CLIP, on the recent ViQuAE, InfoSeek, and ...
However, these strategies introduce redundancy in the features of different modalities without fully considering the complementary properties of modal information, and they do not guarantee that the original semantic information is preserved during intra- and inter-modal interactions. In this ...