Wang, Y.: Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 17(1s), 1–25 (2021)
Jin, W., Zhao, Z., Zhang, P., Zhu, J., He, X., Zhuang, Y.: Hierarchical cross-modal graph consistency learning for...
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey. arXiv 2023. Xiao Wang, Guangyao Chen, Guangwu Qian, Pengcheng Gao, Xiao-Yong Wei, Yaowei Wang, Yonghong Tian, Wen Gao
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review. arXiv 2023...
With the rapid development of deep neural networks, multi-modal learning techniques have attracted wide attention. Cross-modal retrieval is an important branch of multimodal learning. Its fundamental purpose is to reveal the relationships between samples of different modalities.
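As a minimal sketch of what revealing relations between samples of different modalities can look like, assume that text and images have already been mapped into a shared embedding space by some encoder (not shown here). Retrieval then reduces to ranking gallery items by similarity to the query. The vectors, the item names, and the `retrieve` helper below are hypothetical and chosen only for illustration; cosine similarity is one common choice, not the only one.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, gallery, top_k=1):
    """Return the names of the top_k gallery items most similar to the query.

    gallery is a list of (name, embedding) pairs from the other modality.
    """
    ranked = sorted(gallery, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical embeddings: a text query and three image embeddings
# that are assumed to live in the same 3-dimensional space.
text_query = [0.9, 0.1, 0.0]
image_gallery = [
    ("img_dog", [0.8, 0.2, 0.1]),
    ("img_car", [0.0, 0.1, 0.9]),
    ("img_cat", [0.5, 0.5, 0.0]),
]

best = retrieve(text_query, image_gallery, top_k=2)
print(best)
```

In practice the hard part is learning the encoders that produce comparable embeddings; the ranking step itself stays this simple.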
Deep multi-modal learning architectures capable of handling four and five modalities have also been reported [34], [35], [36]. Their goal is to effectively capture patterns from temporal data (video, co-motion, audio) and explore spatio-temporal relationships. Existing reviews of deep learning-...
multi-modal information such as POI attributes and friendship network, to recommend the next set of POIs suitable for a user. A plethora of earlier works focus on traditional machine learning techniques that use hand-crafted features from the dataset. With the recent surge of deep learning research, ...
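An autonomous vehicle driving on the road must perceive the surrounding 3D scene. 3D object detection obtains the position and category of objects in three-dimensional space; it is the foundation of the autonomous driving perception system and provides important guidance for subsequent path planning, motion prediction, and collision avoidance. 3D…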
In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot ...
In essence, it is a method of learning complex feature representations from the original input features through multiple layers of nonlinear processing. When combined with specific domain tasks, DL can construct new classifiers or generative tools from the automatically learned feature representations and realize...
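The idea of multi-layer nonlinear processing can be sketched as a tiny feed-forward pass: each layer applies a linear map followed by a nonlinearity, and the output of one layer becomes the input features of the next. The weights, the layer sizes, and the choice of ReLU below are assumptions made for illustration and are not taken from the text.

```python
def dense(x, weights, bias):
    """Linear layer: one output per weight row, computed as row . x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    """Element-wise nonlinearity."""
    return [max(0.0, a) for a in v]

def forward(x, layers):
    """Apply each (weights, bias) layer in turn, with ReLU between layers.

    The last layer is left linear so its output can act as a score.
    """
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if i < len(layers) - 1:
            h = relu(h)
    return h

# Hypothetical fixed weights: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]

out = forward([2.0, 1.0], layers)
print(out)
```

Here the weights are fixed by hand; in an actual deep learning system they would be learned from data, which is what makes the intermediate representations task-specific.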
on the downstream tasks. These models are advantageous in that they can learn deep context-aware word representations from large unannotated text corpora—large-scale self-supervised pre-training. This is especially useful when learning a domain-specific language with insufficient available labelled data...