A Survey of Large Language Models in Medicine: Progress, Application, and Challenge. arXiv, 09 Nov 2023. Paper
Multi-modal Pre-training
A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT. arXiv, 18 Feb 2023. Paper
The Contribution of Knowledge in Visiolinguistic Learning: A Su...
Finally, we analyze the datasets and metrics used in previous works as well as their reported results. Our survey uncovers the properties of each approach and discusses future research directions in this field.
Yang, “End-to-end multi-modal video temporal grounding,” NeurIPS, 2021.
[258] S. Chen and B. Li, “Multi-modal dynamic graph transformer for visual grounding,” in CVPR, 2022.
[259] A. Yang, A. Miech, J. Sivic, I. Laptev, and C. Schmid, “Tubedetr: Spatio-temporal video...
Cross-modal similarity based methods aim to learn a common subspace where the distance of vectors from different modalities can be measured directly [75], while cross-modal correlation based methods aim to learn a shared subspace such that the correlation of the representation sets from different mo...
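The similarity-based idea described above can be illustrated with a minimal sketch. It assumes each modality has its own linear projection into a shared 2-dimensional subspace and that distance is measured by cosine similarity. The matrices `W_IMAGE` and `W_TEXT`, the feature dimensions, and all function names are hypothetical and chosen only for illustration. In the surveyed methods the projections would be learned from paired data, and the source does not specify a particular distance measure.

```python
import math


def project(matrix, vector):
    """Apply a linear projection (given as a list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_similarity(u, v):
    """Cosine similarity between two vectors of equal length; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, hand-picked projection matrices (assumption: in practice
# these would be learned so that matching image/text pairs land close together).
W_IMAGE = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]            # 3-d image features -> 2-d common subspace
W_TEXT = [[0.5, 0.5, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.5]]        # 4-d text features -> 2-d common subspace


def cross_modal_similarity(image_feat, text_feat):
    """Project both modalities into the common subspace and compare them directly."""
    z_image = project(W_IMAGE, image_feat)
    z_text = project(W_TEXT, text_feat)
    return cosine_similarity(z_image, z_text)
```

Once both inputs live in the same subspace, retrieval across modalities reduces to ranking candidates by `cross_modal_similarity`, even though the raw image and text features have different dimensions.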
In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot ...
Human Evaluation of Creative NLG Systems: An Interdisciplinary Survey on Recent Papers. arXiv 2021 paper bib Mika Hämäläinen, Khalid Al-Najjar
Keyphrase Generation: A Multi-Aspect Survey. FRUCT 2019 paper bib Erion Çano, Ondrej Bojar
Neural Language Generation: Formulation, Methods, and ...
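Multi-modal summarization takes input in several modalities, typically text, speech, images, and video, and outputs a concise summary that takes all of them into account. Current summarization research usually operates on text and generally does not process other modalities. However, information from different modalities is mutually complementary and corroborating, and making full and effective use of it can help a model better locate key content and generate...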
Early works in this domain mainly focus on static KGR, while recent works try to leverage temporal and multi-modal information, which is more practical and closer to real-world settings. However, no survey papers and open-source repositories comprehensively summarize and discuss models in this ...
Though sports video summarization has been an active research topic for some time, there is still no multi-modal, dynamic, generic, and domain-knowledge-based approach for cricket video summarization. This paper presents a multi-modal video summarization approach to summarize Cricket...
MIMIC-IT | MIMIC-IT: Multi-Modal In-Context Instruction Tuning | Coming soon | Multimodal in-context instruction tuning
M3IT | M3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning | Link | Large-scale, broad-coverage multimodal instruction tuning dataset
LLaVA-Med | LLaVA-Med: Training ...