MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification Paper: https://openreview.net/pdf?id=Xj5J38B4oi 79. Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identi...
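In addition, because the dataset is scaled up along several dimensions, it supports multiple research directions in industrial anomaly detection, including unified quality-inspection models. Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer Authors: Zhen Zhao (East China Normal University), Jingqun Tang (ByteDance), Zhizhong Zhang (East China Normal University), Xin Tan (...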
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation Paper: https://arxiv.org/abs/2311.17911 Code: https://github.com/shikiw/OPERA 1. Background: From LLaVA to Qwen-VL, from GPT...
Paper title: EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI Paper link: EmbodiedAI - original paper Authors: Tai Wang, Xiaohan Mao,…
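This paper proposes a novel Generative Multi-modal Models (GMM) framework for addressing catastrophic forgetting in Class-Incremental Learning (CIL). In the CIL setting, a model must recognize new classes while retaining what it learned about earlier classes, but existing discriminative models tend to be biased toward the current task, which causes them to forget old knowledge.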
Paper title: EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI Paper link: https://volctracer.com/w/TsT1vBdQ Authors: Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, ...
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation This paper targets the hallucination problem in multi-modal LLMs and proposes an over-trust penalty and a retrospection-allocation mechanism. Project code: https://github.com/shikiw/OPERA Making Large Multimodal Models Understand Arbitrary Visual Prompts ...
[1] Shenghai Yuan, Yizhuo Yang, Thien Hoang Nguyen, Thien-Minh Nguyen, Jianfei Yang, Fen Liu, Jianping Li, Han Wang, Lihua Xie. MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats. In 2023 IEEE International Conference on Robotics and Automation (ICRA). ...
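Paper title: ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis Source: CVPR 2024 Paper link: https://arxiv.org/abs/2403.17936 Compiled by: Wang Yiwen Introduction Figure 1 Although most methods successfully capture beat gestures aligned with the rhythm of speech, they still lack linguistic control over gesture generation, and they therefore struggle to generate ... that contribute to the overall meaning of the utterance...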
76. MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant Section 25: Traffic and Driving 77. Controllable Safety-Critical Closed-loop Traffic Simulation via Guided Diffusion https://safe-sim.github.io/ 78. Generalized Predictive Model for Autonomous Driving ...