Large Vision-Language Model: An LVLM typically consists of a visual encoder, a text encoder, and a cross-modal alignment network. LVLM training usually proceeds in three stages: the visual and text encoders are first pre-trained separately on large-scale unimodal datasets; the two encoders are then aligned through vision-text alignment pre-training, which enables the LLM to, for a given image, ...
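A minimal sketch of this encoder-projector-LLM layout might look as follows; the class names, dimensions, and the single linear projector are illustrative assumptions, not any specific model's design:

```python
# Illustrative sketch of the typical LVLM architecture described above:
# a pre-trained vision encoder, a projection (alignment) module, and an LLM.
import torch
import torch.nn as nn

class ToyLVLM(nn.Module):
    def __init__(self, vision_encoder, llm, vis_dim=1024, txt_dim=4096):
        super().__init__()
        self.vision_encoder = vision_encoder          # pre-trained, often frozen
        self.projector = nn.Linear(vis_dim, txt_dim)  # cross-modal alignment layer
        self.llm = llm                                # pre-trained language model

    def forward(self, pixel_values, input_embeds):
        # 1) encode the image into patch features
        vis_feats = self.vision_encoder(pixel_values)   # (B, N_patches, vis_dim)
        # 2) project visual features into the LLM embedding space
        vis_tokens = self.projector(vis_feats)          # (B, N_patches, txt_dim)
        # 3) prepend visual tokens to the text embeddings and decode
        inputs = torch.cat([vis_tokens, input_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```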
Despite their remarkable ability to understand both textual and visual data, large vision-language models (LVLMs) still face hallucination issues. This manifests most prominently as object hallucination, where the models inaccurately describe the objects in an image. Current efforts mainly focus ...
Current large vision-language models (LVLMs) have achieved remarkable progress, yet significant uncertainty remains regarding their ability to accurately apprehend visual details, that is, to perform detailed captioning. To address this, we introduce CCEval, a GPT-4-assisted evaluation method tailored...
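A hedged sketch of what a GPT-4-assisted object-level check could look like, in the CHAIR/CCEval spirit: an LLM judge extracts the objects a caption mentions, and the hallucination rate is the fraction not present in the image annotations. The prompt, parsing, and metric below are assumptions, not CCEval's exact protocol:

```python
# Sketch of a GPT-4-assisted, CHAIR-style object-hallucination check.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_objects(caption: str) -> set[str]:
    """Ask the judge model to list the physical objects mentioned in a caption."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "List the physical objects mentioned in this caption, "
                       "one per line, lowercase nouns only:\n" + caption,
        }],
    )
    lines = resp.choices[0].message.content.splitlines()
    return {line.strip() for line in lines if line.strip()}

def hallucination_rate(caption: str, ground_truth_objects: set[str]) -> float:
    """Fraction of mentioned objects absent from the image annotations."""
    mentioned = extract_objects(caption)
    if not mentioned:
        return 0.0
    return len(mentioned - ground_truth_objects) / len(mentioned)
```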
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models. Hallucination remains a major challenge for large vision-language models (LVLMs). To mitigate it, some approaches, known as contrastive decoding, deliberately induce hallucinations by manually perturbing the original visual or instruction inputs, and then suppress them by contrasting the outputs of the original and the perturbed LVLM. However, such holistic input perturbations can sometimes induce potential...
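The contrastive-decoding recipe these methods share can be sketched as follows; the noise-based visual perturbation and the value of alpha are illustrative choices (VCD-style), not the exact settings of any one paper:

```python
# Sketch of contrastive decoding: contrast next-token logits conditioned on the
# clean image against logits conditioned on a perturbed image.
import torch

@torch.no_grad()
def contrastive_next_token(model, pixel_values, input_ids, alpha=1.0, noise_std=0.5):
    # logits conditioned on the clean image
    logits_orig = model(pixel_values=pixel_values, input_ids=input_ids).logits[:, -1]
    # logits conditioned on a perturbed image, which tends to amplify
    # language priors and hence hallucinated objects
    noisy = pixel_values + noise_std * torch.randn_like(pixel_values)
    logits_pert = model(pixel_values=noisy, input_ids=input_ids).logits[:, -1]
    # subtract the hallucination-prone distribution from the faithful one
    contrastive = (1 + alpha) * logits_orig - alpha * logits_pert
    return contrastive.argmax(dim=-1)  # greedy pick; sampling also works
```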
@article{zhou2023analyzing,
  title={Analyzing and mitigating object hallucination in large vision-language models},
  author={Zhou, Yiyang and Cui, Chenhang and Yoon, Jaehong and Zhang, Linjun and Deng, Zhun and Finn, Chelsea and Bansal, Mohit and Yao, Huaxiu},
  journal={arXiv preprint arXiv:...
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation. Qidong Huang1,2,*, Xiaoyi Dong2,3,†, Pan Zhang2, Bin Wang2, Conghui He2, Jiaqi Wang2, Dahua Lin2, Weiming Zhang1,†, Neng...
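As a rough illustration of the over-trust penalty idea (not OPERA's exact formulation): within a local window of the self-attention map, a strong column means recent tokens are piling attention onto a single earlier "anchor" token, the aggregation pattern that tends to precede hallucination, and that column score is subtracted from the beam's log-probability. The window size and scaling below are simplifying assumptions:

```python
# Simplified sketch of an over-trust penalty on beam-search scores.
import torch

def over_trust_penalty(attn: torch.Tensor, window: int = 8, sigma: float = 50.0):
    """attn: (seq_len, seq_len) causal self-attention map of the current beam."""
    local = attn[-window:, -window:]  # window over the most recent tokens
    col_scores = []
    for j in range(local.size(0)):
        col = local[j:, j]                      # causal entries of column j
        col_scores.append(torch.prod(sigma * col))
    # the strongest column signals a knowledge-aggregation ("anchor") pattern
    return torch.stack(col_scores).max()

# During beam search, each candidate's score would then be adjusted as:
#   adjusted_score = log_prob - lam * over_trust_penalty(attn)
```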
Our discussion then turns to representative methods for mitigating LLM hallucinations. In addition, we delve into what current retrieval-augmented LLMs face in combating hallucinations... Finally, we highlight promising research directions on LLM hallucinations, including hallucination in large vision-language models and the understanding of knowledge boundaries in LLM hallucinations.
Though advanced in understanding visual information together with human language, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that, during multimodal interaction, generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise ...
DRESS: Chen et al. (2023) propose using natural language feedback (NLF), specifically critique and refinement NLF, to improve the alignment with human preferences and the interaction capabilities of large vision-language models (LVLMs). They generalize conditional reinforcement learning to effectively incorpora...
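One way to picture conditional RL with non-differentiable NLF, under heavy assumptions (this is not DRESS's published objective): condition the model on the critique text and weight the language-modeling loss on the refined response by a scalar reward:

```python
# Assumption-laden sketch of reward-weighted, NLF-conditioned fine-tuning.
import torch
import torch.nn.functional as F

def nlf_conditioned_loss(model, tokenizer, prompt, critique_nlf,
                         refined_response, reward):
    # condition the model on both the original prompt and the critique NLF
    context = f"{prompt}\nFeedback: {critique_nlf}\nRevised answer: "
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(refined_response, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)

    logits = model(input_ids).logits
    # next-token prediction loss computed only over the response tokens
    resp_logits = logits[:, ctx_ids.size(1) - 1 : -1]
    loss = F.cross_entropy(resp_logits.reshape(-1, resp_logits.size(-1)),
                           tgt_ids.reshape(-1))
    # reward-weighted update: higher-reward refinements contribute more strongly
    return reward * loss
```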
Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models