Large Vision-Language Model — An LVLM typically consists of a vision encoder, a text encoder, and a cross-modal alignment network. LVLM training usually consists of three parts: the vision and text encoders are first pre-trained separately on large-scale unimodal datasets; the two encoders are then aligned through vision-text alignment pre-training, which enables the LLM to generate meaningful descriptions for a given image.
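A minimal sketch of this architecture, assuming a CLIP-style vision encoder, a simple linear projection as the cross-modal alignment network, and a decoder-only LLM; the module names and shapes are illustrative, not taken from any specific model:

```python
import torch
import torch.nn as nn

class MinimalLVLM(nn.Module):
    """Toy LVLM: vision encoder -> alignment projection -> language model.

    `vision_encoder` and `language_model` are placeholders for modules that
    were each pre-trained on unimodal data (images and text, respectively).
    """

    def __init__(self, vision_encoder, language_model, vision_dim, llm_dim):
        super().__init__()
        self.vision_encoder = vision_encoder        # pre-trained on image data
        self.language_model = language_model        # pre-trained on text data
        self.align = nn.Linear(vision_dim, llm_dim)  # cross-modal alignment network

    def forward(self, pixel_values, text_embeds):
        # Encode the image into patch features, then project them into the
        # LLM embedding space so they can be consumed as soft "visual tokens".
        vision_feats = self.vision_encoder(pixel_values)   # (B, N_patches, vision_dim)
        visual_tokens = self.align(vision_feats)           # (B, N_patches, llm_dim)
        # Prepend visual tokens to the text embeddings and decode with the LLM.
        inputs = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs)
```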
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models — Hallucination remains a major challenge for large vision-language models (LVLMs). To alleviate this problem, some methods, referred to as contrastive decoding, induce hallucinations by manually disturbing the original visual or instruction inputs, and then mitigate them by contrasting the outputs of the LVLM on the original and disturbed inputs. However, these holistic input disturbances sometimes induce potential...
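A rough sketch of the contrastive-decoding idea described above; the particular perturbation (a noised copy of the image) and the (1 + α)/α weighting are common illustrative choices, not necessarily the exact formulation of any single paper:

```python
import torch

def contrastive_decode_step(model, image, noisy_image, input_ids, alpha=1.0):
    """One decoding step that contrasts logits from the original and a
    perturbed (e.g., noised) visual input to down-weight hallucination-prone tokens.

    `model(image, input_ids)` is assumed to return next-token logits;
    `alpha` controls how strongly the perturbed distribution is subtracted.
    """
    with torch.no_grad():
        logits_orig = model(image, input_ids)        # logits from the clean input
        logits_pert = model(noisy_image, input_ids)  # logits from the disturbed input
    # Amplify what the clean input supports relative to the perturbed one.
    contrasted = (1 + alpha) * logits_orig - alpha * logits_pert
    return contrasted.argmax(dim=-1)                 # greedy pick for simplicity
```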
We also highlight several promising avenues for future research, such as hallucinations in large vision-language models and understanding knowledge boundaries in LLM hallucinations, paving the way for forthcoming research in the field. Therefore, our survey provides an in-depth analysis of these challenges, aiming to support the development of more robust RAG sys...
Despite their remarkable ability to understand both textual and visual data, large vision-language models (LVLMs) still face issues with hallucination. This manifests most prominently as object hallucination, where the models inaccurately describe objects in the images. Current efforts mainly focus ...
Large Vision-Language Models (LVLMs) have recently achieved remarkable success. However, LVLMs are still plagued by the hallucination problem, which limits their practicality in many scenarios. Hallucination refers to information in LVLMs' responses that does not exist in the visual input, which...
Current large vision-language models (LVLMs) achieve remarkable progress, yet significant uncertainty remains regarding their ability to accurately apprehend visual details, that is, to perform detailed captioning. To address this, we introduce CCEval, a GPT-4 assisted evaluation method tailored...
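The snippet above does not spell out CCEval's exact protocol; the sketch below only illustrates the general shape of a GPT-assisted object-level check for detailed captions. The prompt, model name, and metric are assumptions for illustration, and the `openai` client calls are standard API usage, not CCEval's code:

```python
from openai import OpenAI

client = OpenAI()

def extract_objects(caption: str) -> set[str]:
    """Ask a GPT model to list the objects mentioned in a caption (illustrative prompt)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "List the physical objects mentioned in this caption, "
                       "comma-separated, nothing else:\n" + caption,
        }],
    )
    return {o.strip().lower() for o in resp.choices[0].message.content.split(",") if o.strip()}

def hallucination_rate(generated_caption: str, ground_truth_objects: set[str]) -> float:
    """Fraction of mentioned objects that are absent from the annotated ground truth."""
    mentioned = extract_objects(generated_caption)
    if not mentioned:
        return 0.0
    hallucinated = {o for o in mentioned if o not in ground_truth_objects}
    return len(hallucinated) / len(mentioned)
```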
@article{zhou2023analyzing, title={Analyzing and mitigating object hallucination in large vision-language models}, author={Zhou, Yiyang and Cui, Chenhang and Yoon, Jaehong and Zhang, Linjun and Deng, Zhun and Finn, Chelsea and Bansal, Mohit and Yao, Huaxiu}, journal={arXiv preprint arXiv:...
AI Hallucination in Large Language Processing Models Let's consider what AI hallucination would look like in a large language processing model such as ChatGPT. A ChatGPT hallucination would result in the bot giving you an incorrect fact with some assertion, such that you would naturally take such...
This repo provides the source code & data of our paper: Evaluating Object Hallucination in Large Vision-Language Models (EMNLP 2023). @inproceedings{Li-hallucination-2023, title={Evaluating Object Hallucination in Large Vision-Language Models}, author={Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang,...
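The repo's own scripts define the official evaluation; as a hedged illustration, a minimal version of the yes/no polling-style metric computation that object-hallucination benchmarks of this kind typically report might look like the following (not the repo's exact script):

```python
def polling_metrics(answers, labels):
    """Compute accuracy / precision / recall / F1 for yes-no object-existence probes.

    `answers` and `labels` are parallel lists of "yes"/"no" strings. This mirrors
    the style of metric commonly reported for object-hallucination probing,
    not the official evaluation code of the repo above.
    """
    tp = sum(a == "yes" and l == "yes" for a, l in zip(answers, labels))
    fp = sum(a == "yes" and l == "no" for a, l in zip(answers, labels))
    fn = sum(a == "no" and l == "yes" for a, l in zip(answers, labels))
    tn = sum(a == "no" and l == "no" for a, l in zip(answers, labels))
    accuracy = (tp + tn) / max(len(labels), 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```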
As Large Language Models (LLMs) continue to advance in their ability to write human-like text, a key challenge remains around their tendency to "hallucinate": generating content that appears factual but is ungrounded. This issue of hallucination is arguably the biggest hindrance to safely deployin...