Detail Orientation: 2.91 (rank #12)
Contextual Understanding: 3.46 (rank #13)
Temporal Understanding: 2.39 (rank #15)
Consistency: 2.81 (rank #10)
mean: 2.99 (rank #13)
Video-based Generative Performance Benchmarking (Correctness of Information), VideoInstruct, Chat-UniVi: gpt-score 2.89 (rank #10) ...
5. Conclusion
In this paper, we introduce Chat-UniVi, a unified multimodal large language model designed to comprehend and engage in conversations about both images and videos. To seamlessly bridge the intricate spatial nuances of images with the broade...
[2024/04/05] Chat-UniVi has been selected as a Highlight paper at CVPR 2024 (top 3% of 11532 submissions)!
[2024/02/27] Our Chat-UniVi has been accepted by CVPR 2024!
[2024/01/05] We enhance the video loading code by introducing support for variable-length videos. This improvement...
Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multimodal conversations. However, existing methods encounter challenges in effectively handling both image and video understanding, particularly with li...