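https://huggingface.co/spaces/OpenGVLab/MVBench_Leaderboard There are currently several ways to evaluate the capabilities of multimodal large models: Direct human evaluation. Regarded as the most direct and effective approach, it is used by Multi-Modality Arena, but it is inefficient, cognitive bias is hard to avoid, and the judging process cannot be made fully fair. LLM-assisted evaluation. This approach is more impartial, but it requires the large language model to have...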
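Full leaderboard. Our VideoChat2 achieves the best performance on 15 tasks, but it still falls short on tasks such as moving direction, action localization, and counting. Some recent image dialogue models have begun introducing grounding data to strengthen the related capabilities, and this is also a direction in which future video dialogue models can make breakthroughs. VideoChatGPT dialogue benchmark. On the VideoChatGPT dialogue Benchma...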
I would like to request a merge and, if possible, integration into your leaderboard. I can share a zip of all the runs over Discord or email to help populate the leaderboard and keep it updated with new models going forward. Let me know your thoughts. Collaborator FangXinyu-0913 commented Feb 25,...
Zero-Shot Video Question Answer [Chart: accuracy by date, Jan '24 to Jan '25; y-axis: accuracy, 50 to 62; labeled models: InternVideo2-1B, TS-LLaVA-34B] ...