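This survey is also accompanied by a GitHub repository that organizes the papers it discusses under the same taxonomy, available at https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey; the authors will actively maintain it and incorporate new research promptly. Model architecture. Figure 3: Model architecture of efficient MLLMs. Following the standard MLLM framework, efficient MLLMs are divided into three main modules: a vision encoder g, which receives and processes visual inputs;...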
Efficient-Multimodal-LLMs-Survey. Efficient Multimodal Large Language Models: A Survey [arXiv]. Yizhang Jin (1,2), Jian Li (1), Yexin Liu (3), Tianjun Gu (4), Kai Wu (1), Zhengkai Jiang (1), Muyang He (3), Bo Zhao (3), Xin Tan (4), Zhenye Gan (1), Yabiao Wang (1), Chengjie Wang (1), Lizhuang Ma (2) ...
A Survey on Benchmarks of Multimodal Large Language Models - Timothyxxx/Evaluation-Multimodal-LLMs-Survey
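MLLMs enable LLMs to perceive and understand data in multiple modalities. Among them, vision+LLM is the most prominent; these models are also known as vision-LLMs or large vision-language models. The goal of MLLMs is to give LLMs the ability to "see" the world and, combined with strong reasoning and language generation capabilities, to support downstream tasks such as image/video captioning and visual question answering. There are two main approaches to integrating the vision and language modalities: Based on pretrained unimodal models: using a pretrained vision encoder...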
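Survey 1: A Survey on Multimodal Large Language Models. I. Components of a multimodal LLM: (1) modality encoder, (2) language model, (3) connector. II. Pre-training. III. SFT fine-tuning. IV. RLHF alignment training: (1) using the common PPO, (2) direct preference alignment with DPO, (3) common preference datasets used for alignment. Survey 2: MM-LLMs: Recent Advances in MultiModal Large Language Mod...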
Our key argument is that evaluation should be regarded as a crucial discipline to support the development of MLLMs better. For more details, please visit our GitHub repository: https://github.com/swordlidev/Evaluation-Multimodal-LLMs-Survey. ...
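A Survey of Multimodal Large Language Model from A Data-centric Perspective. This paper presents a comprehensive survey of multimodal large language models (MLLMs) from a data-centric perspective. Humans perceive the world through multiple senses such as sight, smell, hearing, and touch; similarly, multimodal large language models, by integrating and processing data from multiple modalities including text, vision, audio, video, and 3D environments, enhance...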
(LAVR). To conclude the paper, we discuss existing challenges and point out promising research directions. In light of the fact that the era of MLLM has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the ...
Recently, the multimodal large language model (MLLM) represented by GPT-4V has been a new rising research hotspot, which uses powerful large language models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of the MLLM, such as writing stories based on images...
Code: beccabai/Data-centric_multimodal_LLM (official). Tasks: Language Modelling, Large Language Model, Multimodal Large Language Model, Survey.