Paper: https://arxiv.org/abs/2409.03420
Code: https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/DocOwl2
GitHub: X-PLUG/mPLUG-DocOwl (mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding)

DocOwl2 multi-page document understanding: performance showcase

Model architecture

Text summarization and compression have already been studied extensively in NLP. Given that the main information in a document image is its layout and text, and that existing multimodal large language models generally rely on a vision-to-text module...
Challenges of high-resolution document images

When processing high-resolution document images, multimodal large language models (MLLMs) face a series of challenges. As the resolution of a document image increases, the model has to generate thousands of visual tokens to understand a single image, which both increases GPU memory consumption and slows down inference, especially...

```python
import sys  # required by sys.path below; the import lies outside the excerpt

sys.path.append('/nas-alinlp/anwenhu/code/mPLUG_github/mPLUG-DocOwl2/evaluation')
print(sys.path)
import re
from evaluator import doc_evaluate
import os
from tqdm import tqdm
import random
from pathlib import Path


def parser_line(line):
    # Take the first entry of the record's 'image' field.
    image = line['image'][0]
    # The excerpt is cut off mid-statement here:
    # assert len(line['messag...
```
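The visual-token cost mentioned in the passage on high-resolution document images can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the helper `visual_tokens`, the 448-pixel crop size, the 14-pixel patch size, and the crop counts are all assumptions chosen for the example, not values taken from the text above. It counts the patch tokens a crop-based ViT encoder would emit before any compression.

```python
# Hypothetical helper for illustration; every default below is an assumption,
# not a setting confirmed by the source text.

def visual_tokens(num_crops: int, crop_size: int = 448, patch_size: int = 14,
                  include_global: bool = True) -> int:
    """Count patch tokens produced by a crop-based ViT before compression."""
    if crop_size % patch_size != 0:
        raise ValueError("crop_size must be divisible by patch_size")
    # A square crop is split into (crop_size / patch_size)^2 patches.
    tokens_per_crop = (crop_size // patch_size) ** 2
    # Optionally add one low-resolution global view of the whole page.
    total_crops = num_crops + (1 if include_global else 0)
    return total_crops * tokens_per_crop


if __name__ == "__main__":
    for crops in (1, 4, 9):
        print(f"{crops} crop(s): {visual_tokens(crops)} visual tokens")
```

Under these assumed settings a single page cut into 9 crops plus a global view already yields over ten thousand tokens, and a multi-page document multiplies that figure by the page count, which is why the memory and latency costs grow so quickly.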