Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020): v-iashin.github.io/bmt
Our code is available at https://github.com/valterlej/dvcusi. Valter Estevam, Rayson Laroca, Helio Pedrini, David Menotti. Journal of Visual Communication and Image Representation, Elsevier. doi:10.1016/j.jvcir.2024.104385
End-to-End Dense Video Captioning with Parallel Decoding. Paper: https://arxiv.org/abs/2108.07781 Code: https://github.com/ttengwang/pdvc
2. Motivation: Video captioning, an emerging branch of video understanding, has received growing attention in recent years. However, because real-world videos are usually long and consist of assorted background segments, single-sentence captioning methods tend to produce…
1. https://github.com/ranjaykrishna/densevid_eval
2. On 11/02/2017, the official evaluation tool fixed a critical issue: only one out of multiple incorrect predictions for each video was counted. This leads to performance overestimation in [27, 37]. Thus, we received raw results from the …
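The counting issue above can be illustrated with a minimal sketch (not the official densevid_eval code; the function name and score values are illustrative assumptions). If only one of several incorrect (zero-scoring) predictions per video enters the average, the per-video mean score is inflated:

```python
def average_score(per_video_scores):
    """Mean caption score over all predictions pooled across videos.
    per_video_scores: list of lists, one inner list per video."""
    all_scores = [s for scores in per_video_scores for s in scores]
    return sum(all_scores) / len(all_scores)

# One video: one correct prediction (illustrative score 0.4) and
# three incorrect ones (score 0.0 each).
scores_fixed = [[0.4, 0.0, 0.0, 0.0]]  # every prediction counted (post-fix behavior)
scores_buggy = [[0.4, 0.0]]            # only one incorrect prediction counted (pre-fix bug)

print(average_score(scores_fixed))  # 0.1
print(average_score(scores_buggy))  # 0.2 -> overestimates performance
```

With the same predictions, the buggy accounting doubles the reported average here, which is why re-evaluating earlier systems on the fixed tool was necessary.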
Code is available at https://github.com/mlvlab/VidChain (mlvlab/vidchain, official). Tasks: Dense Video Captioning, Video Captioning, Video Grounding, Video Segmentation, Video Semantic Segmentation, Video Understanding …
Datasets: ActivityNet Captions.
On the recent multiple-choice benchmark MVBench, PLLaVA reaches an average accuracy of 58.1% across 20 subtasks, a new state-of-the-art result that is 14.5% higher than GPT-4V (IG-VLM). GitHub: magic-research/PLLaVA — official repository for the PLLaVA paper.
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. Links: arXiv, GitHub. Keywords: dense video captioning, parameter-free, video semantic understanding, deep learning, large language models. Abstract: Vision-language pre-training has markedly improved performance across a range of image-language applications. However, pre-training for video-related tasks demands exceptionally large computational and data resources…