We propose a novel Memory-Augmented zero-shot image Captioning framework (MeaCap). Specifically, equipped with a textual memory, we introduce a retrieve-then-filter module to extract key concepts that are highly related to the image. By deploying our proposed memory-augmented visual-related fusion sco...
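A minimal sketch of the retrieve-then-filter idea is given below, assuming a CLIP-like embedding space; the memory layout, helper names (cosine_sim, retrieve_then_filter), and the similarity threshold are illustrative assumptions, not MeaCap's actual implementation.

    # Minimal sketch of a retrieve-then-filter step over a textual memory.
    # Random embeddings stand in for a CLIP-like encoder; the memory layout
    # and concept extraction are illustrative, not MeaCap's actual code.
    import numpy as np

    def cosine_sim(a, b):
        # a: (d,), b: (n, d) -> (n,) cosine similarities
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return b @ a

    def retrieve_then_filter(image_emb, memory_captions, memory_embs,
                             concepts_per_caption, top_k=5, sim_thresh=0.25):
        """Retrieve the top-k memory captions, then keep only the concepts
        whose embeddings are sufficiently close to the image embedding."""
        # Step 1: retrieve captions most similar to the image.
        caption_scores = cosine_sim(image_emb, memory_embs)
        top_idx = np.argsort(-caption_scores)[:top_k]

        # Step 2: filter concepts from the retrieved captions by image relevance.
        key_concepts = []
        for i in top_idx:
            for concept, emb in concepts_per_caption[i]:
                if cosine_sim(image_emb, emb[None, :])[0] > sim_thresh:
                    key_concepts.append(concept)
        return [memory_captions[i] for i in top_idx], sorted(set(key_concepts))

    # Toy usage with random vectors standing in for real image/text features.
    rng = np.random.default_rng(0)
    d = 16
    image_emb = rng.normal(size=d)
    captions = ["a dog runs on the beach", "a man rides a bike"]
    memory_embs = rng.normal(size=(2, d))
    concepts = [[("dog", rng.normal(size=d)), ("beach", rng.normal(size=d))],
                [("man", rng.normal(size=d)), ("bike", rng.normal(size=d))]]
    retrieved, keys = retrieve_then_filter(image_emb, captions, memory_embs,
                                           concepts, top_k=1)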
Memory-Augmented Image Captioning. Zhengcong Fei. National Conference on Artificial Intelligence.
Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph. Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART)...
MAG-Net: A Memory Augmented Generative Framework for Video Anomaly Detection Using Extrapolation. S. Dube, K. Biradar, S. Vipparthi, et al.
Zero-shot Evaluation: Our model can also leverage pre-trained weights from InstructBlip without any fine-tuning to conduct zero-shot evaluation on video datasets: bash run_scripts/${dataset}/test.sh. Hyper-parameters: one important hyper-parameter is memory_bank_length; please change that in the training ...
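As a hypothetical illustration of adjusting that hyper-parameter, the snippet below writes a minimal YAML training config with a chosen memory_bank_length before launching the script; the file name and surrounding fields are assumptions, only the field name comes from the note above.

    # Hypothetical illustration of setting memory_bank_length in a training
    # config; the config structure and path are assumed, not the repo's own.
    import yaml

    cfg = {
        "model": {"memory_bank_length": 16},  # e.g. try 8 / 16 / 32
        "dataset": "msrvtt",                  # placeholder dataset name
    }
    with open("train_memlen16.yaml", "w") as f:
        yaml.safe_dump(cfg, f)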
We introduce a highly effective retrieval-augmented image captioning method that prompts LLMs with object names retrieved from an External Visual-name memory (EVCap). We build an ever-changing object knowledge memory using objects' visuals and names, enabling us to (i) update the memory at a minimal...
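The sketch below illustrates the retrieval-and-prompting idea under simple assumptions: a flat array of stored visual embeddings paired with object names and a hand-written prompt template, neither of which is EVCap's actual code.

    # Sketch of retrieving object names from an external visual-name memory
    # and assembling an LLM prompt; layout and template are assumptions.
    import numpy as np

    def retrieve_object_names(image_emb, memory_visual_embs, memory_names, top_k=3):
        """Return names whose stored visual embeddings best match the image."""
        sims = memory_visual_embs @ image_emb / (
            np.linalg.norm(memory_visual_embs, axis=1) * np.linalg.norm(image_emb))
        return [memory_names[i] for i in np.argsort(-sims)[:top_k]]

    def build_prompt(object_names):
        return ("Objects possibly present: " + ", ".join(object_names)
                + ". Describe the image in one sentence.")

    # Updating the memory amounts to appending a (visual embedding, name) pair.
    rng = np.random.default_rng(1)
    memory_embs = rng.normal(size=(4, 8))
    memory_names = ["dog", "bicycle", "beach", "surfboard"]
    prompt = build_prompt(
        retrieve_object_names(rng.normal(size=8), memory_embs, memory_names))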
ACL20 | MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning. Performance improves slightly; the text generated with MART is more coherent and less repetitive than with Transformer-XL, showing it better handles long-range dependencies, and it is the SOTA for dense video captioning. ... completed in collaboration with the University of North Carolina at Chapel Hill (UNC). It proposes a recurrent Transformer that enhances the coherence of generated video descriptions...
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning. doi:10.18653/V1/2020.ACL-MAIN.233. Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal. Association for Computational Linguistics.
In this paper, we propose a retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning (RAMP), which makes full use of the R-best retrieved candidate captions to enhance image paragraph captioning via adversarial training. Concretely, RAMP treats ...
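A rough sketch of attending over the R-best retrieved captions with standard cross-attention is shown below; RAMP's dynamic memory updates and its adversarial training loop are omitted, and all dimensions and module names are illustrative.

    # Sketch of fusing retrieved-caption context into decoder states via
    # cross-attention; not RAMP's actual architecture.
    import torch
    import torch.nn as nn

    class RetrievedCaptionAttention(nn.Module):
        def __init__(self, d_model=256, n_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, decoder_states, retrieved_caption_embs):
            # decoder_states: (B, T, d); retrieved_caption_embs: (B, R, d)
            ctx, _ = self.attn(decoder_states, retrieved_caption_embs,
                               retrieved_caption_embs)
            return decoder_states + ctx  # residual fusion of retrieved context

    B, T, R, d = 2, 7, 5, 256
    layer = RetrievedCaptionAttention(d)
    out = layer(torch.randn(B, T, d), torch.randn(B, R, d))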
We propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture. The memory module generates a highly summarized memory state from the video segments and the sentence history so as to help better prediction of the ne...
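The following sketch shows the general pattern of carrying a summarized memory state across video segments with a gated update (here a GRU cell over mean-pooled segment features); MART's actual memory update differs in its exact formulation.

    # Sketch of a recurrent memory state updated from each segment's features;
    # the summarization and gating below are simplified stand-ins.
    import torch
    import torch.nn as nn

    class SimpleMemoryUpdater(nn.Module):
        def __init__(self, d_model=256):
            super().__init__()
            self.cell = nn.GRUCell(d_model, d_model)  # gated recurrent update

        def forward(self, memory, segment_feats):
            # memory: (B, d); segment_feats: (B, L, d) for the current segment
            summary = segment_feats.mean(dim=1)       # crude segment summary
            return self.cell(summary, memory)         # new memory state

    B, L, d = 2, 10, 256
    updater = SimpleMemoryUpdater(d)
    memory = torch.zeros(B, d)
    for segment in [torch.randn(B, L, d) for _ in range(3)]:  # segment history
        memory = updater(memory, segment)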