Memory-Augmented Image Captioning. Zhengcong Fei. National Conference on Artificial Intelligence.
Zero-shot image captioning (IC) without well-paired image-text data can be divided into two categories: training-free and text-only-training. The main difference between them is whether a textual corpus is used to train the LM. Though achieving attractive performance w.r.t. some metrics, existin...
Retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning. In this paper, we propose a retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning (RAMP), which makes... C Xu, M Yang, X Ao, ... - Know...
We introduce a highly effective retrieval-augmented image captioning method that prompts LLMs with object names retrieved from External Visual-name memory (EVCap). We build ever-changing object knowledge memory using objects' visuals and names, enabling us to (i) update the memory at a minimal...
A paper from NIPS 2017: Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space. Research topic: the visual captioning problem (generating a caption from an image). Main contribution: Additive Gaussian VAE. Key idea: use a VAE to learn...
For the video captioning task, we also conduct experiments on the Youcook2 dataset. You can download videos for each dataset through the script provided here (lavis/datasets/download_scripts). For the LVU/Breakfast/COIN datasets, please download the original videos through the official link provided above....
In this paper, we propose a retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning (RAMP), which makes full use of the R-best retrieved candidate captions to enhance the image paragraph captioning via adversarial training. Concretely, RAMP treats ...
MIRA-CAP: Memory-Integrated Retrieval-Augmented Captioning for State-of-the-Art Image and Video Captioning. doi:10.3390/s24248013. Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, ...
This paper proposes a hybrid model that combines a pre-trained model with a retrieval-based memory mechanism to tackle the Personalized Image Captioning problem. Our method involves two main phases: (1) Constructing User Memory (UM), and (2) generating image descriptions using the pre-trained ...
Existing image paragraph captioning methods generate long paragraph captions solely from input images, relying on insufficient information. In this paper, ... C Xu, M Yang, X Ao, ... - Knowledge-Based Systems. Cited by: 0. Published: 2020.