In this paper, we propose PPCap, a novel plug-and-play framework for stylized image captioning, in which only a stylized language model needs to be trained on a small-scale unpaired stylized corpus. A generative style discriminator is then formed via Bayes' rule by contrasting the ...
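The Bayes'-rule style discriminator described above can be sketched as follows. This is a minimal illustration, not PPCap's actual implementation: the log-likelihoods and the style prior are placeholder numbers, and the two "models" are abstracted to scalar scores of the same caption.

```python
import math

def style_posterior(logp_stylized, logp_base, prior_stylized=0.5):
    """Contrast a stylized LM against a base LM via Bayes' rule:
    p(style | caption) ∝ p(caption | style) * p(style).
    Inputs are log-likelihoods of the same caption under each model.
    """
    # Joint log-probability of each hypothesis (style vs. no style)
    joint_s = logp_stylized + math.log(prior_stylized)
    joint_b = logp_base + math.log(1.0 - prior_stylized)
    # Normalize with a numerically stable log-sum-exp
    m = max(joint_s, joint_b)
    log_z = m + math.log(math.exp(joint_s - m) + math.exp(joint_b - m))
    return math.exp(joint_s - log_z)

# A caption scored higher by the stylized LM gets a high style posterior.
p = style_posterior(logp_stylized=-12.0, logp_base=-15.0)
```

The contrast between the two language models is what makes the discriminator "generative": no classifier is trained, only the two likelihoods are compared.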
2. Similar to multi-task sequence-to-sequence training. First task: train to generate factual captions given the paired images, updating all parameters. Second task: the factored LSTM is trained as a language model, updating only S_R or S_H. "Factual" and "Emotional": Stylized Image Captioning with Adaptive Learning and Attentio...
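The two-phase update schedule above can be sketched as a toy example. This is a hypothetical illustration of selective parameter updating, not the actual factored-LSTM weights: the parameter names, the constant "gradients", and the scalar values are placeholders.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply an SGD update only to the parameter groups named in `trainable`;
    all other groups are left frozen."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }

# Toy parameter groups: shared weights U, V plus per-style factors.
params = {"U": 1.0, "V": 1.0, "S_factual": 1.0, "S_romantic": 1.0}
grads = {name: 1.0 for name in params}  # placeholder gradients

# Task 1: factual captioning on paired images -- update all parameters.
params = sgd_step(params, grads, trainable=set(params))

# Task 2: stylized language modelling -- update only the style factor S_R.
params = sgd_step(params, grads, trainable={"S_romantic"})
```

After task 2 only `S_romantic` has moved again, which mirrors how the stylized corpus shapes the style factor without disturbing the shared visual-grounding weights.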
In this paper, we propose an adversarial learning network for the task of multi-style image captioning (MSCap), trained with a standard factual image caption dataset and a multi-stylized language corpus without paired images. The key question is how to learn a single model for multi-stylized image captioning with unpaired ...
Existing methods for stylized image caption generation mainly rely on reinforcement learning or contrastive learning. Even with the assistance of large models such as CLIP and GPT, previous methods still require fine-tuning to generate captions in a target style, and these methods necessitate a ...
To address these issues, we propose an image captioning model called ATTEND-GAN which has two core components: first, an attention-based caption generator to strongly correlate different parts of an image with different parts of a caption; and second, an adversarial training mechanism to assist ...
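The adversarial training mechanism behind ATTEND-GAN can be illustrated with the standard GAN losses. This is a generic sketch, not the paper's exact objective: the discriminator scores below are placeholder numbers rather than model outputs.

```python
import math

def bce(p, label):
    """Binary cross-entropy for a single predicted probability."""
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

# Discriminator scores: probability that a caption carries the target style.
d_real = 0.8   # score on a human-written stylized caption (placeholder)
d_fake = 0.3   # score on a caption from the generator (placeholder)

# Discriminator objective: label real stylized captions 1, generated ones 0.
loss_d = bce(d_real, 1) + bce(d_fake, 0)

# Generator objective: fool the discriminator into labelling its output 1.
loss_g = bce(d_fake, 1)
```

Training alternates between the two losses: the discriminator pushes generated captions away from the stylized corpus, and the generator adapts until its captions match the target style distribution.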
Due to the emphasis on emotional expression, the model may neglect semantic representation, which reduces the consistency of the stylized caption with the image's objects and content. Therefore, based on an adversarial training mechanism, we propose an image captioning system, CA-GAN, to address this ...
Attention model. Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, it requires the system to generate a caption that has a specific style (e.g., humorous, ...).
doi:10.1007/978-3-030-01249-6_32, Tianlang Chen...