Due to the emphasis on emotional expression, the model may neglect the semantic representation, which reduces the consistency of the stylized caption with the objects and content of the image. Therefore, based on an adversarial training mechanism, we propose an image captioning system, CA-GAN, to address this ...
2. Similar to multi-task sequence-to-sequence training. In the first task, the model is trained to generate factual captions given the paired images, updating all of the parameters. In the second task, the factored LSTM is trained as a language model, updating only SR or SH.
“Factual” and “Emotional”: Stylized Image Captioning with Adaptive Learning and Attentio...
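The two-task schedule in item 2 above can be sketched as a selective parameter update. This is only an illustration under stated assumptions, not the authors' implementation: the factorization of a weight into shared factors `U`, `V` and per-style matrices `S_F`, `S_R`, `S_H`, the reading of SR and SH as style-specific matrices, the interpretation of "all parameters" as the parameters on the factual path, and every function name below are hypothetical. The gradients are placeholders; no real loss is computed.

```python
import copy
import random


def rand_matrix(rows, cols, rng):
    """Small random matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(hidden=4, factor=3, inp=5, seed=0):
    """Assumed factored weight: W_style = U @ S_style @ V, with U and V shared."""
    rng = random.Random(seed)
    return {
        "U": rand_matrix(hidden, factor, rng),
        "V": rand_matrix(factor, inp, rng),
        "S_F": rand_matrix(factor, factor, rng),  # factual (assumed name)
        "S_R": rand_matrix(factor, factor, rng),  # the text's SR
        "S_H": rand_matrix(factor, factor, rng),  # the text's SH
    }


def trainable_names(task, style=None):
    """Which parameter groups a task is allowed to update."""
    if task == "factual_caption":
        # Task 1: paired image-caption data; everything on the factual path.
        return ["U", "V", "S_F"]
    if task == "style_language_model":
        # Task 2: text-only language modelling; only the chosen style matrix.
        if style not in ("S_R", "S_H"):
            raise ValueError("style must be 'S_R' or 'S_H'")
        return [style]
    raise ValueError(f"unknown task: {task}")


def sgd_step(params, grads, names, lr=0.1):
    """One SGD update that touches only the parameter groups in `names`."""
    for name in names:
        params[name] = [
            [w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(params[name], grads[name])
        ]


params = init_params()
before = copy.deepcopy(params)
# Placeholder gradients of 1.0 everywhere, standing in for a real backward pass.
fake_grads = {name: [[1.0] * len(row) for row in m] for name, m in params.items()}

sgd_step(params, fake_grads, trainable_names("style_language_model", "S_R"))
changed = sorted(name for name in params if params[name] != before[name])
print(changed)  # ['S_R']
```

In a deep learning framework the same effect is usually obtained by freezing the shared parameters or by passing only the style matrix to the optimizer during the second task.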
Image captioning is a classical multi-modal task for vision-language understanding. In recent years, researchers have begun to focus on generating captions with personalized styles, but the range of available styles is often fixed. Existing methods for stylized image caption generation rely mainly on reinforcement learning or contrastive learning. Even with the assistance of large models such as CLIP and GPT, previous methods still require fine-tuning to generate captions in a target style, and these methods necessitate a ...
Attention model. Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, it requires the system to generate a caption that has a specific style (e.g., humorous, ... doi:10.1007/978-3-030-01249-6_32 Tianlang Chen...
To address these issues, we propose an image captioning model called ATTEND-GAN which has two core components: first, an attention-based caption generator to strongly correlate different parts of an image with different parts of a caption; and second, an adversarial training mechanism to assist ...
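The first component, an attention-based generator that ties image parts to caption parts, can be illustrated with generic soft attention. The snippet above does not give ATTEND-GAN's actual attention formulation, so the dot-product scoring, the toy feature vectors, and the function names below are assumptions chosen for illustration; the adversarial component is not sketched.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Dot-product soft attention (an assumed scoring function).

    region_features: one feature vector per image region.
    decoder_state:   the caption decoder's current hidden state.
    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [
        sum(f * h for f, h in zip(region, decoder_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

At each decoding step the context vector would be fed to the generator together with the previous word, so regions that score highly against the current state contribute most to the next word.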