The official repository for “Image Captioning via Dynamic Path Customization”. Dynamic Transformer Network (DTNet) is a model that generates discriminative yet accurate captions by dynamically assigning customized paths to different samples. The framework of the proposed Dynamic Transformer Network (DTNet...
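The description above is brief, so what follows is only a minimal sketch of the general idea of sample-dependent ("dynamic") routing. It is not DTNet's actual router or path space. It assumes a small set of hypothetical candidate paths and a linear router that scores each path per sample. A softmax turns those scores into per-sample weights, and the output mixes the paths according to those weights. The names `route_sample`, `PATHS`, `ROUTER_W`, and `ROUTER_B` are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Path = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_sample(
    x: Vector,
    paths: Sequence[Path],
    router_w: Sequence[Vector],
    router_b: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Soft-route one sample through candidate paths (illustrative sketch).

    Returns (gates, output): gates[k] is the sample-specific weight of path k,
    and output is the gate-weighted sum of every path's output.
    """
    # Router: a linear layer producing one logit per candidate path.
    logits = [sum(w * xi for w, xi in zip(w_k, x)) + b_k
              for w_k, b_k in zip(router_w, router_b)]
    gates = softmax(logits)
    # Run every candidate path; each must preserve the feature dimension.
    outs = [path(x) for path in paths]
    output = [sum(g * out[i] for g, out in zip(gates, outs))
              for i in range(len(x))]
    return gates, output


# Hypothetical candidate paths standing in for a model's real cells.
PATHS: List[Path] = [
    lambda v: list(v),                   # identity / skip path
    lambda v: [2.0 * a for a in v],      # "amplify" path
    lambda v: [max(a, 0.0) for a in v],  # ReLU-like path
]
# Hypothetical router parameters: 3 paths x 2 input features.
ROUTER_W = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
ROUTER_B = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    for sample in ([1.0, 0.0], [0.0, 1.0]):
        gates, out = route_sample(sample, PATHS, ROUTER_W, ROUTER_B)
        best = max(range(len(gates)), key=gates.__getitem__)
        print(sample, "-> dominant path", best, [round(g, 3) for g in gates])
```

Running it shows the two samples leaning on different paths. Soft (softmax-weighted) routing keeps the sketch differentiable end to end. A hard, argmax-style choice would be an alternative, and which one DTNet uses is not stated here.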