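These two areas amount to a mid-level understanding of video, whereas the papers I have been reading lately focus on how to understand a video's high-level semantic information. An important area here is video captioning. The task of video captioning is to generate a text description for a video, which is somewhat like image captioning (generating a text description for an image); the main difference is that a video also contains temporal information. As for video captioning, I have not yet...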
2021: CLIP Meets Video Captioners: Attribute-Aware Representation Learning Promotes Accurate Captioning. 2022: SWINBERT: End-to-End Transformers with Sparse Attention for Video Captioning (a paper that does video captioning end to end). 2022: Zero-Shot Video Captioning with Evolving Pseudo-Tokens, zero-shot video captioning ...
This paper presents a survey of state-of-the-art video captioning techniques. Many contributions have been made worldwide in this domain; thus, there was a need to compile, study, and analyze all the results and present them in a comprehensive study, which ...
Video Captioning with Transferred Semantic Attributes. Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | July 2017. Automatically generating natural language descriptions of videos is a fundamental challenge for the computer vision community. Most ...
In this paper, we focus on discovering and integrating rich visual and textual knowledge to benefit video captioning. Specifically, we propose a Hierarchical & Multimodal Video Caption (HMVC) model to jointly learn the dynamics within both visual and textual modalities to infer an arbitrary length ...
Video captioning often uses an attentive encoder-decoder as the baseline model. However, the conventional attention mechanism still has two problems. First, the attended visual feature is often irrelevant to the target word state, because the attention process only uses the unidirectional flow from...
The captions generated by video captioning can be further utilized for video retrieval, summarization, question-answering, etc. Video Question-Answering (video-QA) involves querying the system to obtain an answer in response. This paper presents a brief survey of the video captioning techniques and ...
By running this command, you can get the pie chart in the paper. And when uncommenting the visualization code in sample.py, you can visualize the module selection process. Video Captioning Papers: This repository contains a curated list of research papers in Video Captioning (from 2015 to 2020). ...
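Recommended: Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning. Video Question Answering: The first three points are already explained in detail in @qjzhao's answer. Because so many people work on these topics, I cannot judge whose work is best; the papers listed are ones I consider fairly recent. I plan to focus mainly on Video Question Answering. Video QA is very...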
VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset. - google-research-datasets/videoCC-data