This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we apply video content analysis to detect slides and optical character recognition to obtain their text. We extract textual metadata by applying ...
Survey on Video-Text Cross-Modal Retrieval, the discussion shifts to an experimental viewpoint, introducing benchmark datasets and evaluation metrics specific to video-text cross-modal retrieval. ... L Chen, XI Yimeng, L Liu - Journal of Computer Engineering & Applications. Cited by: 0. Published: 20...
Video text is very important semantic information, which brings precise and meaningful clues for video indexing and retrieval. However, most previous appro... G Gao, Z He, H Chen - Springer International Publishing. Cited by: 0. Published: 2015. Text information extraction in images and video: A survey ...
It's based on our survey paper: From Sora What We Can See: A Survey of Text-to-Video Generation. In this survey, we have conducted a comprehensive exploration of existing works in the Text-to-Video field using OpenAI’s Sora as a clue, and we have also summarized 24 datasets and 9 ...
Tiwari AK, Kanhangad V, Pachori RB (2017) Histogram refinement for texture descriptor based image retrieval. Signal Process Image Commun 53:73–85
Venugopalan S, Hendricks LA, Mooney R, Saenko K (2016) Improving LSTM-based video description with linguistic knowledge mined...
Text extraction in video documents, an important research field of content-based information indexing and retrieval, has been developing rapidly since the 1990s. This has led to much progress in text extraction, performance evaluation, and related applications. By reviewing the approaches proposed during...
Text in images and video frames carries important information for visual content understanding and retrieval. In this paper, using multiscale wavelet features, we propose a novel coarse-to-fine algorithm that can locate text lines even against complex backgrounds. First, in the coarse det...
Uniter: Universal image-text representation learning; 2020; multimodal encoder; combined embeddings; COCO, Visual Genome, Conceptual Captions; qa/image-text retrieval; image + text
12-in-1: Multi-task vision and language representation learning; 2020; multimodal encoder; combined embeddings; COCO, Fli...
(retrieval and supervised learning methods) with generative models to establish an action retrieval database that improves the controllability of the generation process; the pose modeling and transition module extracts pose information and performs 3D modeling; the video frame generation and ...
To date, many algorithms have been proposed to improve the similarity measure for video–text retrieval, moving from a single global semantic to multi-level semantics. However, these methods may suffer from the following limitations: (1) they largely ignore relationship semantics, which results in semantic ...