This paper proposes a new evaluation framework, Story Oriented Dense video cAptioning evaluation framework (SODA), for measuring the performance of video story description systems. SODA first tries to find temporally optimal matching between generated and reference captions to capture the story of a ...
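One way to read "temporally optimal matching" is as an order-preserving, one-to-one alignment between generated and reference captions that maximizes a total score. The sketch below illustrates that reading with dynamic programming. It is not the SODA implementation: the function names are hypothetical, both lists are assumed to be sorted by time, and temporal IoU alone is used as a stand-in for whatever pairwise score the framework actually applies.

```python
def temporal_iou(a, b):
    """IoU of two (start, end) segments given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def best_ordered_matching(generated, reference):
    """Return (score, pairs) for the best order-preserving matching.

    Assumption: both lists are sorted by time, and the pairwise score
    is plain temporal IoU (a stand-in, not the metric from the paper).
    """
    n, m = len(generated), len(reference)
    # dp[i][j]: best total score using the first i generated
    # and the first j reference segments.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + temporal_iou(generated[i - 1], reference[j - 1])
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], match)

    # Backtrack to recover which (generated, reference) indices were paired.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j]:
            i -= 1
        elif dp[i][j] == dp[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

Because the alignment must preserve order, a generated caption that would match an earlier reference cannot be paired once a later reference has been used, which is what makes the matching sensitive to the story's sequence.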
Therefore, we also use SODA_c [26] (S) for an overall dense video captioning evaluation. To further isolate the evaluation of event localization, we report the average precision and average recall across IoU thresholds of {0.3, 0.5, 0.7, 0.9} and their harmonic m...
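The localization metrics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the official evaluation code: the function names are hypothetical, a prediction counts as correct if it overlaps any reference at or above the threshold, and the final score is assumed to be the harmonic mean (F1) of the averaged precision and recall, since the source text is truncated at that point.

```python
from statistics import mean


def temporal_iou(a, b):
    """IoU of two (start, end) segments given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_at(preds, refs, thr):
    """Precision and recall of predicted segments at one IoU threshold."""
    # Precision: share of predictions that hit at least one reference.
    p_hits = sum(any(temporal_iou(p, r) >= thr for r in refs) for p in preds)
    # Recall: share of references covered by at least one prediction.
    r_hits = sum(any(temporal_iou(p, r) >= thr for p in preds) for r in refs)
    precision = p_hits / len(preds) if preds else 0.0
    recall = r_hits / len(refs) if refs else 0.0
    return precision, recall


def localization_scores(preds, refs, thresholds=(0.3, 0.5, 0.7, 0.9)):
    """Average precision/recall over thresholds, plus their harmonic mean.

    Assumption: the harmonic mean is taken over the two averages.
    """
    pairs = [precision_recall_at(preds, refs, t) for t in thresholds]
    avg_p = mean(p for p, _ in pairs)
    avg_r = mean(r for _, r in pairs)
    f1 = 2 * avg_p * avg_r / (avg_p + avg_r) if (avg_p + avg_r) > 0 else 0.0
    return avg_p, avg_r, f1
```

For example, with predictions `[(0, 10), (20, 30)]` and references `[(0, 10), (22, 30)]`, the second prediction has IoU 0.8 with its reference, so it counts at thresholds 0.3 to 0.7 but not at 0.9.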
1 https://github.com/ranjaykrishna/densevid_eval
2 On 11/02/2017, the official evaluation tool fixed a critical issue; only one out of multiple incorrect predictions for each video was counted. This leads to performance overestimation of [27, 37]. Thus, we received raw results from the ...
Download the dense video captioning evaluation scripts and place them under the tools directory. Make sure you recursively clone the repo. Our code is equivalent to the official evaluation code from the ActivityNet 2017 Challenge, but faster. Note that the current evaluation scripts had a few major bugs fixed ...
The performance of the captioning module on ground truth segments can be obtained from the file with the pre-trained captioning module. You may also want to use the official evaluation script with ./data/val_*_no_missings.json as references (-r argument).

import torch
cap_model_cpt = torch....
The proposed model was compared with state-of-the-art models for evaluation. Li et al. [18] proposed a dense video captioning method that localizes each event temporally and generates a sentence for it. The proposed model also consists of two parts: the Temporal Events Proposal (TEP) and ...
Evaluation
Please note that the official evaluation metric has been updated (Line 194). The paper reports the old metric (you can still compare results across methods, since all CVPR-2018 papers report the old metric).
Pre-trained Model & Results ...
Table 1. A comprehensive directory of recent surveys in Video Captioning (2019-2023)
Year | Title | Ref. | Publication Venue
2019 | Video Description: A Survey of Methods, Datasets, and Evaluation Metrics | (Aafaq et al., 2019) | ACM Computing Surveys
2020 | A Comprehensive Review on Recent Methods and ...
Previous VideoLLMs attempt to solve DVC in a single step, failing to utilize their reasoning capability. Moreover, previous training objectives for VideoLLMs do not fully reflect the evaluation metrics, therefore not providing supervision directly aligned to target tasks. To address such a problem,...