The absence of a large, carefully labeled audio-visual active speaker dataset has limited algorithm evaluation in terms of data diversity, environments, and accuracy. In this paper, we present the AVA Active Speaker detection dataset (AVA-ActiveSpeaker) which has been publicly released to facilitate...
python get_ava_active_speaker_performance.py -p final/AV_Enc.csv -g final/gt.csv Temporal Modeling and Inter-Speaker Relation Modeling (TM_ISRM): Training, Feature Extraction and Postprocessing Training TM and ISRM stages can be trained with the following command: ...