This paper examines the feasibility of using an audio-visual methodology for sound source localization of acoustic sources hidden from direct view. A four-channel microphone array is used in conjunction with LiDAR and 2D/3D mapping to merge estimated angles of arrival with room parameters for sound...
Recent studies on learning-based sound source localization have focused mainly on localization performance. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interactive sound source localization. Cross-modal interaction ...
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment; Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning; AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction; Audio-Visual Grouping Network for Sound Localization From Mixtures; iQuery: Instruments As...
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization. Hao Jiang, Calvin Murdock, Vamsi Krishna Ithapu. Reality Labs Research at Meta. {haojiang,cmurdock,ithapu}@fb.com. Abstract: Augmented reality devices have the potential to enhance human perception and...
(Xu et al. 2021; Tian et al. 2024), audio-visual localization (Chen et al. 2021), source separation (Zhao et al. 2018), dense video captioning (Xie et al. 2023), and emotion recognition (Sun et al. 2024). Cheng et al. (2020) introduced a co-attention framework to leverage the symbiotic ...
In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos. We define an audio-visual event as an event that is both visible and audible in a video segment. We collect an Audio-Visual Event (AVE) dataset to sys
[NeurIPS-2022] A Closer Look at Weakly-Supervised Audio-Visual Source Localization. Authors: Shentong Mo, Pedro Morgado. Institution: Carnegie Mellon University; University of Wisconsin-Madison. [AAAI-2022] Visual Sound Localization in the Wild by Cross-Modal Interference Erasing. Authors: Xian Liu, ...
Ability to create, train, and optimize neural network architectures for audio (or audio-visual) applications such as speech enhancement, speaker recognition, echo cancellation, source localization, audio-visual speaker diarization and active speaker detection. Proven understanding of deep learn...
Audio-Visual Event Localization in Unconstrained Videos. Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, and Chenliang Xu. University of Rochester, United States. In this material, we first show how we gathered the Audio-Visual Event (AVE) dataset in Sec. 1. Then we describe the implementation details of ...
In addition, we describe an online audio-visual speaker diarization method that leverages face tracking and identification, sound source localization, speaker identification, and, if available, prior speaker information for robustness to various real-world challenges. All components are integrated in...