Sentiment analysis is an active research topic in multi-modal fusion, a field that aims to integrate video, audio, and text modalities using fusion strategies at the feature, model, and decision levels [52]. Previous works [10][19] fused audio and visual information to create ...
in piano education, it is essential to use audio information in addition to visual information because posture and sound are closely related. In this paper, we propose an audio-visual tensor fusion network (AV-TFN) for piano performance posture classification. Unlike...