Learning efficient deep representations from spectrograms for speech emotion recognition remains a significant challenge. Most existing spectrogram feature extraction methods empowered by deep learning have demonstrated great success, but the respective changing information of time and frequency exhibited ...
Multimodal speech emotion recognition (MSER) is an emerging and challenging field of research, attractive because it is more robust than unimodal approaches. However, in multimodal approaches, the interactive relations for model building using different modalities of speech representations for emotion recognition have...
IEEE: multimodal cross- and self-attention network for speech emotion recognition. In: IEEE international conference on acoustics, speech and signal processing (ICASSP): Jun 06–11 2021; Electr Network. 2021, p. 4275–4279. 36. Chen CF, Fan Q, Panda R. CrossViT: cross-attention multi-...
Memristor-Based Progressive Hierarchical Conformer Architecture for Speech Emotion Recognition. Speech Emotion Recognition (SER) is a challenging task characterized by the diversity and complexity of emotional expression. Due to its powerful feature e... T Zhao, Y Zhou, X Hu - 《International Journal of ...
In our proposed robust end-to-end speech recognition scheme, the discriminant network first acts as the local guide for the enhancement module, where D shifts the training of G towards the distribution of clean data; thereafter, it is deployed as the global guide for the whole scheme, where ...
{Brutti}, A. and {Cavallaro}, A.}, title = "{Is cross-attention preferable to self-attention for multi-modal emotion recognition?}", booktitle = {Proceedings of the International Conference on Acoustics, Speech, and Signal Processing}, pages={1--1}, year = {2022}, month = {May}, ...
Paper tables with annotated results for Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks
As an important branch of affective computing, Speech Emotion Recognition (SER) plays a vital role in human-computer interaction. To mine the relevance of signals in audio and increase the diversity of information, Bi-directional Long-Short Term Memory with Directional Self-Attention (...
(2019). Self-attention networks for connectionist temporal classification in speech recognition. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7115–7119): IEEE. Sharma, N., & Yalla, P. (2018). Developing research questions in natural...
Speech emotion recognition (SER) is an active research field of digital signal processing and plays a crucial role in numerous applications of human-computer interaction (HCI). Currently, baseline state-of-the-art systems have quite low accuracy and high computational cost, and need upgrading to...