Then, it is passed to an LSTM for temporal summarization. The output of the LSTM is passed to an attention layer that focuses on the emotionally salient parts of the utterance. A softmax function computes the normalized importance weights. From these weights, the utterance-level representation is calculated ...
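The attention step described above can be illustrated with a minimal sketch. It assumes a dot-product score between each LSTM output and a learned vector, and a weighted sum as the utterance-level representation; the excerpt does not state either choice, so the names `attention_pool`, `hidden_states`, and `u` are hypothetical, and the toy numbers stand in for real LSTM outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, u):
    """Collapse per-frame outputs into one utterance-level vector.

    hidden_states: list of T vectors (lists of D floats), standing in
                   for the LSTM outputs at each time step.
    u:             hypothetical learned attention vector of length D
                   (assumption: dot-product scoring).
    """
    # One scalar relevance score per time step.
    scores = [sum(h_d * u_d for h_d, u_d in zip(h, u)) for h in hidden_states]
    # Normalized importance weights (non-negative, summing to 1).
    weights = softmax(scores)
    # Assumption: the utterance representation is the weighted sum of frames.
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled


# Toy input: three time steps, two features each.
hidden_states = [[0.1, 0.2], [0.9, 0.7], [0.3, 0.1]]
u = [1.0, 1.0]
weights, pooled = attention_pool(hidden_states, u)
```

In this toy input the second frame has the highest score, so it receives the largest weight and dominates the pooled vector, which is the sense in which the layer "focuses" on salient frames.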
Section 7 provides a method for detecting emotion from multimodal data, and Section 8 reports the related experimental results. Finally, Section 9 concludes the paper and suggests directions for future work.

2. Problem Definition

In this work, the authors focus on the process of classifying ...