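We can see that the encoder and the decoder share the same basic structure: "Multi-Head Attention (MHA) followed by a Feed Forward Network (FFN)". Intuitively, in terms of function, the former performs semantic interaction, while the latter enriches the semantic representation. We therefore call the former the semantic interaction module and the latter the semantic enhancement module. If we want to replace the Transformer architecture with...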
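To make the "interaction, then enhancement" pattern concrete, here is a minimal NumPy sketch of one such block. It is an illustration under assumptions, not a reference implementation: the names `multi_head_attention`, `feed_forward`, `transformer_block`, and `init_params` are hypothetical, layer normalization, masking, and dropout are omitted, and ReLU is assumed as the FFN activation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Semantic interaction: every position attends to every position."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    heads = softmax(scores, axis=-1) @ v                 # (heads, seq, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ wo

def feed_forward(x, w1, b1, w2, b2):
    """Semantic enhancement: the same two-layer MLP applied to each position."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def transformer_block(x, p, num_heads):
    # Residual connections around both modules (layer norm omitted here).
    x = x + multi_head_attention(x, p["wq"], p["wk"], p["wv"], p["wo"], num_heads)
    x = x + feed_forward(x, p["w1"], p["b1"], p["w2"], p["b2"])
    return x

def init_params(d_model, d_ff, rng, scale=0.1):
    # Hypothetical initializer used only for this illustration.
    shapes = {"wq": (d_model, d_model), "wk": (d_model, d_model),
              "wv": (d_model, d_model), "wo": (d_model, d_model),
              "w1": (d_model, d_ff), "w2": (d_ff, d_model)}
    p = {name: rng.normal(0.0, scale, shape) for name, shape in shapes.items()}
    p["b1"], p["b2"] = np.zeros(d_ff), np.zeros(d_model)
    return p

rng = np.random.default_rng(0)
params = init_params(d_model=8, d_ff=32, rng=rng)
x = rng.normal(size=(5, 8))           # 5 positions, 8 features each
y = transformer_block(x, params, num_heads=2)
print(y.shape)                        # (5, 8)
```

In this sketch only the attention step lets one position influence another. The FFN treats each row independently, which is why it can be read as refining the representation rather than exchanging information.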
In this paper, we propose an MLP-like encoder-decoder architecture, in which per-location features and spatial information in music signals are exclusively handled by multi-layer perceptrons (MLPs). Additionally, we introduce a novel fully-connected decoder for feature aggregation without using skip-...
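As a rough illustration of what handling both spatial information and per-location features with MLPs can look like, the sketch below applies one MLP across locations and a second MLP across channels. This is an assumed, Mixer-style layout and not the architecture proposed in the paper: the names `mlp`, `mlp_block`, and `init_mlp_params`, the ReLU activation, the residual connections, and all sizes are hypothetical choices, and normalization is omitted.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron applied along the last axis of x.
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def mlp_block(x, p):
    """x has shape (num_locations, channels)."""
    # Spatial mixing: transpose so the MLP acts across locations,
    # one channel at a time, then transpose back.
    x = x + mlp(x.T, p["s_w1"], p["s_b1"], p["s_w2"], p["s_b2"]).T
    # Per-location mixing: the MLP acts across channels,
    # independently at each location.
    x = x + mlp(x, p["c_w1"], p["c_b1"], p["c_w2"], p["c_b2"])
    return x

def init_mlp_params(num_locations, channels, hidden, rng, scale=0.1):
    # Hypothetical initializer used only for this illustration.
    return {
        "s_w1": rng.normal(0.0, scale, (num_locations, hidden)),
        "s_b1": np.zeros(hidden),
        "s_w2": rng.normal(0.0, scale, (hidden, num_locations)),
        "s_b2": np.zeros(num_locations),
        "c_w1": rng.normal(0.0, scale, (channels, hidden)),
        "c_b1": np.zeros(hidden),
        "c_w2": rng.normal(0.0, scale, (hidden, channels)),
        "c_b2": np.zeros(channels),
    }

rng = np.random.default_rng(0)
params = init_mlp_params(num_locations=6, channels=4, hidden=16, rng=rng)
x = rng.normal(size=(6, 4))   # e.g. 6 time-frequency locations, 4 channels
y = mlp_block(x, params)
print(y.shape)                # (6, 4)
```

One consequence of this design is that the spatial-mixing weights are tied to a fixed number of locations, so unlike attention the block cannot be applied unchanged to inputs of a different length.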