The structure of the original transformer is shown in Figure 2, in which a feed-forward network is a two-layer Multi-Layer Perceptron (MLP). An original transformer has an encoder–decoder structure. An encoder consists of six encoder layers in which multi-head self-attention and forward feed...