(3) We propose a multi-head cross-sinusoidal threshold attention mechanism that combines convolution-kernel spectra with spatial patch tokens and uses sine functions to bound the magnitude of the dot products among Q, K, and V. This keeps the attention output values within the effective range of the...
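As a rough illustration of how a sine function can bound attention scores, the following single-head NumPy sketch applies `sin` to the scaled query-key dot products before the softmax. This is only one possible reading of the description above, not the proposed mechanism itself: the function name `sine_bounded_cross_attention`, the projection matrices `Wq`, `Wk`, `Wv`, the choice to take queries from the spectral tokens and keys/values from the patch tokens, and all shapes are assumptions made for illustration.

```python
import numpy as np

def sine_bounded_cross_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    """Single-head cross-attention with sine-bounded scores (illustrative only).

    q_tokens:  (n_q, d_model)  e.g. tokens derived from kernel spectra (assumed)
    kv_tokens: (n_kv, d_model) e.g. spatial patch tokens (assumed)
    """
    Q = q_tokens @ Wq
    K = kv_tokens @ Wk
    V = kv_tokens @ Wv
    d = Q.shape[-1]

    # sin() maps every scaled dot product into [-1, 1], whatever its raw size.
    scores = np.sin(Q @ K.T / np.sqrt(d))

    # Standard row-wise softmax over the bounded scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output row is a convex combination of the rows of V.
    return weights @ V, weights

rng = np.random.default_rng(0)
spectra = rng.normal(size=(4, 16))    # hypothetical spectral tokens
patches = rng.normal(size=(10, 16))   # hypothetical patch tokens
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))

out, weights = sine_bounded_cross_attention(spectra, patches, Wq, Wk, Wv)
```

Because the scores lie in [-1, 1], no attention weight in a row can exceed another by more than a factor of e², so in this sketch the softmax cannot saturate onto a single token.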
Figure 2 illustrates standard and transposed convolution (TC) [56].

Figure 2. Standard convolution (a) and TC (b). Blue: input; shaded: filter size; green: output.

TC is not the exact inverse of standard convolution, but it reconstructs the high-...
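To make the relationship between the two operations concrete, here is a minimal 1-D NumPy sketch. The helper names `conv1d` and `transposed_conv1d`, the kernel, and the input are hypothetical and chosen only for illustration; the sketch assumes no padding and uses cross-correlation, as deep learning frameworks do.

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Standard (valid, unpadded) 1-D convolution as cross-correlation."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], w)
                     for i in range(n_out)])

def transposed_conv1d(y, w, stride=1):
    """Transposed 1-D convolution: scatter each input value through the kernel."""
    k = len(w)
    out = np.zeros((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        out[i * stride:i * stride + k] += v * w
    return out

x = np.arange(1.0, 7.0)              # input of length 6
w = np.array([1.0, 0.0, -1.0])       # illustrative kernel of size 3

y = conv1d(x, w)                     # length 4: convolution shrinks the signal
x_rec = transposed_conv1d(y, w)      # length 6: TC restores the original size

print(y.shape, x_rec.shape)          # (4,) (6,)
print(np.allclose(x_rec, x))         # False
```

With the same kernel, TC restores the spatial size of the input but not its values, which is the sense in which it is not an exact inverse of standard convolution.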