Specifically, $A$ is reshaped to $\mathbb{R}^{C \times N}$, where $N$ is the number of spatial positions, and a matrix multiplication is then performed between $A$ and the transpose of $A$. Finally, a softmax layer is applied to obtain the channel attention map $X \in \mathbb{R}^{C \times C}$:

$$x_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum_{i=1}^{C} \exp(A_i \cdot A_j)} \tag{2}$$

where $x_{ji}$ measures the $i$-th channel's impact on the $j$-th channel.
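As a minimal PyTorch sketch of Eq. (2), the attention map can be computed by flattening the spatial dimensions, multiplying the result by its transpose, and applying a softmax. The function name `channel_attention_map`, the tensor name `feats`, and the batched (B, C, H, W) input layout are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def channel_attention_map(feats: torch.Tensor) -> torch.Tensor:
    """Compute the C x C channel attention map of Eq. (2) from (B, C, H, W) features."""
    b, c, h, w = feats.shape
    a = feats.view(b, c, h * w)               # reshape A to (B, C, N), with N = H*W
    energy = torch.bmm(a, a.transpose(1, 2))  # A · A^T -> (B, C, C)
    # Softmax over the i index (last dimension), matching the denominator of Eq. (2)
    return F.softmax(energy, dim=-1)

if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    attn = channel_attention_map(x)
    print(attn.shape)        # torch.Size([2, 64, 64])
    print(attn.sum(dim=-1))  # each row sums to 1
```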
The learning rate was initially set to 0.001 and then multiplied by a factor of 0.5 whenever the validation loss stopped decreasing. The batch size was set to 32, and the maximum number of epochs to 200. During the training phase, input images of size 256 × 256 were ...
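The stated schedule can be expressed, for illustration, with PyTorch's plateau-based scheduler; the optimizer choice (Adam), the placeholder `model`, and the synthetic validation loss below are assumptions not given in the text.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)      # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)     # initial learning rate 0.001
# Halve the learning rate when the validation loss stops decreasing
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5)

batch_size, max_epochs, input_size = 32, 200, (256, 256)      # stated hyper-parameters

for epoch in range(max_epochs):
    # ... training loop over batches of 256 x 256 inputs would go here ...
    val_loss = 1.0 / (epoch + 1)   # placeholder validation loss for the sketch
    scheduler.step(val_loss)       # plateau-based learning-rate decay
```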