(2018) found no significant difference in clinical effect between rigid and deformable registration methods. To the best of our knowledge, this is the first work to embed non-local attention in a deep neural network for image registration, substantially extending our preliminary work...
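Non-local attention lets every spatial position aggregate information from all other positions, the long-range dependency that local convolutions miss. A minimal NumPy sketch of one non-local block over a flattened feature map (the projection matrices `w_theta`, `w_phi`, `w_g` stand in for learned weights and are assumptions for illustration):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_attention(x, w_theta, w_phi, w_g):
    """Non-local block over N flattened spatial positions.

    x: (N, C) features; w_*: (C, C) projection matrices.
    Each output row is a similarity-weighted sum over ALL positions.
    """
    theta = x @ w_theta                                   # queries, (N, C)
    phi   = x @ w_phi                                     # keys,    (N, C)
    g     = x @ w_g                                       # values,  (N, C)
    attn  = softmax(theta @ phi.T / np.sqrt(x.shape[1]))  # (N, N), rows sum to 1
    return attn @ g                                       # (N, C)
```

In a real registration network this block would operate on convolutional feature maps and include a residual connection; the sketch only shows the attention arithmetic.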
We design a dual encoder that learns, in parallel, prior knowledge and spatial deformation from pre- and intra-operative CT pairs and DR for deformable 2D/3D feature conversion. To calibrate the cross-modal fusion, we insert cross-attention modules that enhance the 2D/3D feature interaction between the dual...
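A cross-attention module of the kind described lets tokens of one modality query tokens of the other, so each is refined by the other's evidence. A minimal one-direction sketch, assuming flattened (tokens, channels) features and hypothetical weight matrices `wq`, `wk`, `wv`:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context, wq, wk, wv):
    """One direction of cross-attention between two modalities.

    queries: (M, C) tokens of one stream (e.g. 2D features);
    context: (N, C) tokens of the other (e.g. 3D features);
    wq/wk/wv: (C, C) hypothetical learned projections.
    """
    q = queries @ wq                                # (M, C)
    k = context @ wk                                # (N, C)
    v = context @ wv                                # (N, C)
    a = softmax(q @ k.T / np.sqrt(q.shape[1]))      # (M, N) cross-modal weights
    return queries + a @ v                          # residual fusion, (M, C)
```

For bidirectional interaction between the two encoder branches, the same function would be called twice with the roles of the streams swapped.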
UVTR [22] generates a unified representation in the 3D voxel space via deformable attention [60]. Among query-based methods, FUTR3D [8] defines 3D reference points as queries and directly samples features at their coordinates on the projected planes. Trans...
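The reference-point sampling described for query-based methods amounts to projecting each 3D point through a camera matrix and gathering image features at the resulting pixel. A hedged NumPy sketch (nearest-neighbour lookup for brevity; real systems use bilinear sampling across multiple views):

```python
import numpy as np

def sample_from_projection(points_3d, proj, feat_map):
    """Project 3D reference points and gather per-pixel features.

    points_3d: (N, 3) reference points; proj: (3, 4) camera matrix;
    feat_map: (H, W, C) image feature map.
    """
    h, w, _ = feat_map.shape
    homo = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # (N, 4)
    uvw = homo @ proj.T                         # (N, 3) homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3]               # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feat_map[v, u]                       # (N, C) sampled features
```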
1. Introduction
With multimedia data flourishing on the Internet in the form of videos, images, text, etc., the cross-modal retrieval task has attracted increasing attention from the multimedia community. Cross-modal retrieval is the task of retrieving data from ...
Up-Down: "Bottom-up and top-down attention for image captioning and visual question answering," CVPR 2018
GCN-LSTM: "Exploring visual relationship for image captioning," ECCV 2018
Transformer: "Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning," ACL 2018
...
The SiamFEA tracker [12] combines visible and infrared modalities using self-attention mechanisms. iReIDNet [13] enhances person ReID through spatial feature transforms and coordinate attention. A transformer-based dual-branch model [14] improves performance via global–local feature interaction, while...
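Coordinate attention, as used in iReIDNet, keeps positional information by pooling along each spatial axis separately rather than collapsing the map to a single vector. A simplified sketch (the learned 1x1 convolutions of the full module are omitted; plain sigmoids are an assumption for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x):
    """Simplified coordinate attention over a (C, H, W) feature map.

    Pooling along W keeps per-row position; pooling along H keeps
    per-column position; both reweight the input map.
    """
    pool_h = x.mean(axis=2, keepdims=True)   # (C, H, 1), row descriptor
    pool_w = x.mean(axis=1, keepdims=True)   # (C, 1, W), column descriptor
    return x * sigmoid(pool_h) * sigmoid(pool_w)
```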
The stair attention divides the attention weights into three levels, allowing the model to focus differently on different regions of the search scope. Additionally, CIDEr-based reward reinforcement learning [36] is used to improve the quality of the generated sentences. Du et al. [37] proposed a Deformable ...
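One way to read the three-level "stair" idea is as a tiering of ordinary softmax weights, so high-relevance regions keep most of the attention mass. The sketch below is an illustrative interpretation, not the paper's exact formulation; the tercile boundaries and tier scales (1.0 / 0.5 / 0.1) are assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def stair_attention_weights(scores):
    """Quantise softmax attention weights into three tiers.

    scores: (N,) raw attention scores. Weights above the upper
    tercile keep full strength, the middle third is damped, and
    the bottom third is strongly suppressed; the result is
    renormalised to sum to 1.
    """
    w = softmax(scores)
    lo, hi = np.quantile(w, [1 / 3, 2 / 3])
    tiers = np.where(w >= hi, 1.0, np.where(w >= lo, 0.5, 0.1))
    w = w * tiers
    return w / w.sum()
```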
In this paper, we propose a cross-modal segmentation network for winter wheat mapping in complex terrain using multi-temporal remote-sensing images and DEM data. First, we propose a diverse receptive fusion (DRF) module, which applies a deformable receptive field to the optical images during the ...
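The core of a deformable receptive field is that each output location samples the input at its centre plus a set of learned offsets, instead of a fixed grid. A minimal NumPy sketch of the sampling step (nearest-neighbour rounding here; deformable convolutions use bilinear interpolation so the offsets stay differentiable):

```python
import numpy as np

def deformable_sample(feat, centers, offsets):
    """Gather values at centre + learned offsets.

    feat: (H, W) single-channel map; centers: (N, 2) int (row, col)
    sampling centres; offsets: (N, K, 2) float learned offsets,
    K samples per centre. Returns (N, K) sampled values.
    """
    h, w = feat.shape
    pts = centers[:, None, :] + np.round(offsets).astype(int)  # (N, K, 2)
    r = np.clip(pts[..., 0], 0, h - 1)
    c = np.clip(pts[..., 1], 0, w - 1)
    return feat[r, c]
```

In the full module these samples would be weighted by convolution kernels and the offsets predicted by a small network from the input itself.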