Paper: Cross-Modality Fusion Transformer for Multispectral Object Detection. Code: https://github.com/DocF/multispectral-object-detection. Motivation: Previous CNN-based work does not model long-range and global information. This paper proposes a Cross-Modality Fusion Transformer (CFT) module that uses the capacity of the Transformer to fully mine global contextual information. Attentio...
More importantly, by leveraging the self-attention of the Transformer, the network can naturally carry out simultaneous intra-modality and inter-modality fusion, and robustly capture the latent interactions between the RGB and thermal domains, thereby significantly improving the performance of multispectral ...
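The point about simultaneous intra- and inter-modality fusion can be sketched in a few lines: if RGB and thermal tokens are concatenated into one sequence, a single self-attention pass produces an attention matrix whose diagonal blocks are intra-modality (RGB-RGB, thermal-thermal) and whose off-diagonal blocks are inter-modality (RGB-thermal, thermal-RGB). This is a minimal numpy illustration of that idea, not the paper's implementation; the function name, single head, and shapes are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modality_self_attention(rgb_tokens, thermal_tokens, wq, wk, wv):
    """Single-head self-attention over the concatenation of RGB and
    thermal tokens (hypothetical sketch). Because both modalities share
    one sequence, the (2N, 2N) attention matrix contains intra-modality
    blocks (RGB-RGB, T-T) and inter-modality blocks (RGB-T, T-RGB)
    at the same time."""
    x = np.concatenate([rgb_tokens, thermal_tokens], axis=0)  # (2N, d)
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))  # (2N, 2N), rows sum to 1
    out = attn @ v
    n = rgb_tokens.shape[0]
    return out[:n], out[n:], attn  # fused RGB tokens, fused thermal tokens

rng = np.random.default_rng(0)
n, d = 4, 8
rgb = rng.normal(size=(n, d))
thermal = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
fused_rgb, fused_thermal, attn = cross_modality_self_attention(rgb, thermal, wq, wk, wv)
```

One attention pass thus fuses within and across modalities at once, which is why no separate cross-attention branch is needed in this formulation.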
The figure contrasts visualizations of the original features with the features produced by the Cross-Modality Fusion Transformer (CFT) module, illustrating the module's role in feature extraction and information fusion. It shows that the CFT module extracts and fuses features during processing, removing noise and unimportant information from the original features and retaining only the key features relevant to the detection task.
The second module is the Feature Fusion Module (FFM), built in two stages, which merges the rectified features from the RGB and X modalities into a single feature for semantic prediction. Exploiting the large receptive field obtained by self-attention in vision Transformers, the first stage of the FFM uses a cross-attention mechanism to perform global cross-modal reasoning. The second stage applies a mixed channel embedding to produce enhanced output features. We therefore introduce...
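The two FFM stages described above can be sketched as follows: stage 1 lets each modality query the other via cross-attention for global cross-modal reasoning; stage 2 mixes channels by concatenating both streams and projecting back to a single fused feature. This is a simplified numpy sketch under stated assumptions (shared attention weights, a plain linear projection as the "mixed channel embedding", residual additions), not the paper's actual FFM; all names and parameters are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, wq, wk, wv):
    """Stage 1 (sketch): tokens of one modality attend to the other."""
    q = query_feats @ wq
    k = context_feats @ wk
    v = context_feats @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

def feature_fusion(rgb, x_mod, params):
    """Hypothetical two-stage fusion: cross-attention, then channel mixing."""
    # Stage 1: bidirectional cross-attention with residual connections
    rgb2x = cross_attention(rgb, x_mod, *params["attn"])
    x2rgb = cross_attention(x_mod, rgb, *params["attn"])
    # Stage 2: mixed channel embedding -- concatenate along channels
    # and project back to one fused feature of the original width
    mixed = np.concatenate([rgb + rgb2x, x_mod + x2rgb], axis=-1)  # (N, 2d)
    return mixed @ params["w_mix"]                                 # (N, d)

rng = np.random.default_rng(1)
n, d = 4, 8
params = {
    "attn": [rng.normal(size=(d, d)) * 0.1 for _ in range(3)],
    "w_mix": rng.normal(size=(2 * d, d)) * 0.1,
}
rgb = rng.normal(size=(n, d))
x_mod = rng.normal(size=(n, d))
fused = feature_fusion(rgb, x_mod, params)
```

The design choice to fuse only after cross-attention means each modality's features are first rectified with global context from the other before the channels are merged for prediction.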