As shown in Table 2, when compared with SOTA methods built on the same YOLOv5 backbone, as well as with the authors' method built on a YOLOv8 backbone, their Fusion-Mamba performs best on the AP and AP metrics across all categories, and achieves new SOTA results on the _People_, _Bus_, _Motorcycle_ and _Truck_ categories, while further improving the AP and AP metrics. Moreover, although YOLOv5's feature representation capability is lower than YOLOv7's, using...
3. Integration of Mamba: incorporating the Mamba model helps capture long-range dependency features while maintaining computational efficiency, which is crucial for processing high-dimensional image data.
4. Loss function design: a loss function combining intensity loss, texture loss, and structure loss ensures that key visual details, such as contrast, edge information, and overall structural similarity, are preserved during training.

FusionMamba network architecture

The general fusion pipeline of FusionMamba comprises three key components...
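To make the loss design above concrete, here is a minimal numpy sketch of an intensity term and a gradient-based texture term combined with hypothetical weights (`w_int`, `w_tex` are assumptions, not the paper's values; the SSIM-style structure term is omitted for brevity):

```python
import numpy as np

def intensity_loss(fused, ir, vis):
    # L1 distance to the element-wise max of the source intensities
    # (a common target choice in fusion losses; assumed here)
    target = np.maximum(ir, vis)
    return np.mean(np.abs(fused - target))

def gradient_mag(img):
    # finite-difference gradient magnitude as a simple texture/edge proxy
    gy, gx = np.gradient(img)
    return np.abs(gx) + np.abs(gy)

def texture_loss(fused, ir, vis):
    # preserve the stronger edge response from either modality
    target = np.maximum(gradient_mag(ir), gradient_mag(vis))
    return np.mean(np.abs(gradient_mag(fused) - target))

def total_loss(fused, ir, vis, w_int=1.0, w_tex=1.0):
    # hypothetical weighting; the actual weights are not given in this excerpt
    return w_int * intensity_loss(fused, ir, vis) + w_tex * texture_loss(fused, ir, vis)
```

A fused image that matches the element-wise max of the sources and their strongest edges drives both terms to zero, which is the intuition behind preserving contrast and edge information.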
Unlike existing fusion methods, the authors construct the Fusion-Mamba block to align different modalities in a hidden state space, significantly improving object detection performance by up to 5.9%. This paper proposes a novel Fusion-Mamba method for cross-modality object detection. Through a State Space Channel Swapping (SSCS) module and a Dual State Space Fusion (DSSF) module, it achieves effective fusion of multimodal features...
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba - millieXie/FusionMamba
In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba. Specifically, we devise an improved, efficient Mamba model for image fusion, integrating an efficient visual state space model with dynamic convolution and channel attention. This ...
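The channel-attention component mentioned above can be sketched as a squeeze-and-excitation-style gate; this is an illustrative numpy version under assumed weight shapes (`w1`, `w2` stand in for learned parameters), not the paper's exact module:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """SE-style channel attention over a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) are hypothetical learned weights
    with reduction ratio r.
    """
    c = x.shape[0]
    squeeze = x.reshape(c, -1).mean(axis=1)        # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate per channel
    return x * scale[:, None, None]                # reweight channels
```

The gate rescales each channel by a value in (0, 1), letting the fusion network emphasize informative channels from either modality.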
Visual Intelligence (2024) 2:37, https://doi.org/10.1007/s44267-024-00072-9 (Research, Open Access). FusionMamba: dynamic feature enhancement for multimodal image fusion with Mamba. Xinyu Xie, Yawen Cui, Tao Tan, Xubin Zheng and Zitong Yu. Abstract: Multimodal image fusion aims to ...
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction, thereby reducing disparities between cross-modal features and enhancing the representation consistency of fused features. FMB contains two modules: the State Space Channel Swapping (SSCS) ...
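One simplified reading of the SSCS idea, exchanging a fraction of shallow channels between the two modality branches so each sees the other before the deeper state-space fusion, can be sketched as follows (the `ratio` hyperparameter and exact swap pattern are assumptions for illustration):

```python
import numpy as np

def channel_swap(feat_rgb, feat_ir, ratio=0.5):
    """Exchange the first `ratio` fraction of channels between two
    (C, H, W) modality feature maps.

    Illustrative only: the actual SSCS module operates in a state
    space and is more involved than a raw channel exchange.
    """
    c = feat_rgb.shape[0]
    k = int(c * ratio)              # number of channels to exchange
    out_rgb = feat_rgb.copy()
    out_ir = feat_ir.copy()
    out_rgb[:k] = feat_ir[:k]
    out_ir[:k] = feat_rgb[:k]
    return out_rgb, out_ir
```

After the swap, each branch carries a mix of both modalities, which is the kind of cross-modal interaction the FMB then refines in its hidden state space.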
FusionMamba appears to further improve global modeling capability! Multimodal image fusion aims to integrate information from different modalities into a single image with comprehensive information and detailed texture. However, fusion models based on convolutional neural networks are limited in capturing global image features because they focus on local convolution operations. Although Transformer-based models excel at global feature modeling, they face computational challenges caused by their quadratic complexity...
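The complexity argument above is what motivates a state-space backbone: a discrete state-space recurrence costs O(L) over a length-L sequence, versus the O(L²) pairwise interactions of self-attention. A minimal sketch (fixed parameters; real Mamba uses input-dependent, selective parameters, so this is only an illustration of the recurrence):

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Minimal discrete state-space model:
        x_t = A x_{t-1} + B u_t,   y_t = C x_t

    Each step touches only the state, so a length-L input costs O(L) --
    the linear scaling that motivates Mamba over quadratic attention.
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t     # state update
        ys.append(C @ x)        # readout
    return np.array(ys)
```

With a decaying state matrix, early inputs influence later outputs through the state, which is how long-range dependencies are carried without attending over all pairs of positions.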