However, as Vision Transformer (ViT) architectures have advanced, research on Transformer interpretability has grown considerably. In the Attention Rollout method, attention scores from different layers are combined linearly; however, this approach struggles to distinguish positive from negative contributions. Layer-wise Relevance Propagation (LRP) propagates the relevance associated with the predicted class backward from the output to the input image. Several works have applied LRP to Transformers. However, many of these studies overlook ...
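For reference, the sketch below illustrates how Attention Rollout chains per-layer attention maps into a single token-level relevance map; the tensor layout, head averaging, and names such as `attentions` are illustrative assumptions for this example, not details taken from the works cited above.

```python
# A minimal sketch of Attention Rollout, assuming the per-layer attention maps
# have already been collected from a ViT forward pass.
import torch

def attention_rollout(attentions):
    """attentions: list of [heads, tokens, tokens] attention tensors, one per layer."""
    n_tokens = attentions[0].size(-1)
    result = torch.eye(n_tokens)
    for attn in attentions:
        # Average over heads, add the identity to account for the residual
        # (skip) connection, and re-normalize the rows.
        attn_avg = attn.mean(dim=0)
        attn_aug = 0.5 * attn_avg + 0.5 * torch.eye(n_tokens)
        attn_aug = attn_aug / attn_aug.sum(dim=-1, keepdim=True)
        # Chain the layers by matrix multiplication: this is the linear
        # combination of attention paths referred to in the text.
        result = attn_aug @ result
    # The [CLS] row gives a relevance score for every patch token.
    return result[0, 1:]
```

Because every attention weight is non-negative, the rolled-out scores are non-negative as well, which is precisely why the method cannot separate positive from negative contributions.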
To clearly reveal the class activation maps, the CNN branch used Grad-CAM [53], and the transformer branch used attention rollout [54]. The feature maps of the last layer of the CNN branch and of the last layer of the transformer branch served as the inputs for generating the class activation maps. ...
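As a hedged illustration of the Grad-CAM step on the CNN branch's last-layer feature maps, the sketch below weights those feature maps by their pooled gradients for the target class; the hook-based implementation and placeholder names such as `target_layer` are assumptions made for this example, not the code used in the cited work.

```python
# A minimal Grad-CAM sketch, assuming access to the feature maps and gradients
# of the last convolutional layer of the CNN branch.
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(image)                           # image: [1, C, H, W]
    model.zero_grad()
    logits[0, class_idx].backward()                 # gradient of the target class score
    h1.remove(); h2.remove()
    fmap, grad = feats[0], grads[0]                 # both [1, channels, h, w]
    weights = grad.mean(dim=(2, 3), keepdim=True)   # global-average-pooled gradients
    cam = F.relu((weights * fmap).sum(dim=1))       # weighted sum over channels
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:], mode="bilinear")
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```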