Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images

1. Basic principle of the Cross-Layer Feature Pyramid Transformer (CFPT). CFPT is a feature pyramid network designed specifically for small object detection in aerial images. It avoids the conventional upsampling operation and instead performs feature fusion directly through cross-layer interaction ...
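As a rough illustration of how pyramid levels can interact without any upsampling, the sketch below flattens every level to tokens, concatenates them, and lets a single attention layer mix them before reshaping each level back to its own resolution. The module name, channel width, and head count are assumptions for illustration only; this is not the actual CFPT block design.

import torch
import torch.nn as nn

class CrossLayerInteraction(nn.Module):
    # Hypothetical sketch: pyramid levels exchange information through one
    # attention layer over their concatenated tokens, with no upsampling.
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats):  # feats: list of (B, C, Hi, Wi) pyramid maps
        B, C = feats[0].shape[:2]
        shapes = [f.shape[-2:] for f in feats]
        # Flatten every level to tokens and concatenate along the token axis.
        tokens = torch.cat([f.flatten(2).transpose(1, 2) for f in feats], dim=1)
        q = self.norm(tokens)
        tokens = tokens + self.attn(q, q, q)[0]
        # Split the mixed tokens back into per-level maps of their original sizes.
        outs, start = [], 0
        for h, w in shapes:
            n = h * w
            outs.append(tokens[:, start:start + n].transpose(1, 2).reshape(B, C, h, w))
            start += n
        return outs

# Example: three pyramid levels sharing one channel width (an assumption).
p3, p4, p5 = (torch.randn(1, 256, s, s) for s in (32, 16, 8))
outs = CrossLayerInteraction()([p3, p4, p5])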
In this paper, we propose the cross-layer feature pyramid transformer designed for small object detection in aerial images. Below is the performance comparison with other feature pyramid networks, all built on RetinaNet, on the VisDrone-2019 DET dataset. ...
self.projs.append(
    nn.Conv2d(in_chans, dim, kernel_size=ps, stride=stride, padding=padding)
)
if norm_layer is not None:
    self.norm = norm_layer(embed_dim)
else:
    self.norm = None

def forward(self, x):
    B, C, H, W = x.shape
    # FIXME look at relaxing size constraints
    assert H == self.img_size[0] and W == self.img_size[1], \
        f"Input size ({H}x{W}) doesn't match model ({self.img_size[0]}x{self.img_size[1]})."
    # ...
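For context, here is a self-contained sketch of how a projection list like self.projs is typically assembled and called in a multi-scale patch embedding. The class name, patch sizes, and embedding width are hypothetical and only illustrate the pattern in the fragment above.

import torch
import torch.nn as nn

class MultiScalePatchEmbed(nn.Module):
    # Hypothetical wrapper around a list of patch-projection convolutions.
    def __init__(self, img_size=224, in_chans=3, embed_dim=96,
                 patch_sizes=(4, 8), norm_layer=nn.LayerNorm):
        super().__init__()
        self.img_size = (img_size, img_size)
        self.projs = nn.ModuleList()
        for ps in patch_sizes:
            # Each projection embeds the image at a different patch size.
            self.projs.append(nn.Conv2d(in_chans, embed_dim, kernel_size=ps, stride=ps))
        self.norm = norm_layer(embed_dim) if norm_layer is not None else None

    def forward(self, x):
        B, C, H, W = x.shape
        assert H == self.img_size[0] and W == self.img_size[1]
        outs = []
        for proj in self.projs:
            tokens = proj(x).flatten(2).transpose(1, 2)  # (B, N, embed_dim)
            outs.append(self.norm(tokens) if self.norm is not None else tokens)
        return outs

# Usage: two token sequences at different scales from one image.
tokens = MultiScalePatchEmbed()(torch.randn(1, 3, 224, 224))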
We remove the last global average pooling layer and leverage the feature pyramid structure to obtain denser features; this modified backbone then serves as the feature extractor in our model. Following [8], [9], [10], the backbone network is pre-trained on the mini-ImageNet dataset before meta-training. ...
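A minimal sketch of this backbone modification, assuming a torchvision ResNet-50 purely for illustration (the excerpt above does not name the backbone): the global average pooling and classification head are dropped, and the four residual stages are exposed as a feature pyramid. Pre-training (e.g., on mini-ImageNet) is not shown.

import torch
import torch.nn as nn
from torchvision.models import resnet50

class PyramidBackbone(nn.Module):
    # Sketch: drop GAP and the classifier, return multi-scale feature maps.
    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])
        # net.avgpool and net.fc are intentionally unused (GAP/head removed).

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # strides 4, 8, 16, 32 relative to the input
        return feats

feats = PyramidBackbone()(torch.randn(1, 3, 224, 224))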
The input CT slices underwent three down-sampling stages in the encoder, reducing the feature map to 1/2, 1/4, and 1/8 of its original size. Downsampling was performed by the first module of each stage, corresponding to MV2↓2 in Fig. 1c and d. As ...
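A sketch of the described encoder layout, assuming single-channel CT input and illustrative channel widths: each of the three stages begins with a stride-2 MobileNetV2-style inverted residual ("MV2↓2"), so the resolution falls to 1/2, 1/4, and 1/8 of the input. Skip connections inside the blocks are omitted for brevity.

import torch
import torch.nn as nn

def mv2_block(c_in, c_out, stride, expand=4):
    # MobileNetV2-style inverted residual (residual path omitted):
    # 1x1 expand -> 3x3 depthwise (the stride here does the downsampling) -> 1x1 project.
    hidden = c_in * expand
    return nn.Sequential(
        nn.Conv2d(c_in, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
        nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1, groups=hidden, bias=False),
        nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
        nn.Conv2d(hidden, c_out, 1, bias=False), nn.BatchNorm2d(c_out),
    )

# Three encoder stages; only the first block of each stage uses stride 2.
channels = [1, 32, 64, 128]  # single-channel CT input and these widths are assumptions
encoder = nn.ModuleList([
    nn.Sequential(mv2_block(channels[i], channels[i + 1], stride=2),
                  mv2_block(channels[i + 1], channels[i + 1], stride=1))
    for i in range(3)
])

x = torch.randn(1, 1, 256, 256)
for stage in encoder:
    x = stage(x)
    print(x.shape)  # 128x128 -> 64x64 -> 32x32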
3. We extract feature maps from the last convolutional layer of N(·) and apply SCDA [76] to obtain attention maps A(v) and A(ṽ) for normal and abnormal frames, respectively. A(·) denotes the operation to extract attention maps from N(·). We ...
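A minimal sketch of the SCDA-style attention-map step, assuming PyTorch tensors: activations from the last convolutional layer are summed over channels, and positions below the per-image mean are suppressed. The largest-connected-component refinement used in the original SCDA method is omitted here.

import torch

def scda_attention(feat):
    # feat: (B, C, H, W) feature map from the last convolutional layer of N(.)
    agg = feat.sum(dim=1)                                      # (B, H, W) aggregation map
    mask = (agg > agg.mean(dim=(1, 2), keepdim=True)).float()  # keep above-mean positions
    return agg * mask                                          # masked attention map A(.)

A_v = scda_attention(torch.relu(torch.randn(2, 512, 7, 7)))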
We propose a novel network named cross-level-guided transformer (CLGFormer). Specifically, we devise a dynamic selection fusion module (DSF) to reduce the discrepancy between modalities during multi-modal feature fusion. It adaptively selects multi-scale RGB features under the guidance of depth and employs ...
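A much-simplified sketch of depth-guided selection, assuming per-channel gating of the RGB features by the depth features; this only stands in for the idea and is not the paper's DSF module.

import torch
import torch.nn as nn

class DepthGuidedSelection(nn.Module):
    # Illustrative stand-in: depth features produce per-channel gates that
    # re-weight the RGB features before the two modalities are fused.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        w = self.gate(depth_feat)          # (B, C, 1, 1) selection weights from depth
        return rgb_feat * w + depth_feat   # gated RGB fused with depth

fused = DepthGuidedSelection(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))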
Additionally, inserting a convolutional layer at the end of the CHTB helps to introduce more inductive bias into the transformer for better reconstruction.

3.3. Cross-scale hierarchical transformer block

Given an input feature Fi,0 to the first CHTB in the i-th transformer group, denoted as hi,...
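A small sketch of a transformer block that ends with a 3x3 convolution to add local inductive bias, as described above; the dimensions, head count, and residual placement are assumptions, not the exact CHTB.

import torch
import torch.nn as nn

class TransformerBlockWithConv(nn.Module):
    # Sketch: global self-attention plus MLP on tokens, then a trailing
    # convolution on the reshaped feature map to inject locality.
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)  # convolutional layer at the block's end

    def forward(self, x):                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        t = x.flatten(2).transpose(1, 2)   # to tokens (B, HW, C)
        q = self.norm1(t)
        t = t + self.attn(q, q, q)[0]
        t = t + self.mlp(self.norm2(t))
        x = t.transpose(1, 2).reshape(B, C, H, W)
        return x + self.conv(x)            # local bias added after global attention

y = TransformerBlockWithConv()(torch.randn(1, 64, 16, 16))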
Because IoU loss only works well when the bounding boxes overlap and provides no gradient for non-overlapping cases, LWTransTracker [46] (Layer-Wise Transformer Tracker) replaces IoU loss with the C-IoU (Complete-IoU) loss for faster convergence and better regression accuracy...
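A sketch of a Complete-IoU loss in PyTorch for boxes in (x1, y1, x2, y2) format, showing why it still yields a useful gradient when boxes do not overlap: on top of 1 - IoU, it penalizes the normalized center distance and the aspect-ratio mismatch. This is a generic CIoU implementation, not LWTransTracker's code.

import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # IoU term.
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    iou = inter / (wp * hp + wt * ht - inter + eps)
    # Center-distance penalty, normalized by the enclosing box's diagonal.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

# Non-overlapping boxes still produce a non-trivial loss (and gradient).
loss = ciou_loss(torch.tensor([[0., 0., 2., 2.]]), torch.tensor([[3., 3., 5., 5.]]))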
In the Transformer layer, allocating too many MSA heads not only increases the computational burden and introduces considerable redundancy, but also causes many irrelevant features to be captured because of an excessive emphasis on spatial locality in the linear computation. This, in turn, leads...