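It uses Swin-T as the backbone for feature extraction and then passes the features to Mask R-CNN for further processing; the neck has 4 channels.

Resources for learning Mask R-CNN:
https://github.com/facebookresearch/Detectron
https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/mask-rcnn
https://github.com/matterport/Mask_RCNN

OK, let's get to the point! First, clon...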