For the panoptic segment decoder and Transformer encoder, we adopt the same settings as [4], with a ResNet-50 [19] backbone, 9 decoder layers, and 100 queries. The mask loss $L_{mask}$ comprises a binary cross-entropy loss $L_{ce}$ and a dice loss [36] $L_{dice}$. We set loss...
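To make the composition of the mask loss concrete, the following is a minimal, dependency-free sketch of a weighted sum of binary cross-entropy and dice loss over a flattened mask. It is not the authors' implementation. The function names, the smoothing constant `eps`, and the particular soft-dice form are assumptions chosen for illustration, and the weights `w_ce` and `w_dice` are left as required arguments because their values are not given in the text above.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_loss(logits, targets):
    # Mean binary cross-entropy over pixels, computed directly from logits:
    # max(x, 0) - x * t + log(1 + exp(-|x|)).
    total = 0.0
    for x, t in zip(logits, targets):
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def dice_loss(logits, targets, eps=1.0):
    # One common soft-dice form (an assumption, other variants exist):
    # 1 - (2 * sum(p * t) + eps) / (sum(p) + sum(t) + eps).
    probs = [sigmoid(x) for x in logits]
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def mask_loss(logits, targets, w_ce, w_dice):
    # Weighted sum of the two terms; the weights are hypothetical inputs,
    # not values taken from the paper.
    return w_ce * bce_loss(logits, targets) + w_dice * dice_loss(logits, targets)


if __name__ == "__main__":
    # Toy 4-pixel mask with placeholder weights of 1.0 each.
    target = [1.0, 0.0, 1.0, 0.0]
    confident_right = [20.0, -20.0, 20.0, -20.0]
    confident_wrong = [-20.0, 20.0, -20.0, 20.0]
    print(mask_loss(confident_right, target, w_ce=1.0, w_dice=1.0))
    print(mask_loss(confident_wrong, target, w_ce=1.0, w_dice=1.0))
```

The cross-entropy term penalizes each pixel independently, while the dice term scores the overlap of the whole predicted mask with the ground truth, which is why the two are often combined for masks where foreground pixels are rare.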
On ImageNet, a linear classifier trained on the learned representations comes close to supervised performance. With ResNet-50, it reaches a top-1 accuracy of 76.5% and even surpasses some supervised learning methods on some datasets. However, the SimCLR method relies on very large batches and a deeper network, ...