By introducing ViT-Lite, the paper shows that higher accuracy can be achieved when training from scratch on small datasets, breaking the myth that Transformers require massive amounts of data. It introduces CVT (Compact Vision Transformer) with a novel sequence pooling strategy, so the Transformer no longer needs a class token, and CCT (Compact Convolutional Transformer) to further improve performance while making the input image size more flexible.

Methods

Three models are proposed:

ViT-Lite: a smaller, more compact ViT that can be trained from scratch on small datasets.
CVT (Compact Vision Transformer): removes the class token and instead proposes sequence pooling, which fuses the embeddings of all patch tokens for classification. In plain terms, an attention weighting is learned over the patch-token embeddings and used to fuse the features of each part; this brings a clear gain in classification accuracy (see the sketch below).
CCT (Compact Convolutional Transformer): builds on CVT by replacing the patch embedding with a convolutional embedding, adding convolutional inductive bias and making positional embeddings optional.
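To make sequence pooling concrete, here is a minimal PyTorch sketch of the idea described above: a single linear layer scores each token, the scores are softmax-normalized across the sequence, and the tokens are fused by a weighted sum. The class and variable names are illustrative, not taken from the official repository.

```python
import torch
import torch.nn as nn

class SequencePooling(nn.Module):
    # Learns one attention score per token, then fuses all patch-token
    # embeddings into a single vector for the classifier head,
    # replacing the class token.
    def __init__(self, embed_dim: int):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, embed_dim), the encoder output
        weights = torch.softmax(self.score(x), dim=1)  # (B, N, 1)
        return (weights * x).sum(dim=1)                # (B, D)
```

The pooled vector is then fed to an ordinary linear classifier, e.g. `logits = nn.Linear(256, num_classes)(SequencePooling(256)(encoder_out))`.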
Compact Convolutional Transformers utilize sequence pooling and replace the patch embedding with a convolutional embedding, allowing for better inductive bias and making positional embeddings optional. CCT achieves better accuracy than ViT-Lite (smaller ViTs) and increases the flexibility of input parameters such as image size.
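The convolutional embedding can be sketched as below; this is a minimal PyTorch illustration assuming the common Conv → ReLU → MaxPool recipe, with illustrative kernel and stride values (the exact tokenizer configuration varies across CCT variants).

```python
import torch
import torch.nn as nn

class ConvTokenizer(nn.Module):
    # Replaces ViT's fixed patch embedding with a small conv block.
    def __init__(self, in_channels: int = 3, embed_dim: int = 256):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=3,
                      stride=1, padding=1, bias=False),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> (B, D, H', W') feature map
        x = self.block(x)
        # Flatten the spatial grid into a token sequence: (B, H'*W', D).
        # The sequence length follows the input resolution, which is why
        # variable image sizes work and positional embeddings are optional.
        return x.flatten(2).transpose(1, 2)
```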
Related application: compact convolutional transformers have also been applied to motor imagery electroencephalography (EEG) analysis for brain-computer interfaces (BCIs), a task that is challenging due to the complexity of the data and inter-subject variability.
Reported image-classification results (top-1 accuracy):
CIFAR-10, CCT-6/3x1: 95.29%
CIFAR-100, CCT-7/3x1*: 82.72%
CIFAR-100, CCT-6/3x1: 77.31% (3.17M parameters)