[Recommended] * Transformers Meet Visual Learning Understanding: A Comprehensive Review
Attention mechanisms: 2 papers; adversarial generative learning: 2 papers
[Recommended] * Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
Non-fully-supervised learning: 3 papers
[Recommended] * CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Den...
Most of today's vision transformers treat an input image as a sequence of image patches while ignoring the intrinsic structural information inside each patch, a deficiency that hurts their visual recognition ability. The TNT (Transformer iN Transformer) model addresses this by modelling both levels: an outer transformer operates on patch-level embeddings, while an inner transformer operates on the pixel-level embeddings within each patch and feeds its output back into the patch representation.
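A minimal PyTorch sketch of that two-level design follows; the dimensions and the use of stock nn.TransformerEncoderLayer blocks are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class TNTBlock(nn.Module):
    """Sketch of one Transformer-iN-Transformer block: the inner encoder
    models pixel (sub-patch) tokens inside each patch, the outer encoder
    models the patch tokens themselves. Sizes are illustrative."""
    def __init__(self, patch_dim=384, pixel_dim=24, pixels_per_patch=16):
        super().__init__()
        self.inner = nn.TransformerEncoderLayer(
            d_model=pixel_dim, nhead=4, batch_first=True)
        # Projects all pixel tokens of a patch into the patch embedding space.
        self.proj = nn.Linear(pixels_per_patch * pixel_dim, patch_dim)
        self.outer = nn.TransformerEncoderLayer(
            d_model=patch_dim, nhead=6, batch_first=True)

    def forward(self, patch_tokens, pixel_tokens):
        # patch_tokens: (B, N, patch_dim); pixel_tokens: (B*N, P, pixel_dim)
        B, N, _ = patch_tokens.shape
        pixel_tokens = self.inner(pixel_tokens)           # structure inside each patch
        fused = self.proj(pixel_tokens.reshape(B, N, -1)) # pixel -> patch space
        patch_tokens = self.outer(patch_tokens + fused)   # relations across patches
        return patch_tokens, pixel_tokens

patches, pixels = torch.randn(2, 196, 384), torch.randn(2 * 196, 16, 24)
patches, pixels = TNTBlock()(patches, pixels)  # shapes are preserved
```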
ConViT (ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases) incorporates soft convolutional inductive biases through gated positional self-attention (GPSA). CMT (CMT: Convolutional Neural Networks Meet Vision Transformers) applies a local perception unit based on depthwise convolution together with a lightweight Transformer module that downsamples the keys and values with a depthwise convolution whose stride equals its kernel size, cutting the cost of self-attention. A sketch of both CMT components follows.
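The sketch below shows the two mechanisms just described; class names, head counts, and the reduction factor k are my assumptions for illustration, not CMT's released code.

```python
import torch
import torch.nn as nn

class LocalPerceptionUnit(nn.Module):
    """Residual 3x3 depthwise convolution (sketch of CMT's LPU)."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x):  # x: (B, C, H, W)
        return x + self.dw(x)

class LightweightMHSA(nn.Module):
    """Self-attention whose keys/values are shrunk by a k x k depthwise
    convolution with stride k (sketch of CMT's lightweight attention)."""
    def __init__(self, dim, heads=4, k=2):
        super().__init__()
        self.reduce = nn.Conv2d(dim, dim, kernel_size=k, stride=k, groups=dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, C, H, W)
        B, C, H, W = x.shape
        q = x.flatten(2).transpose(1, 2)                # (B, H*W, C) queries
        kv = self.reduce(x).flatten(2).transpose(1, 2)  # (B, H*W/k^2, C)
        out, _ = self.attn(q, kv, kv)                   # cost scales with len(kv)
        return out.transpose(1, 2).reshape(B, C, H, W)

x = torch.randn(1, 64, 14, 14)
y = LightweightMHSA(64)(LocalPerceptionUnit(64)(x))  # (1, 64, 14, 14)
```

Shrinking only the keys and values keeps the output at full resolution while the attention matrix drops from (H*W)^2 to (H*W) * (H*W / k^2) entries.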
[CMT] Convolutional Neural Networks Meet Vision Transformers [paper]
Fine-tuning Image Transformers using Learnable Memory [paper]
[TransMix] Attend to Mix for Vision Transformers [paper] [code]
[NomMer] Nominate Synergistic Context in Vision Transformer for Visual Recognition [paper] [code]
[SSA]...
PaCa-ViT: "PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers", CVPR, 2023 GC-ViT: "Global Context Vision Transformers", ICML, 2023 MAGNETO: "MAGNETO: A Foundation Transformer", ICML, 2023 SMT: "Scale-Aware Modulation Meet Transformer", ICCV, 2023 ...
Year | Venue | Model | Title | Code
2022 | CVPR | CMT | CMT: Convolutional Neural Networks Meet Vision Transformers | Code
2021 | NeurIPS | Twins | Twins: Revisiting the Design of Spatial Attention in Vision Transformers | Code
2021 | ICCV | CvT | CvT: Introducing Convolutions to Vision Transformers | Code
2021 | NeurIPS | ViTAE | ViTAE: Vision transformer advanced by exploring intrinsic induc...
[27] Cheng Chi, Fangyun Wei, Han Hu. RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder. NeurIPS 2020.
[28] Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond. ICCV Workshop 2019.
PaCa-ViT: "PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers", CVPR, 2023 GC-ViT: "Global Context Vision Transformers", ICML, 2023 MAGNETO: "MAGNETO: A Foundation Transformer", ICML, 2023 SMT: "Scale-Aware Modulation Meet Transformer", ICCV, 2023 ...
Meta's V-JEPA model: on February 14, 2024, Meta published the paper "Revisiting Feature Prediction for Learning Visual Representations from Video" and released V-JEPA (Video Joint Embedding Predictive Architecture). Unlike the video-generation model Sora, V-JEPA learns representations of images and video and is mainly used to predict the missing or masked parts of a video; its goal is to predict those parts in an abstract representation space rather than reconstruct them pixel by pixel.
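A toy sketch of that training signal, under assumptions I chose for illustration (stock transformer layers, a single learned mask token, L1 regression); it is a simplification of the JEPA idea, not Meta's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_tokens, n_masked, batch = 256, 64, 16, 4

def encoder():  # stand-in for the paper's ViT encoders
    layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=2)

context_encoder, target_encoder = encoder(), encoder()
predictor = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=1)
mask_token = nn.Parameter(torch.zeros(1, 1, dim))  # learned query for hidden tokens

# The target encoder receives no gradients; in practice it tracks the
# context encoder via an exponential moving average (not shown here).
for p in target_encoder.parameters():
    p.requires_grad_(False)

tokens = torch.randn(batch, n_tokens, dim)        # video patch/tubelet embeddings
mask = torch.zeros(n_tokens, dtype=torch.bool)
mask[:n_masked] = True                            # tokens hidden from the context

ctx = context_encoder(tokens[:, ~mask])           # encode visible tokens only
queries = mask_token.expand(batch, n_masked, dim) # one query per hidden token
pred = predictor(torch.cat([ctx, queries], dim=1))[:, -n_masked:]
with torch.no_grad():
    tgt = target_encoder(tokens)[:, mask]         # latent targets for masked part

loss = F.l1_loss(pred, tgt)                       # regression in latent space
loss.backward()
```

Because the loss compares embeddings rather than pixels, the model is not penalized for unpredictable low-level detail, which is the stated advantage of feature prediction over pixel reconstruction.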