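Kaiming's latest first-author paper. If I had to sum it up and evaluate it in one sentence, I would call it a Simple and Effective pipeline for self-supervised learners in vision tasks. The paper sets out to explore masked autoencoders as a way to do unsupervised pretraining for CV tasks: just as BERT does for NLP tasks, it uses a masked autoencoder to perform self-supervised pretraining on raw data, and the resulting transformer encod...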
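Before any masking happens, the raw image has to be turned into a sequence of tokens. Here is a minimal sketch of that step. It assumes ViT-style non-overlapping square patches on channel-last NumPy arrays. The names `patchify` and `unpatchify` are illustrative, and the code is not claimed to match any official implementation.

```python
import numpy as np


def patchify(images: np.ndarray, patch_size: int) -> np.ndarray:
    """Split (N, H, W, C) images into (N, L, patch_size * patch_size * C) patches.

    Assumption: H and W are divisible by patch_size (non-overlapping ViT-style patches).
    """
    n, h, w, c = images.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "H and W must be divisible by patch_size"
    gh, gw = h // p, w // p
    # (N, gh, p, gw, p, C) -> (N, gh, gw, p, p, C): group pixels by patch-grid cell
    x = images.reshape(n, gh, p, gw, p, c).transpose(0, 1, 3, 2, 4, 5)
    # Flatten the grid into a token sequence and each patch into one vector
    return x.reshape(n, gh * gw, p * p * c)


def unpatchify(patches: np.ndarray, patch_size: int, h: int, w: int) -> np.ndarray:
    """Inverse of patchify: (N, L, p * p * C) -> (N, H, W, C)."""
    n, _, d = patches.shape
    p = patch_size
    gh, gw = h // p, w // p
    c = d // (p * p)
    x = patches.reshape(n, gh, gw, p, p, c).transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(n, h, w, c)


imgs = np.random.rand(2, 224, 224, 3)
tokens = patchify(imgs, 16)
print(tokens.shape)  # (2, 196, 768)
```

A 224×224 RGB image with 16×16 patches yields 196 patches of 768 values each. `unpatchify` restores the original array exactly, which is what lets a pixel-space reconstruction be viewed as an image again.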
ViT lacks the inductive bias inherent to convolution, so it needs a large amount of training data. As a result, ViT does not perform as well as CNNs on small datasets, such as those found in medicine and science. We experimentally found that masked autoencoders (MAE) can make the transformer focus more...
Paper tables with annotated results for Masked autoencoders are effective solution to transformer data-hungry
Masked Autoencoders are Efficient Class Incremental Learners. Jiang-Tian Zhai¹, Xialei Liu¹*, Andrew D. Bagdanov², Ke Li³, Ming-Ming Cheng¹ (¹VCIP, CS, Nankai University; ²MICC, University of Florence; ³Tencent Youtu Lab). Abstract: Class Incremental Learning (CIL) ...
Masked Autoencoders Are Scalable Vision Learners. Kaiming He*†, Xinlei Chen*, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick (*equal technical contribution, †project lead), Facebook AI Research (FAIR). Abstract: This paper shows that masked autoencoders (MAE) are scalable self-...
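The MAE paper computes its reconstruction loss as a mean squared error in pixel space, counting masked patches only, in a manner similar to BERT. The paper also reports a variant whose targets are per-patch normalized pixels. Below is a minimal NumPy sketch of that loss. It assumes flattened (N, L, D) patches and an (N, L) mask in which 1 marks a masked patch. The mask convention and the name `masked_mse_loss` are illustrative assumptions.

```python
import numpy as np


def masked_mse_loss(pred, target, mask, norm_pix=False, eps=1e-6):
    """Mean per-patch MSE, averaged over masked patches only.

    pred, target: (N, L, D) predicted and original flattened patches.
    mask: (N, L); assumption: 1 = masked (counted in the loss), 0 = visible (ignored).
    norm_pix: if True, normalize each target patch by its own mean and std first.
    """
    if norm_pix:
        mean = target.mean(axis=-1, keepdims=True)
        var = target.var(axis=-1, keepdims=True)
        target = (target - mean) / np.sqrt(var + eps)
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # (N, L) error per patch
    return float((per_patch * mask).sum() / mask.sum())


target = np.zeros((1, 4, 3))
pred = np.zeros((1, 4, 3))
pred[0, 0] = 2.0  # large error on a visible patch: ignored by the loss
pred[0, 1] = 1.0  # error on a masked patch: counted
mask = np.array([[0, 1, 1, 0]], dtype=float)
print(masked_mse_loss(pred, target, mask))  # 0.5
```

Visible patches contribute nothing to the loss, so the model cannot reduce it by copying what the encoder already saw. Only its guesses for the hidden regions are scored.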
Global contrast-masked autoencoders are powerful pathological representation learners. Keywords: Pathological image; Representation learning; Self-supervised learning. 2024 Elsevier Ltd. Using digital pathology slide scanning technology, artificial intelligence algorithms, particularly deep learning, have achieved significant results in the...
Masked autoencoders (MAEs) are a self-supervised pretraining strategy for vision transformers (ViTs) that mask out patches in an input image and then predict the missing regions. Although the approach is both simple and effective, the MAE ...
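Here is a sketch of the masking step. It assumes a uniformly random mask drawn independently for each image, and the 75% mask ratio that the MAE paper uses by default. Only the visible patches would be passed to the encoder. The name `random_masking` and the argsort-of-noise approach are illustrative choices, not a quote from any particular codebase.

```python
import numpy as np


def random_masking(patches, mask_ratio, rng):
    """Keep a random subset of patches per sample.

    patches: (N, L, D). Returns (visible, mask, ids_restore):
      visible     (N, L_keep, D) patches the encoder would see,
      mask        (N, L) with 1 = masked, 0 = visible (assumed convention),
      ids_restore (N, L) indices that undo the per-sample shuffle.
    """
    n, l, _ = patches.shape
    len_keep = int(l * (1 - mask_ratio))
    noise = rng.random((n, l))                     # one random score per patch
    ids_shuffle = np.argsort(noise, axis=1)        # patches with the lowest scores are kept
    ids_restore = np.argsort(ids_shuffle, axis=1)  # inverse permutation
    ids_keep = ids_shuffle[:, :len_keep]
    visible = np.take_along_axis(patches, ids_keep[:, :, None], axis=1)
    mask = np.ones((n, l))
    mask[:, :len_keep] = 0                         # visible positions, in shuffled order
    mask = np.take_along_axis(mask, ids_restore, axis=1)  # back to original patch order
    return visible, mask, ids_restore


rng = np.random.default_rng(0)
x = np.arange(2 * 196 * 4, dtype=float).reshape(2, 196, 4)
vis, m, ids_restore = random_masking(x, 0.75, rng)
print(vis.shape, int(m[0].sum()))  # (2, 49, 4) 147
```

Sorting per-sample noise gives each image its own mask without a Python loop. A decoder could use `ids_restore` to put mask tokens back at their original positions before predicting the missing patches.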
In both cases, the reconstructions come from applying a trajectory masked autoencoder with a factorized transformer architecture, \(l_S = 5\) and \(r = 0.8\). Even though large parts of the input data are masked, the model apparently learned to account for turns, twists and ...
MARLIN: Masked Autoencoder for facial video Representation LearnINg. Zhixi Cai¹, Shreya Ghosh¹,², Kalin Stefanov¹, Abhinav Dhall¹,³, Jianfei Cai¹, Hamid Rezatofighi¹, Reza Haffari¹, Munawar Hayat¹ (¹Monash University, ²Curtin University, ³Indian Institute of Techno...
Action Recognition on Something-Something V2, VideoMAE (no extra data, ViT-L, 32x2): Top-1 Accuracy 75.4 (rank #8), Top-5 Accuracy 95.2 (rank #4), Parameters 305 (rank #17), GFLOPs 1436×3 (rank #7).
Action Recognition on Something-Something V2, VideoMAE (no extra data, ViT-L, 16frame) ...