GLID: Pre-training a Generalist Encoder-Decoder Vision Model
This minimizes the architectural inconsistency between pre-training and fine-tuning and enables the pre-trained model to better adapt to downstream tasks. GLID achieves competitive performance on various vision tasks, including ob