Table 7: Linear-probe accuracy of different unsupervised learning methods on ImageNet. Note that DALL-E dVAE's image quantizer was trained with extra data, while VIM-Large was trained without dropout. In Table 7 the authors split the models into two groups, discriminatively pretrained and generatively pretrained. Their method, VIM with ViT-VQGAN, outperforms all other generative models while using fewer parameters, and comes close to discriminative models such as BYOL and DINO.
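To make the linear-probe protocol concrete, here is a minimal PyTorch sketch (not the paper's code; `encoder`, the pooling choice, and all sizes are assumptions): freeze the pretrained model, pool its token features, and train only a single linear classifier on labeled ImageNet data.

```python
import torch
import torch.nn as nn

# Minimal linear-probe sketch (illustrative, not the paper's code).
# `encoder` is a frozen pretrained model assumed to return per-token
# features of shape (batch, num_tokens, dim); names/dims are assumptions.
def linear_probe_step(encoder, probe, images, labels, optimizer):
    encoder.eval()
    with torch.no_grad():                  # encoder stays frozen
        feats = encoder(images)            # (B, N, D) token features
        feats = feats.mean(dim=1)          # average-pool over tokens
    logits = probe(feats)                  # single linear layer on top
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()                        # only probe weights get gradients
    optimizer.step()
    return loss.item()

dim, num_classes = 1024, 1000              # assumed feature/label sizes
probe = nn.Linear(dim, num_classes)
optimizer = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9)
```

Because only the linear head is trained, the resulting accuracy measures how linearly separable the frozen pretrained features are, which is why it is a standard comparison point for both discriminative and generative pretraining.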
Contents: 1. Vector-Quantized Images with ViT-VQGAN; 2. Vector-Quantized Image Modeling. Experiments: 1. Reconstruction; 2. Generation; 3. Unsupervised learning.
Paper: Vector-quantized Image Modeling with Improved VQGAN
Code (unclear whether official): GitHub - thuanz123/enhancing-transformers: An unofficial implementation of both ViT-VQGAN…
The second stage is Vector-quantized Image Modeling: the 32x32 grid of 1024 tokens produced by the stage-1 model is fed to a Transformer that autoregressively predicts the next token, which makes the model usable for image generation. For class-conditional generation, an extra class token is prepended to the sequence. Figure 1: Overview of ViT-VQGAN (left) and Vector-quantized Image Modeling (right) for both…
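A minimal sketch of this second stage, assuming a generic decoder-only Transformer (all module names and sizes below are hypothetical, not the paper's implementation): flatten the 32x32 grid into 1024 token ids, prepend a class id for class-conditional generation, and train with next-token cross-entropy.

```python
import torch
import torch.nn as nn

# Stage-2 VIM sketch (illustrative; not the paper's implementation).
vocab_size, num_classes, seq_len = 8192, 1000, 32 * 32  # assumed sizes

class TinyGPT(nn.Module):
    def __init__(self, vocab, dim=512, heads=8, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.pos = nn.Parameter(torch.zeros(1, seq_len + 1, dim))
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        x = self.embed(ids) + self.pos[:, : ids.size(1)]
        # causal mask so each position only attends to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        x = self.blocks(x, mask=mask)
        return self.head(x)

# class-conditional: reserve extra ids for classes and prepend one per image
gpt = TinyGPT(vocab_size + num_classes)

def train_step(image_tokens, class_id):
    # image_tokens: (B, 1024) codebook indices from the stage-1 tokenizer
    cls = class_id.unsqueeze(1) + vocab_size       # shift into class-id range
    seq = torch.cat([cls, image_tokens], dim=1)    # (B, 1025)
    logits = gpt(seq[:, :-1])                      # predict the next token
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), seq[:, 1:].reshape(-1)
    )
```

At generation time, one would sample the 1024 tokens autoregressively from the same model (conditioned on the class token, if any) and decode them back to pixels with the stage-1 ViT-VQGAN decoder.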
Paper abstract: Pretraining language models with next-token prediction on massive text corpora has delivered phenomenal zero-shot, few-shot, transfer learning and multi-tasking capabilities on both generative and discriminative language tasks. Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively…
3. ViT-VQGAN: "Vector-quantized Image Modeling with Improved VQGAN", ICLR 2022
4. VQ-Diffusion: "Vector Quantized Diffusion Model for Text-to-Image Synthesis", CVPR 2022
5. MaskGIT: "MaskGIT: Masked Generative Image Transformer", CVPR 2022
6. Token-Critic: "Improved Masked Image Generation with Token-Critic", ECCV 2022
"Improved vector quantized diffusion models." arXiv preprint arXiv:2205.16007 (2022). ^Rombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. “High-resolution image synthesis with latent diffusion models.” In Proceedings of the IEEE/CVF Conference on Computer ...