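The V2L Tokenizer adopts an encoder-quantizer-decoder architecture with two quantizers: a local quantizer and a global quantizer. Each quantizer is paired with its own independent, frozen codebook derived from the LLM vocabulary. An image is then quantized into K_g global tokens and K_l local tokens, drawn from the global and local codebooks respectively.

Global codebook. The LLM vocabulary includes what the language tokenizer generates...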
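To make the two-quantizer idea concrete, here is a minimal sketch of quantizing image features against two frozen codebooks. It assumes plain nearest-neighbour assignment by squared L2 distance and assumes each codebook entry's index stands for an LLM vocabulary token; the encoder, the decoder, and how the codebooks are actually built are not modelled. The function names `nearest_code`, `quantize`, and `v2l_tokenize` are hypothetical and do not come from the repository.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `feature` (squared L2).

    Ties are broken in favour of the lower index.
    """
    best_idx, best_dist = -1, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(feature, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to a token index.

    The codebook is only read, never updated, mirroring a frozen codebook.
    """
    return [nearest_code(f, codebook) for f in features]


def v2l_tokenize(
    global_feats: Sequence[Vector],
    local_feats: Sequence[Vector],
    global_codebook: Sequence[Vector],
    local_codebook: Sequence[Vector],
) -> Tuple[List[int], List[int]]:
    """Hypothetical two-quantizer step: K_g global tokens and K_l local tokens.

    Assumption: the encoder has already produced K_g global and K_l local
    feature vectors; each set is quantized against its own codebook.
    """
    global_tokens = quantize(global_feats, global_codebook)
    local_tokens = quantize(local_feats, local_codebook)
    return global_tokens, local_tokens


if __name__ == "__main__":
    # Toy 2-D codebooks standing in for (hypothetical) LLM token embeddings.
    global_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    local_codebook = [[0.0, 1.0], [1.0, 0.0]]
    # K_g = 1 global feature, K_l = 3 local features.
    g, l = v2l_tokenize([[0.9, 1.2]], [[0.1, 0.8], [0.9, 0.2], [0.0, 0.9]],
                        global_codebook, local_codebook)
    print(g, l)  # [1] [0, 1, 0]
```

Because the codebooks are frozen, the resulting indices always refer to existing vocabulary entries, so the global and local token sequences can in principle be read as LLM tokens. A real implementation would typically vectorize the distance computation and use a straight-through estimator to train the encoder; both are omitted here.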
Run "step4_training_v2l_tokenizer.py" to train the V2L Tokenizer using the codebooks produced by the three preceding steps. We also provide our codebooks and checkpoints at: https://drive.google.com/drive/folders/1Z8GxE-WMEijJV-JZmqL7AGzsB0gHk4ow?usp=sharing ...