COYO-700M: Large-scale Image-Text Pair Dataset (GitHub repository: kakaobrain/coyo-dataset)
COYO-700M is a large-scale dataset that contains 747M image-text pairs, along with many other meta-attributes that make it easier to use for training various models. Our dataset follows a similar strategy to previous vision-and-language datasets, collecting many informative pairs of alt-text and its...