PMC-OA is a large-scale dataset containing 1.65M image-text pairs. The figures and captions are drawn from PubMed Central: 2,478,267 available papers are covered, and 12,211,907 figure-caption pairs are extracted from them.