The Caltech Cars dataset consists of 126 rear-view photographs taken in parking lots. Each image is 896 × 592 pixels and shows a single vehicle as the main subject. The photographs were taken in daylight with a handheld camera at roughly equivalent...
capture, we find that in comparison to humans, current computer vision models struggle to classify and detect the digits in our dataset. By comparing the performance of the latest task-specific models on CaltechFN and on an existing digit dataset, we show that our dataset indeed presents a ...
Abstract: Caltech-UCSD Birds-200-2011 (CUB-200-2011) is an extended version of the CUB-200 dataset, with roughly double the number of images per class and new part location annotations. For detailed information about the dataset, please see the technical report linked below. Citations: 2 ...