The RVL-CDIP dataset consists of scanned document images belonging to 16 classes, such as letter, form, email, resume, and memo. The dataset has 320,000 training, 40,000 validation, and 40,000 test images. The images are characterized by low quality, nois...
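As a quick arithmetic check on the split sizes above, the short sketch below totals the three splits and computes each split's share. The per-class average assumes the classes are balanced, which this paragraph does not state. All names in the code are illustrative and do not come from any official loader.

```python
# Split sizes as stated for RVL-CDIP; the dict and function names are hypothetical.
SPLITS = {"train": 320_000, "validation": 40_000, "test": 40_000}
NUM_CLASSES = 16


def split_summary(splits, num_classes):
    """Return the total image count, each split's fraction, and the
    per-class average (meaningful only if classes are balanced)."""
    total = sum(splits.values())
    fractions = {name: count / total for name, count in splits.items()}
    per_class = total // num_classes
    return total, fractions, per_class


total, fractions, per_class = split_summary(SPLITS, NUM_CLASSES)
print(total)      # 400000
print(fractions)  # {'train': 0.8, 'validation': 0.1, 'test': 0.1}
print(per_class)  # 25000
```

The result is an 80/10/10 split over 400,000 images.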
RVL-CDIP_MP-N can serve its original purpose as a covariate-shift test set, now for multi-page document classification. We retrieved the original full documents from DocumentCloud and through web search. It uses the same 16-class label taxonomy as RVL-CDIP.
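The RVL-CDIP (Ryerson Vision Lab Complex Document Information Processing) dataset consists of 400,000 grayscale images in 16 classes, with 25,000 images per class. There are 320,000 training images, 40,000 validation images, and 40,000 test images. The images are sized so that their largest dimension does not exceed 1000 pixels. (Listing: uploaded by bnmvv5, 2021-07-16; license CC0; tags: CV, classification, education ...)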
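The sizing rule above ("largest dimension does not exceed 1000 pixels") can be illustrated with a small helper that computes an aspect-preserving target size. This is only a sketch under stated assumptions: the snippet does not say how RVL-CDIP was actually resized. The helper `fit_within`, the integer floor rounding, and the choice never to upscale are all hypothetical.

```python
MAX_SIDE = 1000  # largest-dimension limit stated for RVL-CDIP images


def fit_within(width, height, max_side=MAX_SIDE):
    """Hypothetical helper: return (w, h) scaled down, preserving aspect
    ratio, so that max(w, h) <= max_side. Images already within the
    limit are returned unchanged (no upscaling is assumed)."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    if max(width, height) <= max_side:
        return width, height
    # Integer arithmetic avoids float rounding pushing a side past the limit;
    # the shorter side is floored and clamped to at least 1 pixel.
    if width >= height:
        return max_side, max(1, height * max_side // width)
    return max(1, width * max_side // height), max_side


print(fit_within(2550, 3300))  # (772, 1000)
print(fit_within(3000, 1500))  # (1000, 500)
print(fit_within(800, 600))    # (800, 600)
```

For real images, Pillow's `Image.thumbnail((1000, 1000))` performs a similar in-place, aspect-preserving downscale, though its rounding of the shorter side may differ slightly from this sketch.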
We find that models trained on the smaller Tobacco-3482 dataset perform poorly on our new out-of-distribution data, while text classification models trained on the larger RVL-CDIP exhibit smaller performance drops. (doi:10.48550/arXiv.2108.02684; Stefan Larson...)
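I created a project following https://aistudio.baidu.com/aistudio/datasetdetail/147611 and saw that 4 model files were already pre-stored in data. I then installed the requirements as described in the README, but downloading the RVL-CDIP document image classification files failed, apparently because the files are hosted on Google Docs. For files hosted at addresses that cannot be downloaded from, could the maintainers pre-store them on aistudio.baidu.com ...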
Keywords: Computer Science - Computation and Language ...