To address these issues, we introduce in this paper a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-...
Section 2 gives an overview of the recent literature on deep generative models, image encoders, and diagram-based tasks and datasets. Section 3 describes Paper2Fig100k, a novel dataset of research figures and texts. In Section 4 we propose OCR-VQGAN, an image encoder focused in...
We use the ICDAR 2019 SROIE dataset, which contains 1,000 whole scanned receipt images. Each receipt image contains about four key text fields, such as goods name, unit price, date, and total cost. The annotated text consists mainly of digits and English characters. The s...
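As an illustration of the annotation structure described above, the following Python sketch stores the four key fields named in the text for one receipt and counts how many annotated characters are digits, English letters, or something else. The class `ReceiptFields`, the helper functions, and the sample values are hypothetical; they are not part of the SROIE release or its official tooling.

```python
from dataclasses import dataclass


@dataclass
class ReceiptFields:
    """Key text fields of one receipt (hypothetical names based on the text)."""
    goods_name: str
    unit_price: str
    date: str
    total_cost: str


def char_composition(text: str) -> dict:
    """Count ASCII digits, English (ASCII) letters, and all other characters."""
    counts = {"digits": 0, "letters": 0, "other": 0}
    for ch in text:
        if ch in "0123456789":
            counts["digits"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["letters"] += 1
        else:
            # Spaces, punctuation, and non-English letters all land here.
            counts["other"] += 1
    return counts


def receipt_composition(fields: ReceiptFields) -> dict:
    """Aggregate the character composition over all four key fields."""
    total = {"digits": 0, "letters": 0, "other": 0}
    for value in (fields.goods_name, fields.unit_price, fields.date, fields.total_cost):
        for key, n in char_composition(value).items():
            total[key] += n
    return total
```

For a made-up receipt, `receipt_composition(ReceiptFields("MILK 1L", "3.50", "25/03/2019", "7.00"))` yields 15 digits, 5 letters, and 5 other characters, which matches the observation that such annotations are dominated by digits and English letters.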
2 Literature Review
This paper, to the best of the authors' knowledge, is the first work on Arabic printed-text OCR to investigate a novel way of extracting word features in the Block-based DCT (BDCT) domain. This is based on using a Discrete one-dimensional Hidden Markov (Bakis) Model (1D...
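To make block-based DCT feature extraction concrete, here is a minimal NumPy sketch, assuming 8x8 blocks, zero padding, and retention of only the top-left 3x3 low-frequency coefficients of each block. These choices, the function names, and the grouping of blocks into one observation vector per block column, ordered right to left to follow Arabic reading order, are assumptions for illustration and not the configuration used in this paper. A discrete HMM such as the one described would additionally need each vector quantized into a symbol (for example with a k-means codebook); that step is omitted here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # scaling the DC row makes C orthonormal
    return c


def bdct_features(img: np.ndarray, block: int = 8, keep: int = 3) -> np.ndarray:
    """Block-based DCT features for a grayscale word image (illustrative sketch).

    Pads the image to a multiple of `block`, applies a 2-D DCT to every block,
    keeps the top-left keep x keep (low-frequency) coefficients, and
    concatenates them down each column of blocks. Returns one feature vector
    per block column, ordered right to left.
    """
    h, w = img.shape
    pad_h, pad_w = (-h) % block, (-w) % block
    # Zero padding is an assumption; a real system might pad with background.
    padded = np.pad(img.astype(float), ((0, pad_h), (0, pad_w)))
    c = dct_matrix(block)
    n_rows, n_cols = padded.shape[0] // block, padded.shape[1] // block
    columns = []
    for bc in range(n_cols):
        coeffs = []
        for br in range(n_rows):
            tile = padded[br * block:(br + 1) * block, bc * block:(bc + 1) * block]
            d = c @ tile @ c.T  # separable 2-D DCT-II of one block
            coeffs.append(d[:keep, :keep].ravel())  # low-frequency corner
        columns.append(np.concatenate(coeffs))
    # Reverse so the sequence runs right to left, following Arabic reading order.
    return np.stack(columns[::-1])
```

For example, `bdct_features(np.ones((16, 24)))` returns an array of shape `(3, 18)`: three block columns, each holding two blocks of nine coefficients. Keeping only the low-frequency corner reflects the usual energy-compaction property of the DCT, which concentrates most of a smooth block's energy in a few coefficients.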
because it is not that character. This causes errors in both training and inference. The solution is to identify these homoglyphs and map them all to a single selected character. For Latin characters, dictionaries solve the problem, but for Japanese or Chinese literature, homoglyphs require...
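The dictionary-based normalization described above can be sketched in Python as a fixed lookup table applied with `str.translate`. The table entries are hypothetical examples chosen for illustration: cross-script Latin look-alikes (Cyrillic and Greek letters), a full-width Latin letter, and two CJK Kangxi radicals that duplicate regular ideographs. A fixed table like this only covers context-free cases; a pair such as the katakana prolonged sound mark ー (U+30FC) and the ideograph 一 (U+4E00) cannot be resolved by lookup alone, since both are legitimate characters in Japanese text.

```python
# Hypothetical homoglyph table: each look-alike code point maps to the
# single character selected as canonical for training and inference.
HOMOGLYPHS = {
    "\u0430": "a",       # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
    "\u039f": "O",       # GREEK CAPITAL LETTER OMICRON -> LATIN CAPITAL LETTER O
    "\uff21": "A",       # FULLWIDTH LATIN CAPITAL LETTER A -> LATIN CAPITAL LETTER A
    "\u2f00": "\u4e00",  # KANGXI RADICAL ONE -> CJK ideograph "one"
    "\u2f12": "\u529b",  # KANGXI RADICAL POWER -> CJK ideograph "power"
}

# Precompute the translation table once; str.maketrans accepts a dict
# whose keys are single characters.
_TABLE = str.maketrans(HOMOGLYPHS)


def normalize_homoglyphs(text: str) -> str:
    """Replace every known homoglyph with its selected canonical character."""
    return text.translate(_TABLE)
```

For comparison, `unicodedata.normalize("NFKC", text)` already folds full-width forms and Kangxi radicals into their standard counterparts, but it leaves cross-script look-alikes such as Cyrillic а untouched, which is why an explicit table is still useful.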