A Survey of OCR Evaluation Methods and Metrics and the Invisibility of Historical Documents
📄 English Summary
A Survey of OCR Evaluation Methods and Metrics and the Invisibility of Historical Documents
Optical character recognition (OCR) and document understanding systems increasingly rely on large vision and vision-language models, yet evaluation remains focused on modern, Western, and institutional documents. This emphasis obscures system behavior in historical and marginalized archives, where layout, typography, and material degradation influence interpretation. The study examines how OCR and document understanding systems are evaluated, with a particular focus on Black historical newspapers. Using the PRISMA framework, it reviews OCR and document understanding papers and benchmark datasets published between 2006 and 2025, analyzing how these studies report training data, benchmark design, and evaluation metrics. The research aims to highlight the limitations of current evaluation methods and calls for greater attention to historical documents.