BLUEPRINT Rebuilding a Legacy: Multimodal Retrieval for Complex Engineering Drawings and Documents

📄 English Summary

BLUEPRINT Rebuilding a Legacy: Multimodal Retrieval for Complex Engineering Drawings and Documents

Blueprint is a layout-aware multimodal retrieval system for large engineering repositories, where decades of drawings and technical records sit in legacy archives with inconsistent or missing metadata, making retrieval difficult and often manual. It detects canonical drawing regions, applies region-restricted OCR based on a vision-language model (VLM), normalizes identifiers (e.g., DWG, part, facility), and fuses lexical and dense retrieval with a lightweight region-level reranker. Deployed on approximately 770,000 unlabeled files, it automatically generates structured metadata suitable for cross-facility search. Blueprint is evaluated on a 5,000-file benchmark with 350 expert-curated queries, using pooled, graded (0/1/2) relevance judgments.
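
The summary does not say how identifiers are normalized, how the lexical and dense rankings are fused, or which metric is computed over the graded judgments. The Python sketch below is therefore only an illustration under stated assumptions: `normalize_identifier` uses a hypothetical uppercase-and-hyphenate rule, fusion uses reciprocal rank fusion (a common technique swapped in here, not confirmed as Blueprint's method), and scoring uses nDCG with the 0/1/2 grades as gains. The region-level reranker is not modeled, and all document IDs are made up.

```python
import math
import re
from collections import defaultdict


def normalize_identifier(raw: str) -> str:
    """Hypothetical rule: uppercase and collapse separator runs to one hyphen."""
    return re.sub(r"[\s_\-./]+", "-", raw.strip().upper()).strip("-")


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs; each list adds 1 / (k + rank) per doc.

    Assumption: RRF stands in for whatever fusion Blueprint actually uses.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def ndcg_at_k(ranked_ids, judgments, k=10):
    """nDCG@k with graded relevance (0/1/2); unjudged docs count as 0."""
    def dcg(gains):
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [judgments.get(d, 0) for d in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up example data: two retrievers, one query, three judged documents.
lexical = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([lexical, dense])
judgments = {"d1": 2, "d3": 1, "d2": 0}
score = ndcg_at_k(fused, judgments, k=10)
```

In this toy case, documents that both retrievers return rise to the top of the fused list, which is the behaviour a hybrid lexical and dense setup is meant to provide. A reranker would then reorder the head of `fused` before scoring.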
