DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search
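📄 Chinese Summary (English translation)
With the rapid development of tool use and agentic large language models (LLMs), retrieval-augmented generation (RAG) is evolving from one-shot, passive retrieval into multi-turn, decision-driven evidence acquisition. Despite notable results in open-domain settings, existing agentic search frameworks typically treat long documents as flat collections of chunks and fail to exploit the priors that documents inherently carry, such as hierarchical organization and sequential discourse structure. DeepRead is a structure-aware, multi-turn document reasoning agent that explicitly applies these priors to long-document question answering. It uses an LLM-based OCR model to convert PDFs into structured Markdown, preserving headings and paragraph boundaries and thereby giving the agent richer contextual information. By integrating the hierarchical and sequential structure of a document, DeepRead performs finer-grained evidence retrieval and reasoning, significantly improving performance and accuracy on complex long-document question answering tasks. The approach overcomes the information loss and insufficient context that limit traditional RAG frameworks on long documents, and opens a new path toward efficient, accurate knowledge acquisition by agents in complex information environments.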
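The summary does not say how DeepRead represents the OCR output internally. As a rough illustration of what "preserving headings and paragraph boundaries" makes available to an agent, here is a minimal Python sketch, assuming ATX-style `#` headings and blank-line-separated paragraphs, that turns Markdown into a heading tree plus a reading-order paragraph list. The names `parse_markdown`, `Section`, and `Paragraph` are hypothetical and are not DeepRead's API.

```python
import re
from dataclasses import dataclass, field

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Paragraph:
    index: int   # position in document reading order (sequential prior)
    text: str
    path: tuple  # heading titles from the top level down (hierarchical prior)


@dataclass
class Section:
    title: str
    level: int
    children: list = field(default_factory=list)
    paragraphs: list = field(default_factory=list)


def parse_markdown(md: str):
    """Hypothetical parser: build a heading tree and a flat, ordered paragraph list."""
    root = Section(title="", level=0)
    stack = [root]  # currently open sections, from root to innermost
    paragraphs = []
    buffer = []     # lines of the paragraph being accumulated

    def flush():
        if buffer:
            path = tuple(s.title for s in stack[1:])
            para = Paragraph(index=len(paragraphs), text=" ".join(buffer), path=path)
            stack[-1].paragraphs.append(para)
            paragraphs.append(para)
            buffer.clear()

    for line in md.splitlines():
        m = HEADING_RE.match(line)
        if m:
            flush()
            level = len(m.group(1))
            # Close any open sections at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            section = Section(title=m.group(2), level=level)
            stack[-1].children.append(section)
            stack.append(section)
        elif line.strip():
            buffer.append(line.strip())  # consecutive lines join into one paragraph
        else:
            flush()  # a blank line closes the current paragraph
    flush()
    return root, paragraphs


# Toy input (not from the paper).
doc = """# Intro
Long reports mix many topics.

## Scope
This report covers two regions.
Both are coastal.

Data was collected in 2020.

# Findings
Rainfall increased.
"""

root, paras = parse_markdown(doc)
for p in paras:
    print(p.index, " > ".join(p.path), "|", p.text)
# 0 Intro | Long reports mix many topics.
# 1 Intro > Scope | This report covers two regions. Both are coastal.
# 2 Intro > Scope | Data was collected in 2020.
# 3 Findings | Rainfall increased.
```

Each paragraph keeps both its heading path and its reading-order index, which is exactly the information a flat list of fixed-size chunks discards. Real OCR output would also contain tables, lists, and fenced code that this sketch does not handle; a `#` line inside a code fence, for example, would be misread as a heading.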
📄 English Summary
DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search
The rapid advancement of tool-using and agentic large language models (LLMs) is transforming Retrieval-Augmented Generation (RAG) from one-shot, passive retrieval into multi-turn, decision-driven evidence acquisition. Although current agentic search frameworks perform well in open-domain scenarios, they frequently treat long documents as flat collections of chunks and thereby underuse inherent document priors such as hierarchical organization and sequential discourse structure. DeepRead is a structure-aware, multi-turn document reasoning agent designed to operationalize these priors for long-document question answering. It uses an LLM-based OCR model to convert PDFs into structured Markdown that preserves headings and paragraph boundaries, giving the agent richer context for evidence retrieval and reasoning. By integrating both the hierarchical and the sequential structure of a document, DeepRead significantly improves performance and accuracy on complex long-document question answering tasks. The approach addresses a key limitation of traditional RAG frameworks, which tend to lose information and context when processing long documents, and points toward more efficient and precise knowledge acquisition by agents in complex information environments.
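To make the contrast with flat chunk retrieval concrete, here is a hypothetical Python sketch of structure-aware evidence lookup. It assumes paragraphs already carry a heading path and a reading-order index, scores them with a simple term-overlap function (a stand-in; the summary does not say which retriever DeepRead uses), and returns the best hit together with its section label and its same-section neighbors in reading order. `retrieve_with_structure`, `score`, and the `window` parameter are illustrative names only.

```python
import re
from dataclasses import dataclass


@dataclass
class Paragraph:
    index: int   # reading-order position in the document
    path: tuple  # heading titles enclosing this paragraph, outermost first
    text: str


def tokens(s: str) -> set:
    """Lowercased alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def score(query: str, para: Paragraph) -> int:
    """Stand-in lexical scorer: query terms found in the text or the heading path."""
    return len(tokens(query) & (tokens(para.text) | tokens(" ".join(para.path))))


def retrieve_with_structure(query: str, paras: list, window: int = 1) -> dict:
    best = max(paras, key=lambda p: score(query, p))  # first paragraph wins ties
    # Sequential prior: neighbors within `window` positions in reading order...
    lo, hi = best.index - window, best.index + window
    # ...restricted to the same section (hierarchical prior).
    context = [p for p in paras if lo <= p.index <= hi and p.path == best.path]
    return {
        "section": " > ".join(best.path),
        "hit": best.index,
        "evidence": [p.text for p in context],
    }


# Toy document (not from the paper), already split by heading and paragraph.
paras = [
    Paragraph(0, ("Installation", "Requirements"), "The tool needs Python 3.10 or newer."),
    Paragraph(1, ("Installation", "Requirements"), "It also needs about 2 GB of free disk space."),
    Paragraph(2, ("Installation", "Steps"), "Run the installer and restart the shell."),
    Paragraph(3, ("Usage",), "Call the CLI with an input file."),
]

result = retrieve_with_structure("what disk space requirements", paras)
print(result["section"])  # Installation > Requirements
print(result["evidence"])
```

A flat chunk retriever would return only the disk-space paragraph; the structure-aware version also surfaces the sibling requirement and the section heading that frame it, while skipping the adjacent paragraph that belongs to a different section. In a multi-turn agent, a lookup like this could be one tool the model calls repeatedly, but the summary does not describe DeepRead's actual tool set.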