RAG vs Long-Context: How Should You Give Large Language Models Your Private Data?


📄 English Summary


Large Language Models (LLMs) are frozen in time: they know only what they were trained on before their cutoff date, and they cannot see internal documents unless relevant context is injected at query time. Two engineering patterns compete. Retrieval-Augmented Generation (RAG) chunks documents, embeds each chunk, stores the vectors, retrieves the top matches for a query, and injects those snippets into the prompt. Long-context (brute force) instead dumps large documents directly into the model's context window and lets attention find the answer. This architectural choice affects complexity, reliability, cost, and correctness. Choosing the wrong pattern can mean either (a) missing facts because retrieval fails, or (b) blowing the budget and still getting noisy results because the model cannot focus on the critical information amid the clutter.
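The RAG steps above (chunk, embed, store vectors, retrieve top matches, inject snippets) can be sketched end to end. This is a minimal illustration only: the bag-of-words `embed` function and the fixed-size `chunk` window are toy stand-ins for a real embedding model and a real chunking strategy, and every name here is hypothetical rather than drawn from any particular library.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    # Production systems would call a sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(document: str, size: int = 12) -> list[str]:
    # Fixed-size word windows; real pipelines often split on
    # paragraphs or headings instead.
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk against the query and keep the top-k matches.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, snippets: list[str]) -> str:
    # Inject the retrieved snippets as grounding context for the LLM.
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

doc = ("The refund policy allows returns within 30 days. "
       "Shipping is free for orders over 50 dollars. "
       "Support is available by email on weekdays.")
prompt = build_prompt("What is the refund policy?",
                      retrieve("refund policy returns", chunk(doc)))
print(prompt)
```

The long-context alternative collapses all of this to a single step: concatenate the whole document into the prompt. That is simpler and immune to retrieval misses, but it pays for every token of the document on every query.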

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others