RAG in production is nothing like the tutorials

Source: RAG in production is nothing like the tutorials

Published: February 25, 2026

📄 Summary

Many RAG tutorials follow the same script: split documents into chunks, generate embeddings, store them in a vector database, and retrieve the top K results when a user asks a question. Deploying that pipeline to real users is a different matter. After months of building and iterating on a production RAG pipeline, it became clear that nearly every assumption from those tutorials fails at scale. The article covers what actually works in practice and what the tutorials leave out.
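
For reference, the tutorial-style pipeline described in the summary (chunk, embed, store, retrieve top K) can be sketched roughly as follows. This is a minimal illustration, not the article's code: the `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, the "vector database" is a plain in-memory list, and all names (`chunk`, `embed`, `build_index`, `retrieve`, `DIM`) are hypothetical.

```python
import hashlib
import math
from collections import Counter

DIM = 256  # assumed vector size for this toy example


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into fixed-size chunks of `size` words, with no overlap."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(docs: list[str]) -> list[tuple[str, list[float]]]:
    """Chunk every document and store (chunk, vector) pairs in a list."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]


def retrieve(index: list[tuple[str, list[float]]], query: str, k: int = 3) -> list[str]:
    """Return the k chunks whose vectors have the highest cosine similarity to the query."""
    q = embed(query)
    # Vectors are already normalized, so the dot product is the cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, v)), c) for c, v in index]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:k]]


if __name__ == "__main__":
    index = build_index(["cats purr loudly", "python uses indentation for blocks"])
    print(retrieve(index, "cats purr", k=1))
```

The sketch deliberately keeps the tutorial assumptions (fixed-size chunks, a single similarity search, a fixed K), which are the kind of defaults the article argues do not hold up at scale.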
