How to Build a Production-Grade RAG Pipeline with FastAPI, pgvector, and Cross-Encoder Reranking

📄 English Summary

How I built a production RAG pipeline with FastAPI, pgvector and cross-encoder reranking

The author built a production-grade RAG engine that combines hybrid search (pgvector + BM25), cross-encoder reranking, MMR diversity, semantic caching, and automatic language detection, all running on async FastAPI and PostgreSQL. The system addresses problems that commonly surface in production: Spanish queries matching English chunks, the top 10 most similar chunks all coming from the same document, and semantic search failing on non-English text. The article walks through the actual architecture, the technical decisions behind it, and the key code.
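To make the MMR diversity step concrete, here is a minimal sketch of Maximal Marginal Relevance in plain Python. It is not the article's code: the function names `cosine` and `mmr_select`, the `lambda_mult` parameter, and its default of 0.7 are assumptions chosen for illustration. The idea is that each pick balances relevance to the query against similarity to chunks already selected, which is one way to stop near-identical chunks from a single document filling the whole result list.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int,
    lambda_mult: float = 0.7,  # hypothetical default: 1.0 = relevance only
) -> List[int]:
    """Return indices of up to k chunks chosen by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, v) for v in chunk_vecs]
    selected: List[int] = []
    remaining = list(range(len(chunk_vecs)))

    while remaining and len(selected) < k:

        def score(i: int) -> float:
            # Redundancy is the highest similarity to any chunk already picked.
            redundancy = max(
                (cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected
```

In a pipeline like the one summarized here, the vectors would presumably be the embeddings already stored in pgvector, and the selection would run over the reranked candidates before they are passed to the LLM. Lowering `lambda_mult` trades relevance for more diversity.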
