RAG 安全性 — 攻击者如何毒化 AI 知识库及应对措施

📄 中文摘要

构建 RAG 系统以提高 AI 的准确性和上下文相关性时,通常会将文档、产品数据和知识库输入系统并进行准确性测试。然而,若攻击者在用户查询之前向知识库中注入恶意信息,可能会导致严重后果。根据 2025 年 USENIX Security 发布的研究,仅需五份精心设计的文档便可实现 90% 的攻击成功率。毒化 0.04% 的数据集可导致 98.2% 的攻击成功率和 74.6% 的系统失败。最危险的变种——幻影攻击,直到特定关键词被查询时才会激活,能够躲避标准监测方法。因此,知识库本身也成为了攻击面,许多构建 RAG 系统的团队未能对此给予足够重视。

📄 English Summary

RAG Security — How Attackers Poison AI Knowledge Bases and What to Do About It

Building a RAG system to enhance AI accuracy and contextual relevance often involves feeding it documentation, product data, and knowledge bases, followed by rigorous testing for accuracy. However, if attackers manage to inject malicious content into the knowledge base before users query it, severe consequences can ensue. Research published at USENIX Security 2025 indicates that just five carefully crafted documents can achieve a 90% attack success rate. Poisoning merely 0.04% of a corpus can result in a 98.2% attack success rate and a 74.6% system failure rate. The most dangerous variant, known as the Phantom attack, remains dormant until a specific keyword is queried, evading standard monitoring methods. Consequently, the knowledge base itself becomes an attack surface, a fact that many teams building RAG systems fail to adequately address.

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

数据源: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace 等