Scalable Extraction of Training Data from (Production) Language Models

📄 Abstract (translated from Chinese)

Researchers found that attackers can use a simple yet worrying technique to extract large amounts of training data from popular AI models, including systems previously considered secure. With carefully designed prompts, attackers can induce a model to repeat fragments of its training set, causing chatbots to inadvertently leak names, code, or other sensitive information. This data leakage affects not only open-source models such as Pythia and GPT-Neo, but also semi-open models such as LLaMA and Falcon, and even some closed-source services. A new method makes ChatGPT abandon its helpful-assistant persona and emit data at a rate far above normal. This poses a concrete privacy risk: models can memorize content seen during training and later leak it, seriously threatening personal and enterprise data security. The scalability of this extraction attack highlights how fragile current large language models are with respect to data privacy, and stronger defenses are urgently needed to prevent sensitive-information leakage.

📄 English Summary

Scalable Extraction of Training Data from (Production) Language Models

Researchers have uncovered a straightforward yet alarming technique enabling attackers to extract substantial volumes of training data from popular AI models, including systems previously considered secure. By crafting specific prompts, attackers can induce these models to regurgitate segments of their training datasets, leading to chatbots inadvertently disclosing sensitive information such as names, code, or other private details. This vulnerability is not confined to open-source models like Pythia or GPT-Neo; it also extends to semi-open models such as LLaMA and Falcon, and notably, some closed-source services. A novel method has been demonstrated to make ChatGPT deviate from its helpful assistant persona, instead emitting data at a significantly accelerated rate compared to its normal operation. This capability presents a tangible privacy risk, as models can memorize and subsequently leak information they encountered during their training phase. The scalable nature of this data extraction highlights a critical weakness in current large language models regarding data privacy, necessitating the development of more robust defensive mechanisms to prevent the unauthorized disclosure of sensitive information and safeguard user and enterprise data.
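The attack sketched above has two parts: a prompt that pushes the model out of its aligned chat behavior (the reported trigger asks ChatGPT to repeat a single word, such as "poem", indefinitely), and a verification step that checks the emitted text for verbatim overlap with known web-scale data. The snippet below is a minimal sketch of the verification idea only; the function names and the character n-gram membership test are illustrative assumptions — the actual study matches model output against a large auxiliary corpus using suffix arrays, not Python sets.

```python
def char_ngrams(text: str, n: int) -> set[str]:
    """All length-n character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def memorized_spans(model_output: str, corpus: str, n: int = 50) -> list[str]:
    """Length-n spans of model_output that appear verbatim in corpus.

    Illustrative stand-in for a suffix-array lookup: a sufficiently long
    verbatim match against known training text is strong evidence of
    memorization rather than chance generation.
    """
    corpus_grams = char_ngrams(corpus, n)
    return sorted(g for g in char_ngrams(model_output, n) if g in corpus_grams)
```

With n on the order of 50 characters, chance collisions between independent texts are vanishingly rare, so any hit flags likely regurgitated training data; shorter thresholds trade precision for recall.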


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others