Gap-K%: Measuring the Top-1 Prediction Gap for Detecting Pretraining Data

📄 Abstract (translated from Chinese)

The opacity of the massive pretraining corpora used in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihood values, but they often overlook the deviation from the model's top-1 prediction and the local correlation between adjacent tokens. To address these limitations, a novel pretraining data detection method named Gap-K% is proposed. Its core idea is to exploit the gap between the model's top-1 prediction and its runner-up prediction, combined with K% of local contextual information. Specifically, Gap-K% identifies pretraining data by analyzing the difference between the highest- and second-highest-probability predictions for a target token within a given K% context window.

📄 English Summary

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

The inherent opacity of massive pretraining corpora in Large Language Models (LLMs) poses significant privacy and copyright concerns, elevating pretraining data detection to a critical research challenge. Current state-of-the-art detection methods rely predominantly on token likelihoods, yet frequently overlook two crucial signals: the divergence from the model's top-1 prediction and the local correlation between adjacent tokens.

To address these limitations, a novel pretraining data detection method, termed Gap-K%, is introduced. The approach leverages the gap between the model's top-1 prediction and its second-best prediction, integrated with K% of local contextual information. Specifically, Gap-K% identifies pretraining data by analyzing the difference between the highest- and second-highest-probability predictions for a target token within a given K% context window. When an LLM has been trained on specific data, its prediction confidence for that data typically becomes anomalously high, producing a significantly enlarged gap between the top-1 and top-2 prediction probabilities. By quantifying this "prediction gap," Gap-K% captures the model's memorization traces of pretraining data.

At the same time, incorporating the K% local-correlation idea ensures that detection does not rest solely on individual token likelihoods but also accounts for the local structural and semantic properties of the text. Combining prediction-gap analysis with local contextual information gives Gap-K% superior accuracy and robustness in determining whether a model has been exposed to a particular text segment, especially when the memorization signal is subtle and surfaces only as small shifts in the runner-up predictions.
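The mechanics described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the exact aggregation rule is not specified in the summary, so the sketch assumes (following the Min-K% Prob convention) that the score averages the per-token gap over the K% of positions with the smallest top-1 vs. top-2 gap; the function name `gap_k_score` and the synthetic logits are hypothetical.

```python
import numpy as np

def gap_k_score(logits, k_percent=20.0):
    """Hedged sketch of a Gap-K%-style membership score.

    logits: array of shape (seq_len, vocab_size) holding the model's
    next-token logits at each position of the candidate text.
    k_percent: fraction of positions to aggregate over (assumption:
    the k% positions with the smallest prediction gap, by analogy
    with Min-K% Prob; the paper may aggregate differently).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Softmax over the vocabulary at each position (numerically stable).
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Two largest probabilities per position, in ascending order.
    top2 = np.sort(probs, axis=-1)[:, -2:]
    # The "prediction gap": top-1 minus top-2 probability.
    gaps = top2[:, 1] - top2[:, 0]
    # Average the k% smallest gaps; on memorized text even the
    # least-confident positions tend to show an enlarged gap,
    # so a higher score suggests membership in the training data.
    k = max(1, int(len(gaps) * k_percent / 100.0))
    return float(np.sort(gaps)[:k].mean())
```

In practice the logits would come from a forward pass of the target model over the candidate text (e.g. `model(input_ids).logits` in Hugging Face Transformers), and the score would be thresholded against a calibration set to decide membership.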


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.