Protecting Language Models Against Unauthorized Distillation through Trace Rewriting


📄 Summary


Knowledge distillation is a widely adopted technique for transferring the capabilities of large language models (LLMs) to smaller, more efficient student models. Unauthorized knowledge distillation, however, free-rides on the substantial effort and cost invested in developing frontier models. This work proposes modifying the reasoning traces a teacher model generates, in service of two objectives: (1) anti-distillation, which degrades the usefulness of query responses as training data, and (2) API watermarking, which embeds verifiable signatures in student models trained on those responses. Several methods for dynamically rewriting a teacher's reasoning outputs are introduced, each designed to preserve answer correctness and semantic coherence.
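The summary leaves the rewriting mechanism abstract, but a minimal sketch may make the two objectives concrete. The snippet below is an illustrative assumption, not the paper's method: a hypothetical serving layer rewrites only the reasoning trace, keying each rewrite on a provider-held secret so that the trace becomes a noisier training target (anti-distillation) and the provider can later re-derive the expected rewrites to check whether a suspect student reproduces them (API watermarking). All names here (`rewrite_trace`, `serve`, `SECRET_KEY`, the synonym table) are invented for illustration.

```python
import hashlib

# Hypothetical sketch only; none of these names come from the paper.
# Shape of the idea: perturb the reasoning trace served through the
# API, leave the final answer untouched.

SECRET_KEY = b"provider-secret"  # assumed provider-side watermark key

# Toy synonym table. A real system would use a semantics-preserving
# paraphraser so the trace stays coherent for human readers.
SYNONYMS = {
    "therefore": ["thus", "hence"],
    "because": ["since", "as"],
    "first": ["initially", "to begin"],
}

def rewrite_trace(trace: str, query: str) -> str:
    """Deterministically rewrite a reasoning trace.

    Each synonym choice is keyed on (secret, query), so the provider
    can re-derive the expected rewrites later and test whether a
    suspect student model reproduces them (watermarking), while the
    perturbed trace is also a noisier training target
    (anti-distillation).
    """
    seed = hashlib.sha256(SECRET_KEY + query.encode()).digest()
    out = []
    for i, word in enumerate(trace.split()):
        options = SYNONYMS.get(word.lower())
        if options:
            # Pick a synonym pseudo-randomly but reproducibly.
            word = options[seed[i % len(seed)] % len(options)]
        out.append(word)
    return " ".join(out)

def serve(query: str, trace: str, answer: str) -> str:
    """Return the API response: rewritten trace + unchanged answer."""
    return rewrite_trace(trace, query) + "\n\nAnswer: " + answer

if __name__ == "__main__":
    trace = ("First compute 2 + 2. Because addition is exact, "
             "therefore the result is 4.")
    print(serve("what is 2 + 2?", trace, "4"))
```

A production system would swap the toy synonym table for a paraphrase model and a statistical detection test, but the separation of concerns would stay the same: perturb the trace, never the answer.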
