Maximizing Mutual Information Between User Context and Responses Improves LLM Personalization Without Additional Data
📄 English Summary
Maximizing mutual information between user contexts and responses improves LLM personalization with no additional data
A novel self-improvement framework, Mutual Information Preference Optimization (MIPO), is proposed to enhance the personalization of large language models (LLMs) without relying on external supervision or human-labeled data. MIPO uses contrastive data augmentation to construct preference pairs: a positive response generated from the correct user prompt and a negative response generated from a random, unrelated prompt. This approach makes effective use of existing data, avoiding the high cost of collecting new high-quality data, while extending beyond the limits of traditional verifiable tasks. Experimental results show that MIPO significantly improves personalization across multiple domains, demonstrating the potential of self-improvement. With this method, LLMs can optimize response quality and user adaptability without additional data.
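The contrastive construction described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the dataset shape, and the toy `generate` stand-in for an LLM call are all assumptions made for the example.

```python
import random

def build_mipo_preference_pairs(examples, generate, seed=0):
    """Contrastive data augmentation: for each (user_context, query),
    the 'chosen' response is generated from the correct user context and
    the 'rejected' response from a random, unrelated user context."""
    rng = random.Random(seed)
    pairs = []
    for i, (context, query) in enumerate(examples):
        # Pick any other example's context as the mismatched (negative) prompt.
        j = rng.choice([k for k in range(len(examples)) if k != i])
        wrong_context, _ = examples[j]
        pairs.append({
            "prompt": query,
            "chosen": generate(context, query),          # conditioned on the true context
            "rejected": generate(wrong_context, query),  # conditioned on an unrelated context
        })
    return pairs

# Toy stand-in for a model call (hypothetical; a real setup would sample from the LLM).
toy_generate = lambda ctx, q: f"[{ctx}] answer to: {q}"

examples = [("likes concise answers", "Explain DPO"),
            ("prefers detailed math", "Explain DPO")]
pairs = build_mipo_preference_pairs(examples, toy_generate)
```

The resulting `prompt`/`chosen`/`rejected` records match the format commonly used by preference-optimization trainers, so pairs built this way can feed a standard DPO-style training loop with no labeled data.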
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.