AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

Source: AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

Published: March 31, 2026

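📄 Summary (translated from Chinese)

As large language models (LLMs) evolve into lifelong AI assistants, LLM personalization has become a key frontier. Current progress, however, is constrained by the lack of a gold-standard evaluation benchmark. Existing benchmarks either neglect the management of personalized information that personalization requires, or rely too heavily on synthetic dialogues, which have an inherent distribution gap from real-world conversations. To address this, AlpsBench is proposed: a personalization benchmark built on real human-LLM conversations. AlpsBench contains 2,500 long-term interaction sequences carefully curated from WildChat, paired with human-verified structured memories that cover both explicit and implicit personalization signals.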

🏷️ Related Tags

#LargeLanguageModels #Personalization #Benchmark #RealDialogue #MemoryManagement

📄 English Summary

AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

As Large Language Models (LLMs) evolve into lifelong AI assistants, LLM personalization has emerged as a critical frontier. However, progress is currently hindered by the lack of a gold-standard evaluation benchmark. Existing benchmarks either overlook the management of personalized information that effective personalization requires, or rely heavily on synthetic dialogues, which exhibit a significant distribution gap from real-world conversations. To address this issue, AlpsBench is introduced as an LLM personalization benchmark derived from authentic human-LLM dialogues. AlpsBench consists of 2,500 long-term interaction sequences curated from WildChat, accompanied by human-verified structured memories that capture both explicit and implicit personalization signals.
