📄 English Summary
Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain
Large language models (LLMs) make it possible to build systems that improve through self-evolving loops, yet many existing proposals are better characterized as self-play and often plateau quickly. A central failure mode is synthesizing more data without increasing the learnable information available to subsequent iterations. Experiments on a self-play coding task show that sustainable self-evolution requires a self-synthesized data pipeline that increases learnable information across iterations. The study identifies three roles played by self-evolving LLMs: the Proposer, which generates tasks; the Solver, which attempts solutions; and the Verifier, which provides training signals. It also identifies three system designs that jointly target learnable-information gain.
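The Proposer/Solver/Verifier division of labor described above can be sketched as a minimal loop. This is a hypothetical illustration, not the paper's implementation: the function names, the toy arithmetic tasks, and the reward scheme are all assumptions standing in for LLM calls.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    prompt: str
    reference: int  # ground truth the Verifier checks against

def proposer(round_idx: int) -> Task:
    # Proposer: synthesizes a new task each iteration. A toy arithmetic
    # problem stands in for task generation by an LLM.
    return Task(prompt=f"{round_idx} + {round_idx}", reference=2 * round_idx)

def solver(task: Task) -> int:
    # Solver: attempts the task (a real system would query an LLM here).
    a, b = task.prompt.split(" + ")
    return int(a) + int(b)

def verifier(task: Task, answer: int) -> float:
    # Verifier: converts the attempt into a scalar training signal.
    return 1.0 if answer == task.reference else 0.0

def self_play(rounds: int) -> List[float]:
    # One self-play loop: each iteration chains the three roles.
    rewards = []
    for i in range(rounds):
        task = proposer(i)
        answer = solver(task)
        rewards.append(verifier(task, answer))
    return rewards

rewards = self_play(3)
```

The paper's point is that such a loop only keeps improving if the Proposer's tasks add learnable information each round; here the Solver trivially solves everything, so the reward signal saturates immediately, which is exactly the plateau the abstract warns about.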