Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning

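📄 Chinese Summary (translated)

Scaling verifiable training signals remains a key bottleneck for Reinforcement Learning from Verifiable Rewards (RLVR). Logical reasoning is a natural foundation for this, because its constraints are formal and its answers can be checked programmatically. However, earlier synthesis pipelines either depend on expert-written code or operate within fixed templates or frameworks, which largely confines growth to instance-level perturbations. SSLogic is an agentic meta-synthesis framework that instead scales at the task-family level: it iteratively synthesizes and repairs executable generator-validator program pairs in a closed generate-validate-repair loop, enabling continuous family evolution with controllable difficulty. To ensure reliability, a multi-gate validation mechanism is introduced.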

📄 English Summary

Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning

Scaling verifiable training signals remains a critical bottleneck for Reinforcement Learning from Verifiable Rewards (RLVR). Logical reasoning serves as a natural substrate where constraints are formal and answers are programmatically checkable. Previous synthesis pipelines either rely on expert-written code or operate within fixed templates, limiting growth primarily to instance-level perturbations. SSLogic is proposed as an agentic meta-synthesis framework that scales at the task-family level by iteratively synthesizing and repairing executable Generator-Validator program pairs in a closed Generate-Validate-Repair loop. This enables continuous family evolution with controllable difficulty. To ensure reliability, a Multi-Gate Validation mechanism is introduced.
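
To make the closed Generate-Validate-Repair loop concrete, here is a minimal Python sketch. It is not SSLogic's implementation: the summary does not specify the gates, the program interfaces, or the repair agent, so the `Family` type, the three toy gates in `run_gates`, the parity task, and `stub_repair` (a stand-in for the LLM agent) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Family:
    # Hypothetical shape of one task family: a generator/validator program pair.
    generate: Callable[[random.Random, int], dict]  # (rng, difficulty) -> instance
    validate: Callable[[dict, object], bool]        # (instance, answer) -> accepted?


def run_gates(family: Family, difficulty: int, trials: int = 20, seed: int = 0) -> Optional[str]:
    """Return None if every toy gate passes, else a failure report for the repair step."""
    rng = random.Random(seed)
    for _ in range(trials):
        try:
            inst = family.generate(rng, difficulty)    # gate 1: generator executes
        except Exception as exc:
            return f"generator crashed: {exc!r}"
        if not family.validate(inst, inst["answer"]):  # gate 2: reference answer accepted
            return "validator rejected the reference answer"
        if family.validate(inst, object()):            # gate 3: junk answer rejected
            return "validator accepted a junk answer"
    return None


def generate_validate_repair(family, repair, difficulty, max_rounds=3):
    """Closed loop: validate, and on failure hand the report to `repair`."""
    for _ in range(max_rounds):
        report = run_gates(family, difficulty)
        if report is None:
            return family   # all gates passed: keep this family
        family = repair(family, report)
    return None             # still failing after max_rounds validations: discard


def parity_generator(rng, difficulty):
    # Toy task: is an odd number of propositions true? Difficulty sets their count.
    bits = [rng.random() < 0.5 for _ in range(difficulty)]
    return {"bits": bits, "answer": sum(bits) % 2 == 1}


def strict_validator(inst, answer):
    return isinstance(answer, bool) and answer == (sum(inst["bits"]) % 2 == 1)


def stub_repair(family, report):
    # Stand-in for the agent: a real system would have an LLM rewrite the failing program.
    return Family(family.generate, strict_validator)


# A deliberately broken family whose validator accepts everything.
buggy = Family(parity_generator, lambda inst, answer: True)
fixed = generate_validate_repair(buggy, stub_repair, difficulty=5)
```

In this sketch the always-true validator fails the third gate, the stub swaps in `strict_validator`, and the repaired pair passes on the next round. Raising `difficulty` only lengthens the instance here; how SSLogic actually controls difficulty is not described in the summary.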

