DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Source: DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Published: March 13, 2026

📄 Summary (translated from Chinese)

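Recent work has synthesized agentic tasks for post-training tool-using large language models (LLMs), but robust generalization under shifts in tasks and toolsets remains an open challenge. The study finds that this brittleness stems from insufficient diversity in the synthesized tasks. Scaling diversity is difficult because training requires tasks to remain executable and verifiable, while generalization requires coverage of diverse tool types, toolset combinations, and heterogeneous tool-use patterns. DIVE is proposed as an evidence-driven recipe that first executes diverse real-world tools and then derives tasks strictly entailed by the resulting traces, which provides grounding by construction. DIVE scales structural diversity along two controllable axes, tool...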

🏷️ Related Tags

#diversity #agentic-tasks #tool-use #synthesis #generalization

📄 English Summary

DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. The observed brittleness is attributed to insufficient diversity in the synthesized tasks. Scaling diversity is hard because training requires tasks to remain executable and verifiable, while generalization requires coverage of varied tool types, toolset combinations, and heterogeneous tool-use patterns. DIVE is proposed as an evidence-driven framework that inverts the synthesis order: it executes diverse, real-world tools first and then reverse-derives tasks strictly entailed by the resulting traces, which ensures grounding by construction. DIVE scales structural diversity along two controllable axes, focusing on tool...
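The inverted synthesis order described in the summary (execute tools first, then derive a task entailed by the resulting trace) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy arithmetic `TOOLS` stand in for real-world tools, and `execute_first`, `derive_task`, `verify`, and `synthesize` are hypothetical names. The only point is that the answer is read off an executed trace, so each task is executable and verifiable by construction.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy tools standing in for real-world tools (not from the paper).
TOOLS: dict[str, Callable[[int], int]] = {
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
    "increment": lambda x: x + 1,
}


@dataclass
class Step:
    tool: str
    arg: int
    result: int


@dataclass
class Task:
    question: str
    answer: int
    trace: list[Step]


def execute_first(toolset: list[str], start: int) -> list[Step]:
    """Run the tools in order, recording each call and its observed result."""
    trace: list[Step] = []
    value = start
    for name in toolset:
        result = TOOLS[name](value)
        trace.append(Step(name, value, result))
        value = result
    return trace


def derive_task(trace: list[Step]) -> Task:
    """Write a task whose answer is read off the executed trace, never guessed."""
    chain = ", then ".join(step.tool for step in trace)
    question = f"Starting from {trace[0].arg}, apply {chain}. What is the result?"
    return Task(question, trace[-1].result, trace)


def verify(task: Task) -> bool:
    """Replay the tool calls and check they reproduce the trace and the answer."""
    replay = execute_first([s.tool for s in task.trace], task.trace[0].arg)
    return replay == task.trace and replay[-1].result == task.answer


def synthesize(n_tasks: int, toolset_size: int, seed: int = 0) -> list[Task]:
    """Sample a toolset per task, execute it, then derive the task from the trace."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        toolset = rng.sample(sorted(TOOLS), toolset_size)
        tasks.append(derive_task(execute_first(toolset, rng.randint(1, 9))))
    return tasks
```

In this sketch, `toolset_size` and the sampling over `TOOLS` are loose stand-ins for controllable diversity knobs. The summary is truncated before it names DIVE's two axes, so no claim is made that these correspond to them.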

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others
