DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

📄 Chinese Summary

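For post-training tool-using large language models (LLMs), recent work has synthesized agentic tasks, but robust generalization under shifts in tasks and toolsets remains an open challenge. The study finds that this brittleness stems from insufficient diversity in the synthesized tasks. Scaling diversity is difficult because training requires tasks to remain executable and verifiable, while generalization requires coverage of diverse tool types, toolset combinations, and heterogeneous tool-use patterns. The authors propose DIVE, an evidence-driven framework that first executes diverse real-world tools and then strictly derives tasks entailed by the resulting traces, thereby providing grounding by construction. DIVE scales structural diversity along two controllable axes, tool...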

📄 English Summary

DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Recent work synthesizes agentic tasks for post-training tool-using LLMs, but robust generalization under shifts in tasks and toolsets remains a challenge. The authors attribute this brittleness to insufficient diversity in the synthesized tasks. Scaling diversity is hard because training requires tasks to remain executable and verifiable, while generalization requires coverage of varied tool types, toolset combinations, and heterogeneous tool-use patterns. DIVE, the proposed evidence-driven framework, inverts the usual synthesis order: it first executes diverse, real-world tools and then reverse-derives tasks strictly entailed by the resulting traces, ensuring grounding by construction. DIVE scales structural diversity along two controllable axes, focusing on tool...
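The evidence-first ordering described above can be illustrated with a toy Python sketch. Everything in it is an assumption for illustration only: the tools (`get_population`, `to_millions`), the `Step`/`Task` types, and the question template are hypothetical and are not taken from DIVE, and real tools would be external APIs rather than lookup tables. The sketch shows only the order of operations: tools run first, the task is written afterward from the recorded trace, so its reference answer is executable and checkable by construction.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# HYPOTHETICAL toy tools: deterministic stand-ins for real-world APIs.
def get_population(city: str) -> int:
    table = {"Paris": 2_100_000, "Berlin": 3_600_000, "Rome": 2_800_000}
    return table[city]

def to_millions(n: int) -> float:
    return round(n / 1_000_000, 1)

TOOLS: Dict[str, Callable[..., Any]] = {
    "get_population": get_population,
    "to_millions": to_millions,
}

@dataclass
class Step:
    tool: str      # name of the tool that was called
    args: tuple    # arguments it was called with
    output: Any    # what it actually returned (the evidence)

@dataclass
class Task:
    question: str
    answer: Any
    trace: List[Step]

def execute_first(city: str) -> List[Step]:
    """Stage 1 (assumed): run a tool chain on a real input and record the trace."""
    trace: List[Step] = []
    pop = TOOLS["get_population"](city)
    trace.append(Step("get_population", (city,), pop))
    millions = TOOLS["to_millions"](pop)
    trace.append(Step("to_millions", (pop,), millions))
    return trace

def derive_task(trace: List[Step]) -> Task:
    """Stage 2 (assumed): write a task whose answer is entailed by the trace."""
    city = trace[0].args[0]
    question = f"What is the population of {city}, in millions (one decimal place)?"
    # The reference answer is read off the executed evidence, not invented.
    return Task(question, trace[-1].output, trace)

def verify(task: Task, candidate: Any) -> bool:
    """Checking is trivial because the answer came from an actual execution."""
    return candidate == task.answer
```

For example, `derive_task(execute_first("Berlin"))` yields a question about Berlin whose reference answer, 3.6, comes directly from the trace. In this toy setting, diversity would come from varying which tools are chained and on which inputs; how DIVE actually controls its two axes is not specified in the summary above.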
