The World Won't Stay Still: Programmable Evolution for Agent Benchmarks

📄 Summary

LLM-powered agents fulfill user requests by interacting with environments, querying data, and invoking tools, typically over multiple turns. Most existing benchmarks, however, assume a static environment with a fixed schema and toolset, overlooking both the evolving nature of the real world and the ability of agents to adapt to environmental change. The paper asks how agent environments can be evolved in a scalable and controllable way so that this adaptability can be evaluated. It proposes ProEvolve, a graph-based framework that makes environment evolution programmable. At its core is a typed relational graph that gives the environment a unified, explicit representation covering data, tools, and schema. Under this formalism, agents' adaptability to real-world dynamics can be better evaluated.
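To make the idea of a typed relational graph more concrete, here is a minimal Python sketch. It is not ProEvolve's actual data model or API, which the summary does not describe. The names `EnvGraph`, `rename_field`, `build_example`, the node kinds, and the relation labels are all hypothetical and chosen only for illustration. The sketch shows how data, schema, and tools can live in one graph, and how a single programmatic edit (renaming a field) propagates to every tool that depends on it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # hypothetical node types: "table", "field", or "tool"


@dataclass
class EnvGraph:
    """Toy typed relational graph: typed nodes plus labeled edges."""
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: set = field(default_factory=set)    # (src, relation, dst) triples

    def add_node(self, name: str, kind: str) -> None:
        self.nodes[name] = Node(name, kind)

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # Refuse edges that point at nodes the graph does not know about.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src!r} -> {dst!r}")
        self.edges.add((src, relation, dst))

    def neighbors(self, src: str, relation: str) -> list:
        return sorted(d for s, r, d in self.edges if s == src and r == relation)


def rename_field(graph: EnvGraph, old: str, new: str) -> None:
    """One illustrative evolution step: rename a node and rewire its edges."""
    node = graph.nodes.pop(old)
    graph.nodes[new] = Node(new, node.kind)
    graph.edges = {
        (new if s == old else s, r, new if d == old else d)
        for s, r, d in graph.edges
    }


def build_example() -> EnvGraph:
    """Assumed toy environment: one table, one field, one tool reading it."""
    g = EnvGraph()
    g.add_node("orders", "table")
    g.add_node("orders.status", "field")
    g.add_node("get_order_status", "tool")
    g.add_edge("orders", "has_field", "orders.status")
    g.add_edge("get_order_status", "reads", "orders.status")
    return g
```

In this toy setup, calling `rename_field(g, "orders.status", "orders.state")` updates both the schema edge and the tool dependency in one step, which is the kind of controlled environment change an agent would then have to adapt to.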
