Environment Maps: Structured Environmental Representations for Long-Horizon Agents

📄 Summary

Automating complex software workflows remains a significant challenge despite rapid advances in large language models (LLMs). In long-horizon tasks, agents often suffer from cascading errors and environmental stochasticity: a single misstep in a dynamic interface can cause the task to fail, leading to hallucinations or trial-and-error behavior. This research introduces "Environment Maps", a persistent, agent-agnostic representation that mitigates these failures by consolidating heterogeneous evidence, such as screen recordings and execution traces, into a structured graph. The representation consists of four core components: (1) Contexts (abstracted locations), (2) Actions (parameterized affordances), (3) Workflows (observed trajectories), and (4) Environmental States (dynamic environmental information).
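To make the four components more concrete, here is a minimal Python sketch of how such a structured graph might be laid out. The summary does not describe the actual schema, so every class, field, and function name below (`Context`, `Action`, `Workflow`, `EnvironmentMap`, `available_actions`, `replay`, `build_demo_map`) is a hypothetical illustration, as is the login/dashboard/settings example. Contexts are modeled as nodes, actions as parameterized edges between them, workflows as ordered lists of action names, and the environmental state as a plain key-value store.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """An abstracted location, e.g. a page or dialog (hypothetical fields)."""
    context_id: str
    description: str = ""


@dataclass
class Action:
    """A parameterized affordance, modeled as an edge between two contexts."""
    name: str
    source: str                       # context_id where the action is available
    target: str                       # context_id reached after the action
    parameters: tuple[str, ...] = ()  # names of the arguments the action takes


@dataclass
class Workflow:
    """An observed trajectory, stored as an ordered list of action names."""
    goal: str
    steps: list[str] = field(default_factory=list)


@dataclass
class EnvironmentMap:
    """Assumed container for the four components named in the summary."""
    contexts: dict[str, Context] = field(default_factory=dict)
    actions: dict[str, Action] = field(default_factory=dict)
    workflows: list[Workflow] = field(default_factory=list)
    state: dict[str, str] = field(default_factory=dict)  # dynamic environment info

    def add_context(self, ctx: Context) -> None:
        self.contexts[ctx.context_id] = ctx

    def add_action(self, action: Action) -> None:
        # Reject edges that point at contexts the map does not know about.
        for cid in (action.source, action.target):
            if cid not in self.contexts:
                raise KeyError(f"unknown context: {cid!r}")
        self.actions[action.name] = action

    def available_actions(self, context_id: str) -> list[Action]:
        """Return the actions that start in the given context, sorted by name."""
        found = [a for a in self.actions.values() if a.source == context_id]
        return sorted(found, key=lambda a: a.name)

    def replay(self, workflow: Workflow, start: str) -> list[str]:
        """Follow a workflow through the graph and return the contexts visited."""
        visited = [start]
        current = start
        for step in workflow.steps:
            action = self.actions.get(step)
            if action is None or action.source != current:
                raise ValueError(f"action {step!r} is not available in {current!r}")
            current = action.target
            visited.append(current)
        return visited


def build_demo_map() -> EnvironmentMap:
    """Build a tiny illustrative map; the contents are invented for this sketch."""
    env = EnvironmentMap(state={"logged_in": "false"})
    for cid in ("login", "dashboard", "settings"):
        env.add_context(Context(cid))
    env.add_action(Action("sign_in", "login", "dashboard", ("username", "password")))
    env.add_action(Action("open_settings", "dashboard", "settings"))
    env.workflows.append(Workflow("change settings", ["sign_in", "open_settings"]))
    return env
```

In this sketch an agent could call `available_actions` to look up what is possible in its current context instead of guessing, and `replay` to check that a previously observed workflow is still a valid path before committing to it. Whether the actual system works this way is not stated in the summary.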
