Hybrid Self-evolving Structured Memory for GUI Agents

📄 English Summary

Hybrid Self-evolving Structured Memory for GUI Agents

Progress in vision-language models (VLMs) has enabled GUI agents to interact with computers in a human-like manner. Real-world computer-use tasks nevertheless remain challenging because of long-horizon workflows, diverse interfaces, and frequent intermediate errors. Previous work equipped agents with external memory built from large collections of trajectories, but relied only on flat retrieval over discrete summaries or continuous embeddings, an approach that lacks the structured organization and self-evolving character of human memory. Inspired by the brain, this work proposes Hybrid Self-evolving Structured Memory (HyMEM), a graph-based memory that couples discrete high-level symbolic nodes with continuous trajectory embeddings. HyMEM maintains this memory as a graph, which is intended to handle complex computer-use tasks more effectively than flat retrieval.
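To make the idea of coupling symbolic nodes with trajectory embeddings concrete, here is a minimal Python sketch. It is not the HyMEM implementation: the summary does not specify how nodes are created, how edges are formed, or how retrieval works, so the names `GraphMemory`, `Node`, `add`, and `retrieve`, the two-dimensional embeddings, and the cosine ranking are all assumptions made for illustration. The sketch only shows the general shape of such a memory: a discrete label selects a region of the graph, and continuous similarity ranks the trajectories inside it.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Node:
    label: str                                        # discrete high-level symbol (hypothetical)
    trajectories: list = field(default_factory=list)  # (embedding, payload) pairs
    neighbors: set = field(default_factory=set)       # labels of linked nodes


class GraphMemory:
    """Toy hybrid memory; an assumption-laden stand-in, not the paper's method."""

    def __init__(self):
        self.nodes = {}

    def add(self, label, embedding, payload, related=()):
        """Store a trajectory under a symbolic node and link it to related nodes."""
        node = self.nodes.setdefault(label, Node(label))
        node.trajectories.append((embedding, payload))
        for other in related:
            self.nodes.setdefault(other, Node(other)).neighbors.add(label)
            node.neighbors.add(other)

    def retrieve(self, label, query, k=2):
        """Rank trajectories in the node and its neighbors by similarity to the query."""
        if label not in self.nodes:
            return []
        scope = [label, *sorted(self.nodes[label].neighbors)]
        candidates = [t for name in scope for t in self.nodes[name].trajectories]
        candidates.sort(key=lambda t: cosine(query, t[0]), reverse=True)
        return [payload for _, payload in candidates[:k]]


memory = GraphMemory()
memory.add("export_pdf", [1.0, 0.0], "File > Export > PDF")
memory.add("print_doc", [0.8, 0.6], "File > Print", related=["export_pdf"])
memory.add("send_email", [0.0, 1.0], "Compose > Send")
print(memory.retrieve("export_pdf", [1.0, 0.1]))
```

In this toy version the graph changes only when `add` is called with new trajectories, which is a crude stand-in for the self-evolving behavior the summary mentions. Contrast it with flat retrieval, which would rank every stored trajectory against the query regardless of which node it belongs to.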
