Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation
📄 Summary
The Act-Observe-Rewrite (AOR) framework is designed to let multimodal language models learn to manipulate physical objects by reasoning about their own failures, without gradient updates, demonstrations, or reward engineering. Between trials, an LLM agent improves the robot manipulation policy by synthesizing entirely new executable Python controller code, guided by visual observations and structured episode outcomes. Unlike prior work that grounds LLMs in predefined skill libraries or uses code generation for one-shot plan synthesis, AOR makes the full low-level motor control implementation the unit of LLM reasoning, so the agent can change not only which task the robot performs but also the control strategy it uses to perform it.
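The act, observe, rewrite cycle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the AOR work itself: the toy 1-D "robot", the `EpisodeOutcome` fields, the `act(position, target)` controller signature, and `stub_rewrite`, which stands in for the multimodal LLM call by simply emitting new controller source with a larger gain.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EpisodeOutcome:
    """Hypothetical structured result of one trial."""
    success: bool
    final_error: float
    frames: List[str]  # placeholder for the visual observations


# Hypothetical starting controller: a proportional step with a gain that is too low.
INITIAL_SRC = "def act(position, target):\n    return 0.01 * (target - position)\n"


def run_episode(controller_src: str, target: float = 1.0, steps: int = 20) -> EpisodeOutcome:
    """Act: execute the controller source in a toy 1-D environment."""
    namespace: dict = {}
    # Generated code is executed directly here; a real system would sandbox it.
    exec(controller_src, namespace)
    act = namespace["act"]
    position = 0.0
    for _ in range(steps):
        position += act(position, target)
    error = abs(target - position)
    return EpisodeOutcome(success=error < 0.05, final_error=error,
                          frames=[f"pos={position:.3f}"])


def stub_rewrite(controller_src: str, outcome: EpisodeOutcome, attempt: int) -> str:
    """Rewrite: stand-in for the LLM. Returns entirely new controller source."""
    gain = 0.01 * (2 ** attempt)  # toy heuristic, not a learned or reasoned change
    return f"def act(position, target):\n    return {gain} * (target - position)\n"


def act_observe_rewrite(
    initial_src: str,
    rewrite: Callable[[str, EpisodeOutcome, int], str],
    max_trials: int = 8,
) -> Tuple[str, List[EpisodeOutcome]]:
    """Loop over trials, replacing the controller code after each failure."""
    src = initial_src
    history: List[EpisodeOutcome] = []
    for trial in range(max_trials):
        outcome = run_episode(src)                # act
        history.append(outcome)                   # observe
        if outcome.success:
            break
        src = rewrite(src, outcome, trial + 1)    # rewrite
    return src, history


if __name__ == "__main__":
    final_src, history = act_observe_rewrite(INITIAL_SRC, stub_rewrite)
    for i, outcome in enumerate(history):
        print(i, outcome.success, round(outcome.final_error, 4))
```

In this sketch no parameters are trained: the only thing that changes between trials is the controller source text, which is the sense in which the policy is improved "in context". Swapping `stub_rewrite` for a function that sends the previous source, the frames, and the outcome to a multimodal model would keep the loop unchanged.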