Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation

Source: Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation

Published: March 6, 2026

🏷️ Related Tags

#MultimodalLanguageModels #RobotManipulation #ActObserveRewrite #PolicyLearning #CodeGeneration

📄 Summary

The paper proposes Act-Observe-Rewrite (AOR), a framework that lets a multimodal language model learn to manipulate physical objects by reasoning about its own failures, without gradient updates, demonstrations, or reward engineering. Between trials, an LLM agent improves the robot manipulation policy by synthesizing entirely new executable Python controller code, guided by visual observations and structured episode outcomes. Prior work either grounds LLMs in predefined skill libraries or uses code generation for one-shot plan synthesis. AOR instead makes the full low-level motor control implementation the unit of LLM reasoning, so the agent can change not only what the robot does but also how it is controlled.
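The act, observe, rewrite cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch under loud assumptions, not the paper's implementation: the robot is replaced by a toy 1-D reaching task, visual observations are left out, and the multimodal LLM is replaced by a hypothetical stub (`stub_rewrite`) that simply emits a new controller with a larger gain. All names here (`EpisodeOutcome`, `run_episode`, `stub_rewrite`, `act_observe_rewrite`) are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EpisodeOutcome:
    """Structured episode outcome handed to the rewriter (hypothetical schema)."""
    success: bool
    final_error: float
    steps: int


# The whole controller is plain source text, so it can be replaced between trials.
CONTROLLER_TEMPLATE = (
    "def control(position, target):\n"
    "    return {gain} * (target - position)\n"
)


def run_episode(controller_src: str, target: float = 1.0,
                max_steps: int = 50) -> EpisodeOutcome:
    """Act: execute the controller source in a toy 1-D reaching task."""
    namespace: dict = {}
    exec(controller_src, namespace)  # load the synthesized controller code
    control = namespace["control"]
    position = 0.0
    for step in range(1, max_steps + 1):
        position += control(position, target)
        if abs(target - position) < 0.01:
            return EpisodeOutcome(True, abs(target - position), step)
    return EpisodeOutcome(False, abs(target - position), max_steps)


def stub_rewrite(previous_src: str, outcome: EpisodeOutcome, attempt: int) -> str:
    """Rewrite: stand-in for the LLM (assumption). A real agent would reason
    over the previous code, images, and the outcome; this stub just doubles
    the gain on every failed attempt."""
    gain = 0.02 * (2 ** attempt)
    return CONTROLLER_TEMPLATE.format(gain=gain)


def act_observe_rewrite(max_trials: int = 5) -> Tuple[str, List[EpisodeOutcome]]:
    """Run trials, rewriting the full controller source after each failure."""
    src = CONTROLLER_TEMPLATE.format(gain=0.02)
    history: List[EpisodeOutcome] = []
    for attempt in range(1, max_trials + 1):
        outcome = run_episode(src)        # act
        history.append(outcome)           # observe
        if outcome.success:
            break
        src = stub_rewrite(src, outcome, attempt)  # rewrite
    return src, history


if __name__ == "__main__":
    final_src, outcomes = act_observe_rewrite()
    for i, o in enumerate(outcomes, start=1):
        print(f"trial {i}: success={o.success} error={o.final_error:.4f} steps={o.steps}")
    print(final_src)
```

The point of the sketch is the unit of change: no weights are updated between trials, and the rewriter returns a complete replacement for the controller source rather than choosing from a fixed skill library.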
