📄 English Summary
Enhancing Policy Learning with World-Action Model
The research presents the World-Action Model (WAM), an action-regularized world model that jointly reasons about future visual observations and the actions driving state transitions. Unlike conventional world models that are trained solely through image prediction, WAM incorporates an inverse dynamics objective into DreamerV2, predicting actions from latent state transitions. This encourages the learned representations to capture action-relevant structures critical for downstream control. WAM is evaluated for enhancing policy learning across eight manipulation tasks from the CALVIN benchmark. A diffusion policy is pretrained via behavioral cloning on world model latents, followed by refinement using model-based PPO within the frozen world model.
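The inverse dynamics objective described above can be sketched as an auxiliary head on the world model's latent states: given two consecutive latents, a small network predicts the action that caused the transition, and its prediction error is added to the usual reconstruction and KL terms. This is a minimal illustrative sketch, not the paper's implementation; the class and function names, the MLP architecture, the squared-error action loss, and the `beta` weighting are all assumptions, since the summary does not specify them.

```python
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    """Hypothetical inverse-dynamics head: predicts the action a_t
    from consecutive latent states (z_t, z_{t+1})."""
    def __init__(self, latent_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, z_t: torch.Tensor, z_next: torch.Tensor) -> torch.Tensor:
        # Concatenate the two latents and regress the intervening action.
        return self.net(torch.cat([z_t, z_next], dim=-1))

def world_action_loss(recon_loss, kl_loss, inv_head, z_t, z_next, actions, beta=1.0):
    """Total training loss: standard world-model terms (reconstruction + KL)
    plus the action-prediction regularizer. `beta` is an assumed weight."""
    pred_actions = inv_head(z_t, z_next)
    inv_loss = ((pred_actions - actions) ** 2).mean()
    return recon_loss + kl_loss + beta * inv_loss
```

Because the gradient of the action loss flows back into the encoder that produces `z_t` and `z_next`, the latent space is pushed to retain the action-relevant structure the summary highlights, rather than only what is needed to reconstruct pixels.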