📄 English Summary
Enhancing Policy Learning with World-Action Model
The research presents the World-Action Model (WAM), an action-regularized world model that jointly reasons about future visual observations and the actions driving state transitions. Unlike conventional world models that are trained solely through image prediction, WAM incorporates an inverse dynamics objective into DreamerV2, predicting actions from latent state transitions. This encourages the learned representations to capture action-relevant structures critical for downstream control. WAM is evaluated for enhancing policy learning across eight manipulation tasks from the CALVIN benchmark. A diffusion policy is pretrained via behavioral cloning on world model latents, followed by refinement using model-based PPO within the frozen world model.
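The inverse dynamics objective described above can be sketched as an auxiliary head on the world model's latent states: given two consecutive latents, a small network predicts the action that caused the transition, and its prediction error is added to the usual reconstruction and KL terms. This is a minimal illustrative sketch, not the paper's implementation; the class and function names, the MLP architecture, the squared-error action loss, and the `beta` weighting are all assumptions, since the summary does not specify them.

```python
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    """Hypothetical inverse-dynamics head: predicts the action a_t
    from consecutive latent states (z_t, z_{t+1})."""
    def __init__(self, latent_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden_dim),
            nn.ELU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, z_t: torch.Tensor, z_next: torch.Tensor) -> torch.Tensor:
        # Concatenate the two latents and regress the intervening action.
        return self.net(torch.cat([z_t, z_next], dim=-1))

def world_action_loss(recon_loss, kl_loss, inv_head, z_t, z_next, actions, beta=1.0):
    """Total training loss: standard world-model terms (reconstruction + KL)
    plus the action-prediction regularizer. `beta` is an assumed weight."""
    pred_actions = inv_head(z_t, z_next)
    inv_loss = ((pred_actions - actions) ** 2).mean()
    return recon_loss + kl_loss + beta * inv_loss
```

Because the gradient of the action loss flows back into the encoder that produces `z_t` and `z_next`, the latent space is pushed to retain the action-relevant structure the summary highlights, rather than only what is needed to reconstruct pixels.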