Planning under Distribution Shifts with Causal POMDPs

📄 English Summary

Planning under Distribution Shifts with Causal POMDPs

Planning in the real world is often challenged by distribution shifts: a model of the environment obtained under one set of conditions may become invalid once the distribution of states or the environment dynamics change, causing previously learned strategies to fail. This work proposes a theoretical framework for planning under partial observability, based on Partially Observable Markov Decision Processes (POMDPs) formulated with causal knowledge. Shifts in the environment are represented as interventions on this causal POMDP, which makes it possible to evaluate plans under hypothesized changes and to actively identify which components of the environment have been altered. The work also shows how to maintain and update belief states so that the planner can adapt to these changes.
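
To make the belief-update idea concrete, here is a minimal sketch of a standard Bayes filter run over a joint belief on the hidden state and a hypothesized "domain" (which intervention, if any, has been applied). It is not the paper's algorithm. The two-state toy model, the domain names `nominal` and `shifted`, all probabilities, and the assumption that the domain stays fixed over time are illustrative choices made for this example only.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """One Bayes-filter step over a joint (state, domain) belief.

    belief:             dict mapping (state, domain) -> probability
    transition:         transition[domain][action][state] -> {next_state: prob}
    observation_model:  observation_model[domain][next_state] -> {obs: prob}

    Assumption for this sketch: the domain does not change between steps,
    so only the state component is propagated through the dynamics.
    """
    unnormalized = {}
    for (state, domain), prob in belief.items():
        for next_state, p_next in transition[domain][action][state].items():
            p_obs = observation_model[domain][next_state].get(observation, 0.0)
            key = (next_state, domain)
            unnormalized[key] = unnormalized.get(key, 0.0) + prob * p_next * p_obs

    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {key: value / total for key, value in unnormalized.items()}


def domain_marginal(belief):
    """Sum out the state to get the belief over which domain is active."""
    marginal = {}
    for (_, domain), prob in belief.items():
        marginal[domain] = marginal.get(domain, 0.0) + prob
    return marginal


# Hypothetical toy model: in the nominal domain the action "go" usually moves
# the agent from "left" to "right"; in the shifted domain an intervention on
# the transition mechanism makes the agent usually stay where it is.
transition = {
    "nominal": {"go": {"left": {"left": 0.1, "right": 0.9},
                       "right": {"right": 1.0}}},
    "shifted": {"go": {"left": {"left": 0.9, "right": 0.1},
                       "right": {"right": 1.0}}},
}

# The sensor is the same in both domains: it reports the true state 80% of the time.
sensor = {"left": {"left": 0.8, "right": 0.2},
          "right": {"left": 0.2, "right": 0.8}}
observation_model = {"nominal": sensor, "shifted": sensor}

# The agent starts in "left" and is undecided about which domain it is in.
prior = {("left", "nominal"): 0.5, ("left", "shifted"): 0.5}

# After acting "go" and still observing "left", the shifted domain becomes more likely.
posterior = belief_update(prior, "go", "left", transition, observation_model)
domains = domain_marginal(posterior)
print(domains)
```

In this toy setting, a single observation that contradicts the nominal dynamics moves probability mass toward the shifted domain. That is the sense in which a belief over hypothesized interventions can help identify which part of the environment has changed.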
