📄 English Summary
Beyond Hard Constraints: Budget-Conditioned Reachability for Safe Offline Reinforcement Learning
Sequential decision-making using Markov Decision Processes (MDPs) is foundational in numerous real-world applications. Both model-based and model-free methods have demonstrated strong performance in these contexts. However, real-world tasks often necessitate a balance between reward maximization and safety constraints, which can lead to conflicting objectives and unstable min/max adversarial optimization. Safety reachability analysis emerges as a promising alternative: it precomputes a forward-invariant set of safe states and actions, ensuring that an agent starting within this set can remain safe indefinitely. Nevertheless, most reachability-based methods focus solely on hard safety constraints, with limited work extending reachability to cumulative cost constraints. To address this gap, a safety-conditioned reachability framework is defined, aiming to extend existing reachability analysis methods to encompass safety under budget conditions.
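The abstract's notion of a budget-conditioned safe set can be illustrated with a toy sketch (not the paper's method; the chain MDP, cost function, and helper names below are invented for illustration). In a deterministic chain with one cost-free absorbing region, the minimal cumulative cost V(s) to reach that region defines a family of safe sets Safe(b) = {s : V(s) ≤ b}, one per budget b, generalizing the single hard-constraint safe set:

```python
# Toy illustration of a budget-conditioned safe set.
# Chain MDP with states 0..4; state 4 is a cost-free, forward-invariant
# region (staying there incurs no cost), every other state costs 1 per step.
import math

STATES = [0, 1, 2, 3, 4]
ACTIONS = ["left", "right"]

def step(s, a):
    """Deterministic chain dynamics."""
    return max(s - 1, 0) if a == "left" else min(s + 1, 4)

def cost(s):
    """Per-step cost; state 4 is the cost-free safe region."""
    return 0.0 if s == 4 else 1.0

def min_cost_to_safety(iters=50):
    """Fixed-point iteration for V(s): minimal cumulative cost to
    reach (and then remain in) the cost-free region."""
    V = {s: (0.0 if s == 4 else math.inf) for s in STATES}
    for _ in range(iters):
        for s in STATES:
            if s == 4:
                continue
            V[s] = cost(s) + min(V[step(s, a)] for a in ACTIONS)
    return V

V = min_cost_to_safety()

def safe_set(budget):
    """Budget-conditioned safe set: states from which the agent can
    stay within the cumulative cost budget forever."""
    return {s for s in STATES if V[s] <= budget}
```

With budget 0 only the cost-free region itself is safe, and larger budgets enlarge the set, which is the qualitative behavior the framework aims to capture; real methods would learn such a value over continuous states rather than tabulate it.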
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others