📄 English Summary
Beyond Hard Constraints: Budget-Conditioned Reachability for Safe Offline Reinforcement Learning
Sequential decision-making using Markov Decision Processes (MDPs) is foundational in numerous real-world applications. Both model-based and model-free methods have demonstrated strong performance in these contexts. However, real-world tasks often necessitate a balance between reward maximization and safety constraints, which can lead to conflicting objectives and unstable min/max adversarial optimization. Safety reachability analysis emerges as a promising alternative: it precomputes a forward-invariant set of safe states and actions, ensuring that an agent starting within this set can remain safe indefinitely. Nevertheless, most reachability-based methods focus solely on hard safety constraints, with limited work extending reachability to cumulative cost constraints. To address this gap, a safety-conditioned reachability framework is defined, aiming to extend existing reachability analysis methods to encompass safety under budget conditions.
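The abstract's notion of a budget-conditioned safe set can be illustrated with a toy sketch (not the paper's method; the chain MDP, cost function, and helper names below are invented for illustration). In a deterministic chain with one cost-free absorbing region, the minimal cumulative cost V(s) to reach that region defines a family of safe sets Safe(b) = {s : V(s) ≤ b}, one per budget b, generalizing the single hard-constraint safe set:

```python
# Toy illustration of a budget-conditioned safe set.
# Chain MDP with states 0..4; state 4 is a cost-free, forward-invariant
# region (staying there incurs no cost), every other state costs 1 per step.
import math

STATES = [0, 1, 2, 3, 4]
ACTIONS = ["left", "right"]

def step(s, a):
    """Deterministic chain dynamics."""
    return max(s - 1, 0) if a == "left" else min(s + 1, 4)

def cost(s):
    """Per-step cost; state 4 is the cost-free safe region."""
    return 0.0 if s == 4 else 1.0

def min_cost_to_safety(iters=50):
    """Fixed-point iteration for V(s): minimal cumulative cost to
    reach (and then remain in) the cost-free region."""
    V = {s: (0.0 if s == 4 else math.inf) for s in STATES}
    for _ in range(iters):
        for s in STATES:
            if s == 4:
                continue
            V[s] = cost(s) + min(V[step(s, a)] for a in ACTIONS)
    return V

V = min_cost_to_safety()

def safe_set(budget):
    """Budget-conditioned safe set: states from which the agent can
    stay within the cumulative cost budget forever."""
    return {s for s in STATES if V[s] <= budget}
```

With budget 0 only the cost-free region itself is safe, and larger budgets enlarge the set, which is the qualitative behavior the framework aims to capture; real methods would learn such a value over continuous states rather than tabulate it.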
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others