Safe Reinforcement Learning with Preference-based Constraint Inference

📄 Abstract (translated from Chinese)

Safe reinforcement learning is a standard paradigm for safety-critical decision-making. However, real-world safety constraints are often complex, subjective, and even difficult to specify explicitly. Existing constraint-inference methods rely on restrictive assumptions or extensive expert demonstrations, which are unrealistic in many practical applications. The central challenge of this work is how to learn these constraints at low cost and with high reliability. Inferring constraints from human preferences offers a data-efficient alternative, but the popular Bradley-Terry model fails to capture the asymmetric and heavy-tailed nature of safety costs, leading to risk underestimation. This issue remains poorly understood in the literature.

📄 English Summary

Safe Reinforcement Learning with Preference-based Constraint Inference

Safe reinforcement learning (RL) is a standard paradigm for safety-critical decision-making. However, real-world safety constraints can be complex, subjective, and difficult to specify explicitly. Existing constraint-inference methods rely on restrictive assumptions or extensive expert demonstrations, which are often unrealistic in practice. The central challenge addressed in this study is how to learn these constraints cheaply and reliably. Inferring constraints from human preferences offers a data-efficient alternative, yet popular Bradley-Terry (BT) models fail to capture the asymmetric and heavy-tailed nature of safety costs, leading to risk underestimation. This issue remains poorly understood in the literature.
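To make the BT failure mode concrete, here is a minimal sketch (not from the paper; the function name and cost values are illustrative) of the standard Bradley-Terry preference probability applied to trajectory-level safety costs. Because the model passes only the cost *difference* through a symmetric sigmoid, the probability saturates: a catastrophically unsafe trajectory and a mildly unsafe one receive nearly identical preference labels, which is one way risk gets underestimated.

```python
import math

def bt_preference_prob(cost_a: float, cost_b: float) -> float:
    """Bradley-Terry probability that trajectory A is preferred (safer) than B.

    The model depends only on the cost difference via a symmetric sigmoid,
    so once the difference is moderately large the probability saturates
    and the scale of the worse trajectory's cost is effectively ignored.
    """
    return 1.0 / (1.0 + math.exp(-(cost_b - cost_a)))

# Illustrative costs: B is mildly worse vs. catastrophically worse than A.
p_mild = bt_preference_prob(cost_a=0.0, cost_b=5.0)
p_catastrophic = bt_preference_prob(cost_a=0.0, cost_b=50.0)
# Both probabilities are ~1, so the preference data barely distinguishes
# a 10x larger (heavy-tailed) safety cost from a small violation.
```

The sigmoid's symmetry is the other half of the issue the summary points to: over- and under-shooting a cost threshold are penalized alike, even though unsafe errors should weigh more than safe ones.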
