Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications

📄 English Summary

When training artificial intelligence (AI) to perform tasks, humans care not only about task completion but also about the manner of execution. As AI agents face increasingly complex tasks, aligning their behavior with human specifications becomes crucial for responsible AI deployment. Reward design serves as a direct channel for such alignment by translating human expectations into reward functions that guide reinforcement learning (RL). However, existing methods often fall short in capturing the nuanced human preferences that arise in long-horizon tasks. Therefore, Hierarchical Reward Design from Language (HRDL) is introduced as a problem formulation that extends classical reward design to encode richer behavioral specifications.
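The abstract describes reward design as translating human expectations into reward functions, with HRDL layering richer behavioral specifications on top of task completion. Since the summary gives no implementation details, the following Python sketch is purely illustrative: the names, structure, and weights are assumptions, not the authors' method. It shows one minimal way a hierarchy of weighted behavioral rules could modulate a sparse task-completion reward.

```python
# Hypothetical sketch of a hierarchical, specification-aware reward function.
# All names and weights are illustrative assumptions; HRDL's actual
# formulation is not specified in this summary.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpecLevel:
    """One level of a hierarchical specification: a named behavioral rule
    with a weight reflecting its priority in the hierarchy."""
    name: str
    weight: float
    satisfied: Callable[[dict], bool]  # predicate over an environment state


def hierarchical_reward(state: dict, levels: List[SpecLevel],
                        task_bonus: float = 1.0) -> float:
    """Combine a sparse task-completion bonus with weighted penalties for
    each violated behavioral rule, higher-priority rules costing more."""
    reward = task_bonus if state.get("task_done", False) else 0.0
    for level in levels:
        if not level.satisfied(state):
            reward -= level.weight
    return reward


# Toy example: reach a goal (task) while avoiding restricted zones
# (high-priority safety rule) and staying on a path (low-priority manner rule).
levels = [
    SpecLevel("avoid_restricted_zone", weight=0.5,
              satisfied=lambda s: not s["in_zone"]),
    SpecLevel("stay_on_path", weight=0.1,
              satisfied=lambda s: s["on_path"]),
]
```

Under this sketch, an agent that finishes the task while violating the safety rule earns less than one that finishes it compliantly, which is the kind of "manner of execution" signal the abstract motivates.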
