Enhancing Goal Inference via Correction Timing

Source: Enhancing Goal Inference via Correction Timing

Published: February 24, 2026

📄 Summary

Corrections provide a natural modality for humans to give feedback to robots: when a person believes the robot is failing, or is about to fail, to meet the task objectives, they can intervene and modify the robot's behavior so that it successfully fulfills the task. Each correction conveys information about what the robot should and should not do, since the corrected behavior is more aligned with the task objectives than the original behavior. Most prior work on learning from corrections interprets a correction either as a new demonstration (the modified robot behavior) or as a preference (for the modified trajectory over the robot's original behavior). However, this overlooks an essential element of correction feedback: the timing of the correction itself.
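
The abstract stops short of spelling out how timing could be used, but the title points the way. As a minimal, hypothetical sketch, the Python below implements a Bayesian observer that scores candidate goals using both the corrected behavior (as prior work does) and the moment the human intervened. The Boltzmann-rational human model, the distance-based cost, and every function name here are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def trajectory_cost(traj, goal):
    """Cost of a trajectory under a goal: mean distance to the goal point."""
    return np.mean(np.linalg.norm(traj - goal, axis=1))

def preference_likelihood(corrected, original, goal, beta=5.0):
    """P(human prefers corrected over original | goal), Boltzmann-rational.
    This is the signal prior work extracts from a correction."""
    delta = trajectory_cost(corrected, goal) - trajectory_cost(original, goal)
    return 1.0 / (1.0 + np.exp(beta * delta))

def timing_likelihood(original, t_corr, goal, beta=5.0):
    """P(correction issued at step t_corr | goal). Assumption: the human
    intervenes when the robot's current state diverges from the goal, so
    steps with high divergence are more likely correction times."""
    divergence = np.linalg.norm(original - goal, axis=1)
    weights = np.exp(beta * (divergence - divergence.max()))  # stable softmax
    return weights[t_corr] / weights.sum()

def infer_goal(goals, original, corrected, t_corr, prior=None):
    """Posterior over candidate goals from a single correction event."""
    prior = np.full(len(goals), 1.0 / len(goals)) if prior is None else prior
    post = np.array([
        prior[i]
        * preference_likelihood(corrected, original, g)
        * timing_likelihood(original, t_corr, g)  # the term prior work drops
        for i, g in enumerate(goals)
    ])
    return post / post.sum()

# Toy example: the robot heads toward goal 0; the human redirects it toward
# goal 1 at step 6, once the divergence from goal 1 has grown large.
goals = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
original = np.linspace([0.0, 0.0], [1.0, 0.0], 10)
corrected = np.linspace([0.0, 0.0], [0.0, 1.0], 10)
print(infer_goal(goals, original, corrected, t_corr=6))
```

In this toy model, a correction issued at the moment the robot's path diverges from a candidate goal raises that goal's posterior beyond what the corrected trajectory alone implies, which is the kind of extra evidence the title points to.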
