CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

📄 Summary

The growing use of Retrieval-Augmented Generation (RAG) makes it increasingly important to train large language models (LLMs) for context-sensitive reasoning and faithfulness. Existing RAG-oriented reinforcement learning (RL) methods rely on external rewards that often fail to evaluate faithfulness to the retrieved documents and can misjudge semantically similar answers in open-domain settings. RAG-based self-reward mechanisms, meanwhile, remain largely unexplored: although such a mechanism could in principle estimate answer confidence from the documents, the absence of objective feedback in self-assessment risks accumulating hallucinations and eventually collapsing the model. To address these challenges, the paper proposes a novel 'internal-external' hybrid reward framework centered on contrastive likelihood rewards.
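The 'internal-external' hybrid idea can be sketched as follows. The intuition is that an answer is context-faithful when the model assigns it higher likelihood given the retrieved documents than without them, and that this internal signal is mixed with an external reward. Note this is a minimal illustration: the function names, the `tanh` squashing, and the mixing weight `alpha` are assumptions for the sketch, not the paper's exact formulation.

```python
import math

def contrastive_likelihood_reward(logp_with_docs, logp_without_docs):
    """Internal (self) reward sketch: the per-token log-likelihood gap
    between scoring the answer with vs. without the retrieved documents.
    A positive gap suggests the answer is grounded in the documents."""
    n = len(logp_with_docs)
    gap = (sum(logp_with_docs) - sum(logp_without_docs)) / n
    # Squash to (-1, 1) so it can be mixed with a bounded external reward.
    return math.tanh(gap)

def hybrid_reward(internal, external, alpha=0.5):
    """'Internal-external' mix; alpha is a hypothetical weighting knob."""
    return alpha * internal + (1 - alpha) * external

# Toy token log-probs: the answer is more likely when docs are present.
lp_with = [-0.2, -0.1, -0.3]
lp_without = [-1.0, -0.9, -1.2]
r_int = contrastive_likelihood_reward(lp_with, lp_without)   # positive gap
r = hybrid_reward(r_int, external=1.0, alpha=0.5)
```

In a real training loop, the two log-probability sums would come from scoring the same generated answer under prompts with and without the retrieved context, and `r` would feed a standard RL objective such as PPO.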
