📄 English Summary
Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning
Large Language Models (LLMs) excel at complex reasoning tasks, yet their internal mechanisms often remain opaque, hindering understanding of and trust in their decision-making. This paper introduces the concept of "monitorability": the property that a model's reasoning process can be followed through self-generated, interpretable intermediate steps. It studies Reinforcement Learning with Verifiable Rewards (RLVR), which optimizes a model against automatically checkable outcome rewards (e.g., whether the final answer is correct) rather than step-level supervision, and finds that RLVR spontaneously produces monitorable intermediate steps as a by-product of training. Experimental results show that RLVR not only significantly improves LLM performance on tasks such as multi-hop question answering, mathematical reasoning, and fact verification, but also yields highly interpretable and verifiable intermediate steps. These steps let human users trace and evaluate the model's reasoning, increasing trust in its outputs. In this sense monitorability is a "free gift": the same training signal that boosts performance also provides insight into the model's internal reasoning, offering a new avenue for building more transparent and reliable AI systems. This work underscores the importance of designing self-explanatory AI systems and lays a foundation for the future development of trustworthy AI.
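As a rough illustration of the verifiable-reward signal at the heart of RLVR (a minimal sketch under assumptions, not code from the paper: the `\boxed{}` answer convention, the `extract_final_answer` helper, and the exact-match check are all illustrative choices), the snippet below scores only a completion's final answer, leaving the intermediate reasoning entirely unsupervised:

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the final answer out of a \\boxed{...} span.
    (The boxed-answer convention is an assumption for illustration.)"""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return match.group(1).strip() if match else None

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary outcome reward: 1.0 iff the extracted answer matches the
    reference. The intermediate reasoning steps receive no direct
    supervision, so any monitorability they acquire comes "for free"
    from the outcome-level training signal."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == gold_answer.strip() else 0.0

# The reasoning trace is free-form; only the boxed answer is checked.
trace = "Step 1: 17 * 3 = 51. Step 2: 51 + 9 = 60. Final: \\boxed{60}"
print(verifiable_reward(trace, "60"))  # -> 1.0
```

In a full RLVR setup this scalar would feed a policy-gradient update; the point the paper's framing rests on is already visible here, namely that the reward never inspects the reasoning steps themselves.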