HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation

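📄 Chinese Summary (translated)

This work proposes a new framework, Hindsight Entropy-Assisted Learning (HEAL), to overcome the limitations encountered when distilling reasoning ability from Large Reasoning Models (LRMs) into small models. Traditional methods treat the teacher as a static filter and overlook the teacher's exploratory ability on complex "corner-case" problems, which artificially imposes a "teacher ceiling" on the student. Drawing on the Zone of Proximal Development (ZPD) from educational theory, HEAL integrates three core modules, including Guided Entropy-Assisted Repair (GEAR), an active intervention mechanism that detects critical reasoning breakpoints to improve the student model's learning. The method does not rely on reinforcement learning and aims to narrow the gap in reasoning ability, offering a new approach to reasoning distillation.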

📄 English Summary

HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation

Hindsight Entropy-Assisted Learning (HEAL) is a framework for distilling reasoning capabilities from Large Reasoning Models (LRMs) into smaller models. Traditional methods treat the teacher as a static filter: complex "corner-case" problems for which the teacher fails to find a valid solution on its own are simply discarded, which artificially imposes a "Teacher Ceiling" on the student. Drawing on the educational theory of the Zone of Proximal Development (ZPD), HEAL combines three core modules. The one named in this summary is Guided Entropy-Assisted Repair (GEAR), an active intervention mechanism that detects critical reasoning breakpoints in order to improve the student model's learning. The approach is RL-free and aims to narrow the reasoning capability gap between teacher and student.
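The summary does not say how GEAR measures entropy or decides what counts as a breakpoint. As a rough illustration only, the sketch below assumes one plausible reading: compute the Shannon entropy of the model's next-token distribution at each step and flag steps where it exceeds a threshold. The function names, the threshold value, and the toy distributions are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def find_breakpoints(
    step_distributions: Sequence[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Return indices of steps whose entropy exceeds the threshold.

    Assumption: a high-entropy step marks a point where the model is
    uncertain, which this sketch treats as a candidate "breakpoint".
    """
    return [
        i
        for i, probs in enumerate(step_distributions)
        if token_entropy(probs) > threshold
    ]


# Toy data (hypothetical): three steps over a four-token vocabulary.
steps = [
    [0.97, 0.01, 0.01, 0.01],  # confident step, low entropy
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain step
    [0.90, 0.05, 0.03, 0.02],  # fairly confident step
]

# The threshold of 1.0 nats is an arbitrary illustrative choice.
breakpoints = find_breakpoints(steps, threshold=1.0)
print(breakpoints)
```

In this toy run only the uniform middle step is flagged. A real system would presumably pick the threshold from data and then intervene at the flagged position, but those details are not given in the summary.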
