Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression

📄 Summary

Mixture-of-Experts (MoE) models scale capacity efficiently, but their large parameter footprint creates a memory bottleneck at deployment time. This work organizes retraining-free MoE compression into three paradigms: Expert Pruning, Expert Editing, and Expert Merging. It shows that persistent post-compression degradation largely stems from a neglected factor: the mismatch between the router and the experts that arises when experts are altered while the router is left unchanged. Effective retraining-free compression should therefore avoid updating expert parameters while still allowing lightweight router calibration. To this end, Router Knowledge Distillation (Router KD) is proposed: it updates only a small fraction of parameters (the router) by distilling next-token information from the original model.
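The router-calibration idea can be sketched on a toy dense-gated MoE layer: compress the experts (here, naively averaging two of them as a stand-in for expert merging), freeze all expert and output parameters, and train only the router by minimizing the KL divergence between the compressed model's next-token distribution and the original model's. This is a minimal illustrative sketch under assumed module names and sizes, not the paper's actual Router KD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy dense-gated MoE block with a small LM head (illustrative only)."""
    def __init__(self, d=16, n_experts=4, vocab=32):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, x):
        gates = F.softmax(self.router(x), dim=-1)              # (B, E)
        outs = torch.stack([e(x) for e in self.experts], -1)   # (B, d, E)
        h = (outs * gates.unsqueeze(1)).sum(-1)                # gate-weighted mix
        return self.lm_head(h)

torch.manual_seed(0)
teacher = TinyMoE()                       # original (uncompressed) model
student = TinyMoE()
student.load_state_dict(teacher.state_dict())

# Simulate a compression step: merge experts 0 and 1 by weight averaging.
with torch.no_grad():
    w = 0.5 * (student.experts[0].weight + student.experts[1].weight)
    student.experts[0].weight.copy_(w)
    student.experts[1].weight.copy_(w)

# Freeze everything except the router (the only calibrated parameters).
for name, p in student.named_parameters():
    p.requires_grad = name.startswith("router")
frozen_expert = student.experts[0].weight.clone()

opt = torch.optim.Adam([p for p in student.parameters() if p.requires_grad], lr=1e-2)
x = torch.randn(8, 16)                    # stand-in for calibration activations
with torch.no_grad():
    t_logp = F.log_softmax(teacher(x), dim=-1)

router_before = student.router.weight.clone()
for _ in range(50):
    s_logp = F.log_softmax(student(x), dim=-1)
    # KL(teacher || student) over next-token distributions
    loss = F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After the loop, only the router's weights have moved; the merged expert weights are byte-identical to their post-compression values, matching the constraint that retraining-free compression leaves expert parameters untouched.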
