Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

Source: Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

Published: April 1, 2026

🏷️ Related Tags

#MomentumScheduling #CriticalDamping #NeuralNetworkTraining #GradientAttribution #ConvergenceSpeed

📄 English Summary

Standard neural network training uses a constant momentum (typically 0.9), a convention dating back to 1964 that has little theoretical justification as an optimal choice. The paper derives a time-varying momentum schedule from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This beta-schedule adds no free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling reaches 90% accuracy 1.9 times faster than constant momentum. More importantly, per-layer gradient attribution under this schedule yields a diagnostic that is invariant across optimizers: it identifies the same three problem layers regardless of which optimizer was used to train the model.
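The schedule is a one-line function of the current learning rate, so it can be layered over any existing LR schedule. The sketch below is illustrative only: the cosine base schedule, the clipping bounds `mu_min`/`mu_max`, and all function names are assumptions, not details from the paper. The `exact_critical_momentum` helper shows one standard way to arrive at a formula of this form. For heavy-ball momentum on a 1-D quadratic f(x) = λx²/2, the iteration is critically damped (double characteristic root) when μ = (1 − √(αλ))² = 1 − 2√(αλ) + αλ. Dropping the αλ term and taking λ = 1 gives mu(t) = 1 - 2*sqrt(alpha(t)). The summary does not say whether the paper derives it this way.

```python
import math


def beta_momentum(lr: float, mu_min: float = 0.0, mu_max: float = 0.99) -> float:
    """Momentum from the critical-damping rule mu = 1 - 2*sqrt(lr).

    The clipping to [mu_min, mu_max] is an assumption of this sketch:
    the raw rule goes negative for lr > 0.25 and reaches 1.0 at lr = 0,
    and the summary does not say how those cases are handled.
    """
    mu = 1.0 - 2.0 * math.sqrt(lr)
    return min(max(mu, mu_min), mu_max)


def exact_critical_momentum(lr: float, curvature: float = 1.0) -> float:
    """Momentum giving a double root of the heavy-ball characteristic polynomial.

    For f(x) = curvature * x**2 / 2, heavy ball
        x_{k+1} = x_k - lr * f'(x_k) + mu * (x_k - x_{k-1})
    has characteristic polynomial r**2 - (1 + mu - lr*curvature) * r + mu.
    Its discriminant vanishes when mu = (1 - sqrt(lr*curvature))**2,
    which equals 1 - 2*sqrt(lr*curvature) + lr*curvature.
    """
    return (1.0 - math.sqrt(lr * curvature)) ** 2


def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Hypothetical base schedule: cosine decay from lr_max to lr_min."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 100
    for step in (0, 25, 50, 75, 100):
        lr = cosine_lr(step, total, lr_max=0.1)
        print(f"step={step:3d}  lr={lr:.4f}  mu={beta_momentum(lr):.4f}")
```

Because momentum rises as the learning rate decays, a typical starting LR of 0.1 gives a momentum of only about 0.37, far below the conventional 0.9. With PyTorch's `torch.optim.SGD`, one could presumably apply the rule by setting `group["momentum"] = beta_momentum(group["lr"])` for each entry in `optimizer.param_groups` after the LR scheduler steps. The summary does not describe how the authors wire it in.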

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others
