Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training


📄 English Summary


Standard neural network training employs a constant momentum (typically 0.9), a convention that dates back to 1964 with limited theoretical justification for its optimality. A time-varying momentum schedule is derived from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This beta-schedule requires no additional free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling achieves 1.9 times faster convergence to 90% accuracy compared to constant momentum. More importantly, the per-layer gradient attribution under this schedule produces a cross-optimizer invariant diagnostic: the same three problem layers are identified regardless of the optimizer used for training.
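The schedule above ties momentum directly to the current learning rate. A minimal sketch of how this could be implemented is shown below; the clamping range and the cosine learning-rate schedule are illustrative assumptions, not details from the paper.

```python
import math

def beta_schedule(alpha):
    """Momentum from the critically damped oscillator: mu(t) = 1 - 2*sqrt(alpha(t)).

    Clamped to [0, 0.999], since very small or very large learning rates
    would otherwise push mu out of the usual momentum range (the clamp is
    an assumption for numerical safety, not part of the derivation).
    """
    return min(max(1.0 - 2.0 * math.sqrt(alpha), 0.0), 0.999)

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=1e-4):
    """A standard cosine learning-rate schedule (an illustrative choice;
    the beta-schedule works on top of whatever LR schedule is already in use)."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

if __name__ == "__main__":
    # Momentum rises automatically as the learning rate decays:
    # no free parameter beyond the existing LR schedule.
    for step in (0, 500, 1000):
        a = cosine_lr(step, 1000)
        print(f"step {step:4d}: lr={a:.4f}, momentum={beta_schedule(a):.4f}")
```

Note how the coupling inverts the usual warmup intuition: at a high learning rate (e.g. 0.1) the formula yields a low momentum, and as the learning rate anneals toward zero the momentum approaches 1, so the two hyperparameters are never tuned independently.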
