Adaptive Optimization via Momentum on Variance-Normalized Gradients

📄 English Summary

Adaptive Optimization via Momentum on Variance-Normalized Gradients

MVN-Grad (Momentum on Variance-Normalized Gradients) is an Adam-style optimizer that improves stability and performance by combining two complementary ideas: variance-based normalization and momentum applied after normalization. MVN-Grad scales each coordinate by an exponential moving average of gradient uncertainty and applies momentum to the resulting normalized gradients, eliminating the cross-time coupling between stale momentum and a stochastic normalizer found in standard Adam-type updates. Under standard noise assumptions, this decoupling is proven to yield strictly smaller one-step conditional update variance than momentum-then-normalize methods. MVN-Grad is also robust to outliers, with uniformly bounded updates.
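The summary above describes only the ordering of the two operations, not a concrete algorithm, so the following is a hypothetical sketch of what a normalize-then-momentum update could look like for a single scalar parameter. The function name, hyperparameter defaults, and the absence of bias correction are all assumptions, not details from the paper; the contrast with Adam's momentum-then-normalize ordering is noted in the comments.

```python
import math

def mvn_grad_step(theta, grad, m, v, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical MVN-Grad step for a scalar parameter.

    1. Update the EMA of squared gradients (per-coordinate uncertainty).
    2. Normalize the *current* gradient by sqrt(v) + eps.
    3. Apply momentum to the already-normalized gradient.

    Adam instead keeps momentum on the raw gradient (m = beta1*m + (1-beta1)*grad)
    and divides that stale momentum by the stochastic normalizer afterwards,
    which is the cross-time coupling MVN-Grad removes.
    """
    v = beta2 * v + (1 - beta2) * grad * grad   # variance (second-moment) EMA
    g_hat = grad / (math.sqrt(v) + eps)          # variance-normalized gradient
    m = beta1 * m + (1 - beta1) * g_hat          # momentum AFTER normalization
    theta = theta - lr * m
    return theta, m, v

# Toy usage: minimize f(x) = (x - 3)^2 from x = 0.
x, m, v = 0.0, 0.0, 0.0
for _ in range(2000):
    g = 2 * (x - 3)
    x, m, v = mvn_grad_step(x, g, m, v, lr=0.05)
```

Because momentum is taken over normalized gradients, each update's magnitude is bounded by `lr` times a bounded average, which matches the uniformly-bounded-update robustness claim in the summary.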
