Adaptive Optimization via Momentum on Variance-Normalized Gradients

📄 English Summary

Adaptive Optimization via Momentum on Variance-Normalized Gradients

MVN-Grad (Momentum on Variance-Normalized Gradients) is an Adam-style optimizer that improves stability and performance by combining two complementary ideas: variance-based normalization and momentum applied after normalization. MVN-Grad scales each coordinate by an exponential moving average of gradient uncertainty and applies momentum to the resulting normalized gradients, eliminating the cross-time coupling between stale momentum and a stochastic normalizer found in standard Adam-type updates. Under standard noise assumptions, this decoupling is proven to yield strictly smaller one-step conditional update variance than momentum-then-normalize methods. MVN-Grad is also robust to outliers, with uniformly bounded updates.
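The summary above describes only the ordering of the two operations, not a concrete algorithm, so the following is a hypothetical sketch of what a normalize-then-momentum update could look like for a single scalar parameter. The function name, hyperparameter defaults, and the absence of bias correction are all assumptions, not details from the paper; the contrast with Adam's momentum-then-normalize ordering is noted in the comments.

```python
import math

def mvn_grad_step(theta, grad, m, v, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical MVN-Grad step for a scalar parameter.

    1. Update the EMA of squared gradients (per-coordinate uncertainty).
    2. Normalize the *current* gradient by sqrt(v) + eps.
    3. Apply momentum to the already-normalized gradient.

    Adam instead keeps momentum on the raw gradient (m = beta1*m + (1-beta1)*grad)
    and divides that stale momentum by the stochastic normalizer afterwards,
    which is the cross-time coupling MVN-Grad removes.
    """
    v = beta2 * v + (1 - beta2) * grad * grad   # variance (second-moment) EMA
    g_hat = grad / (math.sqrt(v) + eps)          # variance-normalized gradient
    m = beta1 * m + (1 - beta1) * g_hat          # momentum AFTER normalization
    theta = theta - lr * m
    return theta, m, v

# Toy usage: minimize f(x) = (x - 3)^2 from x = 0.
x, m, v = 0.0, 0.0, 0.0
for _ in range(2000):
    g = 2 * (x - 3)
    x, m, v = mvn_grad_step(x, g, m, v, lr=0.05)
```

Because momentum is taken over normalized gradients, each update's magnitude is bounded by `lr` times a bounded average, which matches the uniformly-bounded-update robustness claim in the summary.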
