Attn-QAT: Attention With 4-Bit Quantization-Aware Training

📄 Abstract (translated from Chinese)

Achieving reliable 4-bit attention is a prerequisite for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main obstacle because of FP4's extremely narrow dynamic range and the heavy-tailed activations of the attention mechanism. This work presents the first systematic study of 4-bit quantization-aware training (QAT) targeted at attention. It finds that a naive "drop-in" QAT scheme, which pairs an FP4 forward pass with a high-precision Flash Attention (FA)-style backward pass, leads to training instability. Two key principles for stabilizing FP4 attention are identified: (1) recomputing attention scores in the backward pass at matching low precision, and (2) addressing the implicit precision assumptions in FA's gradient computation. Based on these insights, corresponding solutions are proposed.

📄 English Summary

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

Achieving reliable 4-bit attention is essential for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main obstacle due to FP4's limited dynamic range and the heavy-tailed activations of the attention mechanism. This research presents the first systematic study of 4-bit quantization-aware training (QAT) for attention. The study finds that a naive "drop-in" QAT approach, which combines an FP4 forward pass with a high-precision Flash Attention (FA)-style backward pass, leads to training instability. Two key principles for stable FP4 attention are identified: (1) matching low-precision recomputation of attention scores in the backward pass, and (2) addressing implicit precision assumptions in FA's gradient calculation. Based on these insights, corresponding solutions are proposed.
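For background, QAT typically simulates low-precision arithmetic with "fake quantization": values are rounded to the nearest representable FP4 number in the forward pass, while gradients flow through unchanged (a straight-through estimator). The summary above does not include code, so the sketch below is purely illustrative; the function name and per-tensor scaling scheme are assumptions, though the eight magnitudes are the standard FP4 E2M1 levels.

```python
# Illustrative sketch (not from the paper): per-tensor FP4 E2M1 fake quantization.

# The 8 non-negative magnitudes representable in FP4 E2M1.
FP4_E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0

def fake_quant_fp4(xs):
    """Fake-quantize a list of floats to FP4 E2M1 with per-tensor scaling."""
    # Scale so the largest magnitude maps to FP4's max representable value.
    amax = max(abs(x) for x in xs) or 1.0
    scale = amax / FP4_MAX
    out = []
    for x in xs:
        v = abs(x) / scale
        # Round to the nearest representable FP4 magnitude.
        q = min(FP4_E2M1_LEVELS, key=lambda lvl: abs(lvl - v))
        out.append(q * scale if x >= 0 else -q * scale)
    return out
```

Note how per-tensor scaling anchors the largest magnitude at 6.0: with heavy-tailed attention activations, a single outlier dominates the scale and crushes the remaining values into a handful of coarse levels, which is one intuition for why attention is the hard case for FP4.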
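Principle (1) can be illustrated with a toy example: if the backward pass recomputes attention scores at full precision while the forward pass used quantized scores, the probabilities entering the gradient no longer match the ones that produced the output. A minimal sketch, where a coarse 0.5-step rounding stands in for FP4 quantization:

```python
# Illustrative sketch: mismatched vs. matched score recomputation in the backward pass.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def quantize(x, step=0.5):
    # Crude stand-in for FP4 rounding of attention logits.
    return round(x / step) * step

logits = [0.30, 1.20, 2.75]
q_logits = [quantize(x) for x in logits]

p_forward = softmax(q_logits)    # probabilities actually used in the forward pass
p_recomp_hi = softmax(logits)    # naive backward: full-precision recomputation
p_recomp_lo = softmax(q_logits)  # matched backward: same quantized logits

# The naive recomputation disagrees with what the forward pass produced;
# the matched recomputation reproduces it exactly.
mismatch = max(abs(a - b) for a, b in zip(p_forward, p_recomp_hi))
```

The matched recomputation (`p_recomp_lo`) is bitwise identical to the forward probabilities, while the full-precision recomputation drifts; accumulated over many steps, that drift is one plausible source of the training instability the study reports for "drop-in" QAT.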
