Frayed RoPE and Long Inputs: A Geometric Perspective

📄 Summary

Rotary Positional Embedding (RoPE) is a widely adopted technique for encoding positions in language models. While effective, it suffers a performance breakdown when input lengths exceed the training length. Prior analyses have observed that long inputs cause channels to rotate "out of distribution," but how this extra rotation relates to, or causes, the pathological behavior has remained unclear. Through empirical and theoretical analysis, the work develops a unified geometric understanding of attention behavior under RoPE. It finds that attention induces tight clustering of separated key and query latent point clouds, which gives rise to sink tokens: placeholders that let attention heads avoid token mixing when mixing is not required. Applying RoPE to inputs longer than those seen in training disrupts this geometry in characteristic ways that degrade model performance.
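To make the "channels rotate with position" picture concrete, the following is a minimal NumPy sketch of RoPE (not the paper's code): each consecutive channel pair is rotated by an angle proportional to the token's position, with per-pair frequencies `base**(-2i/d)`. Positions beyond the training length simply produce larger rotation angles on the low-frequency pairs, which is the "extra rotation" the summary refers to.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, d).

    Each channel pair (2i, 2i+1) is rotated by angle
    position * base**(-2i/d). Illustrative sketch only.
    """
    seq_len, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)     # (d/2,) per-pair frequencies
    angles = positions[:, None] * inv_freq[None, :]  # (seq_len, d/2) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even / odd channels of each pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # 2x2 rotation applied pairwise
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Two properties follow directly from the rotation form: norms are preserved, and the query-key dot product depends only on the relative position between the two tokens, which is why RoPE encodes relative offsets despite rotating each token absolutely.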

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.