How Attention Sinks Emerge in Large Language Models: An Interpretability Perspective

📄 Summary

Large Language Models (LLMs) often assign disproportionate attention to specific tokens, a phenomenon known as attention sinks. While such sinks are generally viewed as detrimental, prior studies have identified a notable exception: the model's persistent focus on the first token of the input sequence. Because this structural bias can influence a wide range of downstream applications, it merits careful examination. Despite the prevalence of the phenomenon, the precise mechanisms behind the emergence and persistence of attention sinks remain poorly understood. This research traces how an attention sink forms around the first token of the input and identifies a simple mechanism, termed the P0 Sink Circuit, by which the model maintains heightened attention on the first token throughout processing.
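
The effect described above can be observed directly in a model's attention weights. Below is a minimal sketch, not taken from the paper, of how one might quantify it with the HuggingFace transformers library: run a short prompt through a small causal LM and measure, per layer and head, the average attention paid to position 0. The choice of `gpt2` and the prompt text are illustrative assumptions.

```python
# Minimal sketch: measure how much attention each head assigns to the
# first token (position 0), a common diagnostic for attention sinks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM with attention outputs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Attention sinks concentrate attention mass on the first token of the input."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # output_attentions=True returns one (batch, heads, query, key) tensor per layer
    outputs = model(**inputs, output_attentions=True)

for layer_idx, attn in enumerate(outputs.attentions):
    # Attention each head pays to key position 0, averaged over query
    # positions 1..n (position 0 trivially attends only to itself under
    # the causal mask, so it is excluded).
    sink_mass = attn[0, :, 1:, 0].mean(dim=-1)
    print(f"layer {layer_idx:2d} mean attention to token 0, per head: "
          + " ".join(f"{m:.2f}" for m in sink_mass.tolist()))
```

For a head acting as a sink, this statistic approaches 1.0; a head that spreads attention diffusely sits near 1/sequence-length instead.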
