Context Structure Reshapes the Representational Geometry of Language Models

📄 Abstract

Large language models (LLMs) organize the representations of input sequences into straighter neural trajectories in their deeper layers, a phenomenon hypothesized to aid next-token prediction via linear extrapolation. Language models can also adapt to a variety of tasks and learn new structure in context, and recent work shows that this in-context learning (ICL) can be reflected in representational changes. This study investigates how context structure affects the representational geometry of language models. Analyzing changes in the model's internal representations under different context settings shows that how a context is organized, for example the order in which information is presented and whether related information is clustered or contrasted, can significantly change the linearity and separability of neural trajectories. Specifically, when the context provides explicit structural guidance, the model tends to form more compact, more easily distinguishable representation clusters, which may improve its ability to generalize to new inputs.

📄 Summary

Large Language Models (LLMs) organize input-sequence representations into straighter neural trajectories in their deep layers, a phenomenon hypothesized to facilitate next-token prediction via linear extrapolation. Language models also adapt to diverse tasks and learn new structure in context, and recent work shows that this in-context learning (ICL) is reflected in representational changes. This research explores how context structure influences the representational geometry of language models. Analyzing the model's internal representations under various contextual settings shows that the organization of the context, such as the order in which information is presented and whether related information is clustered or contrasted, can significantly alter the linearity and separability of neural trajectories.

Specifically, when the context provides clear structural guidance, models tend to form more compact and more easily distinguishable representation clusters, potentially enhancing generalization to new inputs. Certain types of context structure are also found to reduce the effective dimensionality of the representation space, allowing the model to process complex information more efficiently. These findings offer a new perspective on the internal workings of LLMs and a basis for designing more effective and robust in-context learning strategies: by quantitatively tracking changes in representational geometry, prompt-engineering techniques can be refined to better steer the model's learning and reasoning.
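To make the geometric claims concrete, below is a minimal sketch of how trajectory straightness can be measured per layer. It assumes a HuggingFace-style causal LM (the model name `gpt2` is a placeholder) and uses one common curvature metric, the mean turning angle between successive hidden-state difference vectors; this illustrates the general technique, not necessarily the paper's exact procedure.

```python
# Sketch: per-layer curvature of a prompt's neural trajectory.
# Assumptions: a HuggingFace causal LM (model name is a placeholder) and a
# common straightness metric, the mean angle between consecutive
# difference vectors of the token-wise hidden states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM exposing hidden states works

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def layer_curvatures(text: str) -> list[float]:
    """Mean turning angle (radians) of the hidden-state trajectory, per layer.
    Lower curvature = straighter trajectory."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids).hidden_states  # tuple of (layers+1) x [1, T, D]
    curvatures = []
    for h in hidden:
        v = h[0].diff(dim=0)                           # differences, [T-1, D]
        v = v / (v.norm(dim=-1, keepdim=True) + 1e-8)  # unit vectors
        cos = (v[:-1] * v[1:]).sum(-1).clamp(-1.0, 1.0)
        curvatures.append(torch.arccos(cos).mean().item())
    return curvatures

# Illustrative use: a repetitive, structured context.
print(layer_curvatures("cat dog bird; cat dog bird; cat dog bird"))
```

Comparing the per-layer curvature profile of a structured context against a shuffled one is the kind of analysis the summary describes: a straighter (lower-curvature) trajectory in deep layers supports the linear-extrapolation hypothesis.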

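The compactness, separability, and dimensionality claims are also quantifiable. The sketch below, assuming hidden states collected as above, uses two off-the-shelf measures: the participation ratio of the covariance eigenspectrum for effective dimensionality, and the silhouette score for cluster separability. Both metric choices are assumptions made for illustration, not confirmed details of the study.

```python
# Sketch: effective dimensionality (participation ratio) and cluster
# separability (silhouette score) of token representations. Metric choices
# are illustrative; the paper's exact measures may differ.
import numpy as np
from sklearn.metrics import silhouette_score

def participation_ratio(X: np.ndarray) -> float:
    """Effective dimensionality of representations X (tokens x dims):
    (sum of covariance eigenvalues)^2 / sum of squared eigenvalues."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues
    return lam.sum() ** 2 / (np.square(lam).sum() + 1e-12)

def separability(X: np.ndarray, labels: np.ndarray) -> float:
    """Silhouette score in [-1, 1]; higher = tighter, better-separated clusters."""
    return silhouette_score(X, labels)

# Synthetic data standing in for one layer's hidden states:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
X[:100] += 3.0                    # two artificial "context-induced" clusters
labels = np.repeat([0, 1], 100)
print(participation_ratio(X), separability(X, labels))
```

A lower participation ratio together with a higher silhouette score would correspond to the reported effect of structured contexts: lower-dimensional, more compact, better-separated representations.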