Hierarchical Latent Structures in the Data Generation Process Unify Mechanistic Phenomena across Scales

📄 Summary

Contemporary studies have revealed many puzzling phenomena in the neural information processing of Transformer-based language models. Building a robust, unified understanding of these phenomena requires dissecting a model in the context of its training. However, the intractable scale of pretraining corpora limits bottom-up investigation, while simplistic assumptions about the data generation process restrict expressivity and fail to account for complex patterns. This work employs probabilistic context-free grammars (PCFGs) to generate synthetic corpora that serve as faithful and computationally efficient proxies for web-scale text corpora. The emergence of three mechanistic phenomena is investigated, among them induction heads and function vectors.
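To make the data-generation setup concrete, below is a minimal sketch of sampling a synthetic corpus from a PCFG. The toy grammar, symbol names, and probabilities are illustrative assumptions, not the production rules used in the work.

```python
# Minimal sketch of PCFG-based corpus generation. The grammar below is a
# hypothetical toy example, not the paper's actual grammar.
import random

# Each nonterminal maps to (probability, right-hand side) pairs;
# lowercase strings are terminals, uppercase strings are nonterminals.
GRAMMAR = {
    "S":  [(0.7, ["NP", "VP"]), (0.3, ["VP"])],
    "NP": [(0.5, ["det", "noun"]), (0.5, ["noun"])],
    "VP": [(0.6, ["verb", "NP"]), (0.4, ["verb"])],
}

def sample(symbol: str, rng: random.Random) -> list[str]:
    """Recursively expand `symbol`, drawing one production per step."""
    if symbol not in GRAMMAR:  # terminal: emit as-is
        return [symbol]
    probs, rhss = zip(*GRAMMAR[symbol])
    rhs = rng.choices(rhss, weights=probs, k=1)[0]
    return [tok for child in rhs for tok in sample(child, rng)]

def generate_corpus(n_sentences: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    return [" ".join(sample("S", rng)) for _ in range(n_sentences)]

if __name__ == "__main__":
    for line in generate_corpus(3):
        print(line)
```

Because each expansion is an independent categorical draw, corpora of arbitrary size can be generated cheaply while retaining the grammar's hierarchical latent structure.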
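For reference, induction heads are commonly diagnosed by how much attention a head places on the token immediately following an earlier occurrence of the current token. The helper below is a hypothetical illustration of that standard diagnostic; the paper's actual measurement protocol is not specified here.

```python
import numpy as np

def induction_score(tokens: list[int], attn: np.ndarray) -> float:
    """Average attention mass that query positions place on 'induction'
    targets: positions j whose preceding token matches the current token.

    tokens: token ids of length T.
    attn:   (T, T) attention matrix of a single head, rows = queries
            (lower-triangular for a causal model).
    """
    T = len(tokens)
    total = 0.0
    for i in range(1, T):
        # Targets j <= i where the token before j equals the current token,
        # i.e. on "... A B ... A" the second A should attend to B's position.
        targets = [j for j in range(1, i + 1) if tokens[j - 1] == tokens[i]]
        if targets:
            total += attn[i, targets].sum()
    return total / (T - 1)
```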
