Hierarchical Latent Structures in the Data Generation Process Unify Mechanistic Phenomena across Scales

📄 Summary

Contemporary studies have revealed many puzzling phenomena in the neural information processing of Transformer-based language models. Building a robust, unified understanding of these phenomena requires dissecting a model in the context of its training. However, the intractable scale of pretraining corpora limits bottom-up investigation, while simplistic assumptions about the data generation process restrict expressivity and fail to account for complex patterns. This work employs probabilistic context-free grammars (PCFGs) to generate synthetic corpora that serve as faithful and computationally efficient proxies for web-scale text corpora. The emergence of three mechanistic phenomena is investigated, among them induction heads and function vectors.
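To make the data-generation setup concrete, below is a minimal sketch of sampling a synthetic corpus from a PCFG. The toy grammar, symbol names, and probabilities are illustrative assumptions, not the production rules used in the work.

```python
# Minimal sketch of PCFG-based corpus generation. The grammar below is a
# hypothetical toy example, not the paper's actual grammar.
import random

# Each nonterminal maps to (probability, right-hand side) pairs;
# lowercase strings are terminals, uppercase strings are nonterminals.
GRAMMAR = {
    "S":  [(0.7, ["NP", "VP"]), (0.3, ["VP"])],
    "NP": [(0.5, ["det", "noun"]), (0.5, ["noun"])],
    "VP": [(0.6, ["verb", "NP"]), (0.4, ["verb"])],
}

def sample(symbol: str, rng: random.Random) -> list[str]:
    """Recursively expand `symbol`, drawing one production per step."""
    if symbol not in GRAMMAR:  # terminal: emit as-is
        return [symbol]
    probs, rhss = zip(*GRAMMAR[symbol])
    rhs = rng.choices(rhss, weights=probs, k=1)[0]
    return [tok for child in rhs for tok in sample(child, rng)]

def generate_corpus(n_sentences: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    return [" ".join(sample("S", rng)) for _ in range(n_sentences)]

if __name__ == "__main__":
    for line in generate_corpus(3):
        print(line)
```

Because each expansion is an independent categorical draw, corpora of arbitrary size can be generated cheaply while retaining the grammar's hierarchical latent structure.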
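For reference, induction heads are commonly diagnosed by how much attention a head places on the token immediately following an earlier occurrence of the current token. The helper below is a hypothetical illustration of that standard diagnostic; the paper's actual measurement protocol is not specified here.

```python
import numpy as np

def induction_score(tokens: list[int], attn: np.ndarray) -> float:
    """Average attention mass that query positions place on 'induction'
    targets: positions j whose preceding token matches the current token.

    tokens: token ids of length T.
    attn:   (T, T) attention matrix of a single head, rows = queries
            (lower-triangular for a causal model).
    """
    T = len(tokens)
    total = 0.0
    for i in range(1, T):
        # Targets j <= i where the token before j equals the current token,
        # i.e. on "... A B ... A" the second A should attend to B's position.
        targets = [j for j in range(1, i + 1) if tokens[j - 1] == tokens[i]]
        if targets:
            total += attn[i, targets].sum()
    return total / (T - 1)
```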
