Transformers are Bayesian Networks

Source: Transformers are Bayesian Networks

Published: March 19, 2026

📄 English Summary

Transformers are the dominant architecture in AI, yet their underlying mechanisms remain poorly understood. This research establishes that a transformer is fundamentally a Bayesian network. First, it proves that every sigmoid transformer, with any weights, implements weighted loopy belief propagation (BP) on its implicit factor graph. One layer corresponds to one round of BP; the result holds for any weights (trained, random, or constructed) and has been formally verified against standard mathematical axioms. Second, a constructive proof demonstrates that a transformer can perform exact belief propagation on any declared knowledge base. For knowledge bases without circular dependencies, this yields provably correct probability estimates at every node. The study offers a new perspective on the theoretical foundations of transformers and reveals their potential for complex reasoning tasks.
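For readers unfamiliar with belief propagation, the "one layer = one round of BP" claim can be grounded with a minimal sum-product sketch. This is a generic illustration of BP on a small cycle-free factor graph, not the paper's transformer construction; the chain structure, variable names, and potential tables below are invented for the example.

```python
import itertools

# Sum-product belief propagation on a tiny tree-structured (cycle-free)
# factor graph: A -- f(A,B) -- B -- g(B,C) -- C, all variables binary.
# On a tree, one forward sweep of messages yields exact marginals,
# mirroring the summary's claim that acyclic knowledge bases get
# provably correct probability estimates.

prior_a = [0.6, 0.4]                 # unary potential P(A)
f_ab = [[0.9, 0.1], [0.2, 0.8]]      # pairwise potential f(a, b)
g_bc = [[0.7, 0.3], [0.5, 0.5]]      # pairwise potential g(b, c)

def normalize(m):
    s = sum(m)
    return [x / s for x in m]

# Message passing from A toward C (each step sums out one variable).
msg_f_to_b = [sum(prior_a[a] * f_ab[a][b] for a in range(2)) for b in range(2)]
msg_g_to_c = [sum(msg_f_to_b[b] * g_bc[b][c] for b in range(2)) for c in range(2)]
belief_c = normalize(msg_g_to_c)

# Brute-force marginal of C from the full joint, for comparison.
joint = {
    (a, b, c): prior_a[a] * f_ab[a][b] * g_bc[b][c]
    for a, b, c in itertools.product(range(2), repeat=3)
}
z = sum(joint.values())
marg_c = [sum(v for (a, b, c), v in joint.items() if c == k) / z for k in range(2)]

print(belief_c)  # matches marg_c (approximately [0.624, 0.376])
```

On graphs with cycles the same message updates are simply iterated ("loopy" BP), which is the regime the paper maps a stack of transformer layers onto: each layer corresponds to one such round of message updates.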


Sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others