Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication

📄 Chinese Summary

The Arrow Language Model is a neural network architecture that interprets next-token prediction through intuitionistic logic. Instead of representing tokens as additive embeddings mixed by an attention mechanism, the model encodes a prefix as a "left-nested implication chain" whose structure preserves token order through non-commutative composition. Within this framework, next-token prediction is reconceptualized as a modus ponens operation. Sequence processing thereby becomes a constructive process whose core is using implication to derive the next token. The central idea of the Arrow Language Model is to treat each token as a logical proposition and the entire sequence as a series of interconnected implications. The left-nested structure means that each subsequent token takes all preceding tokens as premises, forming a progressively deepening chain of logical deduction.

📄 English Summary

Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication

The Arrow Language Model is a neural architecture derived from an intuitionistic-logic interpretation of next-token prediction. Diverging from conventional approaches that represent tokens as additive embeddings mixed by attention mechanisms, the model encodes a prefix as a "left-nested implication chain," a structure that preserves token order through non-commutative composition. Within this framework, next-token prediction is reinterpreted as modus ponens (affirming the antecedent): the next token is logically deduced from the preceding implication chain. Sequence processing thus becomes a constructive procedure that uses these implicational relationships to derive the next token. The core idea is to treat each token as a logical proposition, with the entire sequence forming a series of interconnected implications. The left-nested structure means that each subsequent token is predicated on all preceding tokens, establishing a layered deduction chain, and the non-commutative composition ensures that the positional and contextual information of tokens is precisely encoded and retained, unlike the potentially order-insensitive interactions in attention mechanisms. By reframing next-token prediction as logical inference, the Arrow Language Model aims to provide a more interpretable and constructively grounded method for sequence modeling. This approach is anticipated to show advantages in tasks that demand strict logical reasoning and sequential dependency, such as code generation, mathematical theorem proving, and formal language processing.
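The summary does not specify the Arrow Language Model's actual implementation, but the two key ideas it describes, a left-nested fold over the prefix and a non-commutative composition operator, can be illustrated with a toy sketch. Everything below is an assumption for illustration: the matrix-product "implication" combinator, the vocabulary, and the inner-product scoring are stand-ins, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary (assumption)

# Hypothetical embedding: each token/proposition is a matrix, so that
# composition is non-commutative (A @ B != B @ A in general).
EMB = {tok: rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for tok in VOCAB}

def implies(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Toy non-commutative 'implication' combinator: order matters."""
    return a @ b

def encode_prefix(tokens: list[str]) -> np.ndarray:
    """Fold the prefix into a left-nested chain ((t1 -> t2) -> t3) -> ..."""
    state = EMB[tokens[0]]
    for tok in tokens[1:]:
        state = implies(state, EMB[tok])
    return state

def predict_next(tokens: list[str]) -> str:
    """Modus ponens as scoring: pick the token the chain 'entails' most,
    here approximated by a Frobenius inner product (assumption)."""
    state = encode_prefix(tokens)
    scores = {tok: float(np.sum(state * EMB[tok])) for tok in VOCAB}
    return max(scores, key=scores.get)
```

Because the fold uses matrix products, `encode_prefix(["the", "cat"])` and `encode_prefix(["cat", "the"])` differ, which is the order-preservation property the summary attributes to non-commutative composition.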

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.