📄 Chinese Summary
Language models based on discrete diffusion have attracted wide attention for their potential for fast generation. In practice, however, their sample quality degrades sharply in the few-step regime, failing to deliver on that promise. This work shows that language models using flow-based continuous denoising can outperform discrete diffusion in both quality and speed. By revisiting the fundamentals of flows over discrete modalities, a flow-based language model (FLM) is constructed that performs Euclidean denoising on one-hot encodings. The model is trained to predict clean data with a cross-entropy objective and incorporates a simple time reparameterization.
📄 English Summary
One-step Language Modeling via Continuous Denoising
Language models based on discrete diffusion have garnered significant attention due to their potential for faster generation compared to autoregressive models. However, they often suffer from a sharp decline in sample quality in the few-step regime, failing to deliver on this promise. This research demonstrates that language models utilizing flow-based continuous denoising can surpass discrete diffusion in both quality and speed. By revisiting the fundamentals of flows over discrete modalities, a flow-based language model (FLM) is constructed that performs Euclidean denoising on one-hot token encodings. The model can be trained by predicting clean data using a cross-entropy objective, incorporating a simple time reparameterization.
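The core recipe described above (Euclidean denoising of one-hot token encodings, trained by predicting clean data with a cross-entropy loss) can be sketched in a few lines. This is a minimal NumPy illustration under assumed choices, not the paper's implementation: the vocabulary size, the linear-interpolation noising path `x_t = (1 - t) * x0 + t * x1`, and the stand-in random linear "model" are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8      # toy vocabulary size (assumption for illustration)
L = 5      # sequence length
t = 0.3    # flow time in [0, 1]; a time reparameterization would remap this

# Clean data: one-hot encodings of a token sequence.
tokens = rng.integers(0, V, size=L)
x1 = np.eye(V)[tokens]                       # (L, V) one-hot "clean" points

# Euclidean denoising path: linear interpolation between Gaussian noise
# and the one-hot targets,  x_t = (1 - t) * x0 + t * x1,  x0 ~ N(0, I).
x0 = rng.standard_normal((L, V))
xt = (1.0 - t) * x0 + t * x1

# Stand-in "model": any map from noisy points to vocabulary logits.
# Here a fixed random linear layer, purely for illustration.
W = rng.standard_normal((V, V)) * 0.1
logits = xt @ W                              # (L, V)

# Cross-entropy objective on the clean tokens (clean-data prediction).
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(L), tokens].mean()
```

In a real training loop, `t` would be sampled per example, `W` would be a neural network, and the loss would be minimized over many batches; at sampling time, the predicted clean distribution drives the flow update, which is what enables the few-step (down to one-step) generation the summary describes.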