📄 English Summary

Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes

A novel Deletion-Insertion Diffusion language model (DID) is proposed, rigorously formulating token deletion and insertion as discrete diffusion processes, replacing the masking and unmasking processes in current Masked Diffusion Language Models (MDLMs). DID significantly improves training and inference efficiency by eliminating two major sources of computational overhead in MDLMs: the computations associated with non-informative <MASK> tokens and the <PAD> tokens introduced in variable-length settings. Furthermore, DID offers greater generation flexibility, enhancing its applicability in various language modeling tasks.
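To make the efficiency claim concrete, here is a minimal sketch contrasting a mask-style forward corruption step (which preserves sequence length and leaves non-informative `<MASK>` placeholders) with a deletion-style step (which shortens the sequence instead). The function names and the uniform per-token corruption probability are illustrative assumptions, not the paper's actual formulation:

```python
import random

def mask_corrupt(tokens, p, mask="<MASK>"):
    # MDLM-style forward step (illustrative): replace tokens with <MASK>.
    # Length is unchanged, so the model still computes over placeholder positions.
    return [mask if random.random() < p else t for t in tokens]

def delete_corrupt(tokens, p):
    # DID-style forward step (illustrative): drop tokens outright.
    # The corrupted sequence shrinks, carrying no <MASK> or <PAD> placeholders.
    return [t for t in tokens if random.random() >= p]

random.seed(0)
seq = ["the", "cat", "sat", "on", "the", "mat"]
masked = mask_corrupt(seq, 0.5)
deleted = delete_corrupt(seq, 0.5)
assert len(masked) == len(seq)   # masking preserves length
assert len(deleted) <= len(seq)  # deletion shortens the input
```

The shorter deletion-corrupted sequence is what lets DID avoid attention and loss computation over `<MASK>` positions, and it sidesteps `<PAD>` tokens entirely since variable lengths arise naturally.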
