InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model


📄 English Summary

Balancing fine-grained local modeling with long-range dependency capture under computational constraints is a central challenge in sequence modeling. While Transformers excel at token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs) scale linearly but often struggle to capture high-rank and synchronous global interactions. A consistency boundary analysis is presented that characterizes when diagonal short-memory SSMs can approximate causal attention and identifies the structural gaps that remain. Motivated by this analysis, InfoMamba is proposed as an attention-free hybrid architecture that replaces token-level self-attention with a concept-bottleneck linear filtering layer, improving both efficiency and performance.
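The two building blocks named in the summary can be illustrated in a few lines. The following is a minimal NumPy sketch, not the paper's actual implementation: a diagonal short-memory SSM update (elementwise state recurrence, so cost is linear in sequence length rather than quadratic as in attention), followed by a hypothetical concept-bottleneck linear filter that compresses each token through k ≪ d concept slots instead of forming an L×L attention matrix. All names and sizes here (`n`/`k`, `W_enc`, `W_dec`, the decay value) are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, k = 16, 8, 3  # sequence length, model width, concept slots (k << d, assumed)

x = rng.standard_normal((L, d))

# 1) Diagonal short-memory SSM: h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
#    The diagonal (elementwise) update costs O(d) per step, O(L*d) overall,
#    with no L x L token-interaction matrix.
a = np.full(d, 0.9)            # per-channel decay; |a| < 1 gives short memory
b = rng.standard_normal(d)
c = rng.standard_normal(d)
h = np.zeros(d)
ssm_out = np.empty_like(x)
for t in range(L):
    h = a * h + b * x[t]
    ssm_out[t] = c * h

# 2) Hypothetical concept-bottleneck linear filter (illustrative only):
#    encode each token into k concept scores, then decode back to width d.
#    Cross-token mixing is left entirely to the SSM above, keeping the
#    layer attention-free.
W_enc = rng.standard_normal((d, k)) / np.sqrt(d)   # assumed parameter
W_dec = rng.standard_normal((k, d)) / np.sqrt(k)   # assumed parameter
concepts = ssm_out @ W_enc    # (L, k) per-token bottleneck
mixed = concepts @ W_dec      # (L, d) filtered output

print(mixed.shape)  # (16, 8)
```

Note the design point the sketch makes concrete: every operation is either a per-channel recurrence or a fixed linear map, so total cost grows linearly with L, unlike self-attention's L² pairwise scores.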

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.