Learning to Remember, Learn, and Forget in Attention-Based Models

📄 Summary

The study introduces Palimpsa, a self-attention model that frames In-Context Learning (ICL) as a continual learning problem, addressing the stability-plasticity dilemma. In gated linear attention models, ICL functions as an online associative memory with fixed capacity, making it susceptible to interference, particularly over long sequences. Palimpsa employs Bayesian metaplasticity, tying the plasticity of each attention state to an importance state grounded in a prior distribution that encapsulates accumulated knowledge. The findings show that a variety of gated linear attention models emerge as specific architectural choices and posterior approximations within this framework.
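
To make the associative-memory reading concrete, below is a minimal NumPy sketch of a gated linear attention recurrence. The function name, the scalar forget gate, and the toy shapes are illustrative assumptions; actual gated linear attention models use learned, often data-dependent gates.

```python
import numpy as np

def gated_linear_attention(queries, keys, values, gate=0.9):
    """Gated linear attention as an online associative memory (sketch).

    The recurrent state S stores key-value outer products; the scalar
    `gate` in [0, 1] decays earlier writes, trading retention against
    plasticity. Real models learn the gate rather than fixing it.
    """
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_v, d_k))               # fixed-capacity memory state
    outputs = []
    for q, k, v in zip(queries, keys, values):
        S = gate * S + np.outer(v, k)      # forget, then store the pair
        outputs.append(S @ q)              # read out by query
    return np.stack(outputs)

# Toy usage: 16 steps with 8-dim keys/queries and 4-dim values.
T, d_k, d_v = 16, 8, 4
rng = np.random.default_rng(0)
out = gated_linear_attention(rng.normal(size=(T, d_k)),
                             rng.normal(size=(T, d_k)),
                             rng.normal(size=(T, d_v)))
print(out.shape)  # (16, 4)
```

Because `S` has only `d_v * d_k` entries, new writes eventually interfere with older ones, which is the long-sequence failure mode the summary describes.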
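
The metaplasticity mechanism pairs each memory entry with an importance state. The sketch below is a loose, hypothetical rendering of that idea: an importance tensor `U` shrinks the step size of future writes to heavily used entries. The `metaplastic_write` name, the `1/(1+U)` schedule, and the consolidation rule are assumptions for illustration; Palimpsa's actual update is derived from a Bayesian posterior, which the summary notes recovers various gated linear attention models as particular approximations.

```python
import numpy as np

def metaplastic_write(S, U, k, v):
    """One importance-gated write into the memory state (sketch).

    U holds a per-entry importance estimate: entries that already
    store strong associations receive a small effective learning
    rate, so consolidated knowledge resists overwriting while
    little-used entries remain plastic.
    """
    target = np.outer(v, k)            # association to store
    plasticity = 1.0 / (1.0 + U)       # high importance -> small step
    S = S + plasticity * (target - S)  # importance-weighted update
    U = U + np.abs(target)             # consolidate written entries
    return S, U

# Toy usage: repeated writes to one key consolidate its entries.
d_k, d_v = 8, 4
S, U = np.zeros((d_v, d_k)), np.zeros((d_v, d_k))
rng = np.random.default_rng(1)
k, v = rng.normal(size=d_k), rng.normal(size=d_v)
for _ in range(5):
    S, U = metaplastic_write(S, U, k, v)
```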

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others