Learning to Remember, Learn, and Forget in Attention-Based Models

📄 Summary

The study introduces Palimpsa, a self-attention model that frames In-Context Learning (ICL) as a continual learning problem, addressing the stability-plasticity dilemma. In gated linear attention models, ICL functions as an online associative memory with fixed capacity, making it susceptible to interference, particularly over long sequences. Palimpsa employs Bayesian metaplasticity, tying the plasticity of each attention state to an importance state grounded in a prior distribution that encapsulates accumulated knowledge. The findings show that a variety of gated linear attention models emerge as specific architectural choices and posterior approximations within this framework.
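
To make the associative-memory reading concrete, below is a minimal NumPy sketch of a gated linear attention recurrence. The function name, the scalar forget gate, and the toy shapes are illustrative assumptions; actual gated linear attention models use learned, often data-dependent gates.

```python
import numpy as np

def gated_linear_attention(queries, keys, values, gate=0.9):
    """Gated linear attention as an online associative memory (sketch).

    The recurrent state S stores key-value outer products; the scalar
    `gate` in [0, 1] decays earlier writes, trading retention against
    plasticity. Real models learn the gate rather than fixing it.
    """
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_v, d_k))               # fixed-capacity memory state
    outputs = []
    for q, k, v in zip(queries, keys, values):
        S = gate * S + np.outer(v, k)      # forget, then store the pair
        outputs.append(S @ q)              # read out by query
    return np.stack(outputs)

# Toy usage: 16 steps with 8-dim keys/queries and 4-dim values.
T, d_k, d_v = 16, 8, 4
rng = np.random.default_rng(0)
out = gated_linear_attention(rng.normal(size=(T, d_k)),
                             rng.normal(size=(T, d_k)),
                             rng.normal(size=(T, d_v)))
print(out.shape)  # (16, 4)
```

Because `S` has only `d_v * d_k` entries, new writes eventually interfere with older ones, which is the long-sequence failure mode the summary describes.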
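
The metaplasticity mechanism pairs each memory entry with an importance state. The sketch below is a loose, hypothetical rendering of that idea: an importance tensor `U` shrinks the step size of future writes to heavily used entries. The `metaplastic_write` name, the `1/(1+U)` schedule, and the consolidation rule are assumptions for illustration; Palimpsa's actual update is derived from a Bayesian posterior, which the summary notes recovers various gated linear attention models as particular approximations.

```python
import numpy as np

def metaplastic_write(S, U, k, v):
    """One importance-gated write into the memory state (sketch).

    U holds a per-entry importance estimate: entries that already
    store strong associations receive a small effective learning
    rate, so consolidated knowledge resists overwriting while
    little-used entries remain plastic.
    """
    target = np.outer(v, k)            # association to store
    plasticity = 1.0 / (1.0 + U)       # high importance -> small step
    S = S + plasticity * (target - S)  # importance-weighted update
    U = U + np.abs(target)             # consolidate written entries
    return S, U

# Toy usage: repeated writes to one key consolidate its entries.
d_k, d_v = 8, 4
S, U = np.zeros((d_v, d_k)), np.zeros((d_v, d_k))
rng = np.random.default_rng(1)
k, v = rng.normal(size=d_k), rng.normal(size=d_v)
for _ in range(5):
    S, U = metaplastic_write(S, U, k, v)
```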

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others