Learning to Evict from Key-Value Cache

Source: Learning to Evict from Key-Value Cache

Published: February 12, 2026

📄 English Summary

The increasing size of Large Language Models (LLMs) poses challenges for efficient inference, primarily due to the memory requirements of the autoregressive Key-Value (KV) cache. Existing eviction or compression methods reduce costs but rely on heuristics such as recency or past attention scores, which serve as indirect proxies for a token's future utility and introduce computational overhead. This research reframes KV cache eviction as a reinforcement learning (RL) problem, focusing on learning to rank tokens based on their predicted usefulness for future decoding. To achieve this, the KV Policy (KVP) framework is introduced, consisting of lightweight per-head RL agents trained on pre-computed generation traces using only key and value vectors. Each agent learns a specialized eviction policy.
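The ranking-based eviction described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the two-layer MLP scorer, the function names, and the random weights below are all assumptions standing in for a trained per-head RL agent that would score tokens from their key and value vectors.

```python
import numpy as np

def score_tokens(keys, values, w1, b1, w2, b2):
    """Score each cached token from its key/value vectors.

    A tiny two-layer MLP stands in for a trained per-head policy;
    a higher score means the token is predicted to be more useful
    for future decoding steps.
    """
    x = np.concatenate([keys, values], axis=-1)   # (n_tokens, 2 * d_head)
    h = np.maximum(x @ w1 + b1, 0.0)              # ReLU hidden layer
    return (h @ w2 + b2).squeeze(-1)              # (n_tokens,)

def evict(keys, values, budget, params):
    """Keep the `budget` highest-scoring tokens; drop the rest."""
    scores = score_tokens(keys, values, *params)
    keep = np.sort(np.argsort(scores)[-budget:])  # preserve token order
    return keys[keep], values[keep], keep

# Toy usage with random weights (a real agent would be RL-trained
# on pre-computed generation traces).
rng = np.random.default_rng(0)
d_head, n_tokens, hidden, budget = 8, 16, 32, 10
keys = rng.normal(size=(n_tokens, d_head))
values = rng.normal(size=(n_tokens, d_head))
params = (rng.normal(size=(2 * d_head, hidden)), np.zeros(hidden),
          rng.normal(size=(hidden, 1)), np.zeros(1))
k2, v2, kept = evict(keys, values, budget, params)
print(k2.shape)  # (10, 8)
```

Because each head gets its own scorer, eviction decisions can specialize per head, which is the motivation for the per-head agent design described in the summary.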
