Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
📄 Chinese Summary
ARACH (Attention Reallocation via an Adaptive Context Hub) is a training-free inference-time plug-in designed to enhance the performance of large language models (LLMs). While LLMs have achieved remarkable results, further improvements often come at a high training cost. This has motivated interest in post-training techniques, especially training-free methods that improve models at inference time without updating weights. Most training-free methods treat the model as a black box and improve outputs through input/output-level interventions, such as prompt design and test-time scaling via repeated sampling, reranking/verification, or search. ARACH provides a plug-and-play mechanism that intervenes in the model's internal computation, improving performance at inference time.
📄 English Summary
Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
ARACH (Attention Reallocation via an Adaptive Context Hub) is a training-free inference-time plug-in designed to enhance the performance of large language models (LLMs). While LLMs have achieved remarkable results, further improvements often require costly training. This has led to increased interest in post-training techniques, particularly training-free approaches that enhance models at inference time without updating weights. Most training-free methods treat the model as a black box and improve outputs through input/output-level interventions, such as prompt design and test-time scaling via repeated sampling, reranking/verification, or search. ARACH offers a plug-and-play mechanism to intervene in a model's internal computations, thereby enhancing performance during inference.
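The summary describes intervening in the model's internal computation rather than its inputs and outputs. As a rough illustration only (the abstract does not specify ARACH's actual update rule), the sketch below reallocates a fraction of each query's attention mass toward a designated "hub" position, such as a summary token. The blending scheme, the `alpha` strength parameter, and the choice of `hub_index` are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reallocate_attention(attn, hub_index, alpha=0.2):
    """Blend each query's attention distribution with a one-hot
    distribution on the hub token, shifting probability mass
    globally toward that position.

    NOTE: this linear blend is a hypothetical stand-in for
    whatever reallocation rule ARACH actually uses.
    """
    hub = np.zeros_like(attn)
    hub[..., hub_index] = 1.0
    out = (1.0 - alpha) * attn + alpha * hub
    # renormalize so each row remains a valid distribution
    return out / out.sum(axis=-1, keepdims=True)

# Toy post-softmax attention: 3 query positions over 4 key positions.
scores = np.array([[1.0, 0.5, 0.2, 0.1],
                   [0.3, 2.0, 0.1, 0.4],
                   [0.2, 0.1, 0.1, 1.5]])
attn = softmax(scores)

# Shift 30% of the attention mass toward key 0 (the assumed "hub").
boosted = reallocate_attention(attn, hub_index=0, alpha=0.3)
```

In a real deployment such a transform would run inside the model's attention layers (e.g. via forward hooks), which is what makes the approach a plug-in rather than an input/output-level intervention.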
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.