📄 English Summary
Spilled Energy in Large Language Models
This study reinterprets the final softmax classifier of a Large Language Model (LLM) as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs. This principled view makes it possible to track 'energy spills' during decoding, which are empirically shown to correlate with factual errors, biases, and failures. Like Orgad et al. (2025), the method localizes the exact answer token and then tests it for hallucination. Crucially, it requires neither trained probe classifiers nor activation ablations: instead, two completely training-free metrics are derived directly from the output logits, among them spilled energy, which captures the discrepancy between energy values across the decoding process.