📄 English Summary

Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores

Large language models (LLMs) are often confidently wrong, making reliable uncertainty estimation (UE) essential. Output-based heuristics are inexpensive but fragile, while probing internal representations is effective yet high-dimensional and difficult to transfer. A compact, per-instance UE method is proposed that scores cross-layer agreement patterns in internal representations using a single forward pass. Across three models, this method matches probing in-distribution, with mean diagonal differences of at most -1.8 AUPRC percentage points and +4.9 Brier score points. Under cross-dataset transfer, it consistently outperforms probing, achieving off-diagonal gains of up to +2.86 AUPRC and +21.02 Brier points. Under 4-bit weight-only quantization, it remains effective.
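The paper does not spell out its scoring function here, but the idea of turning cross-layer agreement in a single forward pass into a scalar uncertainty signal can be sketched as follows. This is an illustrative assumption, not the authors' exact method: it takes one token's hidden state at every layer and averages the cosine similarity between consecutive layers, so low agreement between layers maps to high uncertainty.

```python
import numpy as np

def cross_layer_agreement_score(hidden_states: np.ndarray) -> float:
    """Illustrative cross-layer agreement score (not the paper's exact formula).

    hidden_states: array of shape (num_layers, hidden_dim) holding one
    token's representation at each layer, collected in a single forward pass.
    Returns a scalar in [0, 1]; lower values suggest higher uncertainty.
    """
    # Normalize each layer's representation to unit length.
    norms = np.linalg.norm(hidden_states, axis=1, keepdims=True)
    unit = hidden_states / np.clip(norms, 1e-12, None)
    # Cosine similarity between each pair of consecutive layers.
    sims = np.sum(unit[:-1] * unit[1:], axis=1)
    # Map from [-1, 1] to [0, 1] and average over layer transitions.
    return float(np.mean((sims + 1.0) / 2.0))
```

In a real pipeline the per-layer states would come from the model's hidden-state outputs (e.g. `output_hidden_states=True` in Hugging Face Transformers), and the scalar score, being one number per instance, is far more compact and transferable than a high-dimensional probe.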