📄 English Summary

Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores

Large language models (LLMs) are often confidently wrong, making reliable uncertainty estimation (UE) essential. Output-based heuristics are inexpensive but fragile, while probing internal representations is effective yet high-dimensional and difficult to transfer. A compact, per-instance UE method is proposed that scores cross-layer agreement patterns in internal representations using a single forward pass. Across three models, this method matches probing in-distribution, with mean diagonal differences of at most -1.8 AUPRC percentage points and +4.9 Brier score points. Under cross-dataset transfer, it consistently outperforms probing, achieving off-diagonal gains of up to +2.86 AUPRC and +21.02 Brier points. Under 4-bit weight-only quantization, it remains effective.
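The paper does not spell out its scoring function here, but the idea of turning cross-layer agreement in a single forward pass into a scalar uncertainty signal can be sketched as follows. This is an illustrative assumption, not the authors' exact method: it takes one token's hidden state at every layer and averages the cosine similarity between consecutive layers, so low agreement between layers maps to high uncertainty.

```python
import numpy as np

def cross_layer_agreement_score(hidden_states: np.ndarray) -> float:
    """Illustrative cross-layer agreement score (not the paper's exact formula).

    hidden_states: array of shape (num_layers, hidden_dim) holding one
    token's representation at each layer, collected in a single forward pass.
    Returns a scalar in [0, 1]; lower values suggest higher uncertainty.
    """
    # Normalize each layer's representation to unit length.
    norms = np.linalg.norm(hidden_states, axis=1, keepdims=True)
    unit = hidden_states / np.clip(norms, 1e-12, None)
    # Cosine similarity between each pair of consecutive layers.
    sims = np.sum(unit[:-1] * unit[1:], axis=1)
    # Map from [-1, 1] to [0, 1] and average over layer transitions.
    return float(np.mean((sims + 1.0) / 2.0))
```

In a real pipeline the per-layer states would come from the model's hidden-state outputs (e.g. `output_hidden_states=True` in Hugging Face Transformers), and the scalar score, being one number per instance, is far more compact and transferable than a high-dimensional probe.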