📄 Abstract (translated from Chinese)
Hallucination in Large Language Models (LLMs), the generation of plausible-sounding but factually incorrect content, severely hinders their reliable deployment in critical applications. Current hallucination detection methods commonly rest on impractical assumptions: they rely on costly dense sampling strategies for consistency checks, or require access to the LLM's internal white-box states. Such requirements are often hard to satisfy, or inefficient, in real-world settings. This paper proposes a zero-shot hallucination detection metric named Lowest Span Confidence (LSC), designed to overcome these limitations. LSC flags potential hallucinations by evaluating the semantic span with the lowest confidence within the LLM-generated text. The method requires neither fine-tuning of the LLM nor access to its internal parameters, making it a fully black-box detection scheme.
📄 English Summary
Lowest Span Confidence: A Zero-Shot Metric for Efficient and Black-Box Hallucination Detection in LLMs
Hallucinations in Large Language Models (LLMs), characterized by the generation of plausible but non-factual content, present a significant hurdle for their reliable deployment in high-stakes environments. Existing hallucination detection methodologies often rely on unrealistic assumptions, such as demanding costly dense sampling strategies for consistency checks or requiring access to white-box LLM states, which are frequently unavailable or inefficient in practical settings. This work introduces Lowest Span Confidence (LSC), a zero-shot hallucination detection metric designed to overcome these limitations. LSC assesses the potential for hallucination by evaluating the semantic spans within LLM-generated text that exhibit the lowest confidence scores. The approach operates entirely in a black-box manner, requiring no fine-tuning of the LLM and no access to its internal parameters. The fundamental premise of LSC is that hallucinatory content often manifests as localized low generation confidence, even when the overall text appears coherent. By identifying and quantifying these least confident semantic units, LSC captures the model's uncertainty and likely inaccuracies. The metric can be applied directly to any pre-trained LLM without additional labeled data or extra computation. LSC's key advantages are its efficiency and universality, positioning it as a practical tool for evaluating and mitigating LLM hallucinations across applications, especially in contexts demanding high factual accuracy.
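The core computation described above can be sketched in a few lines. This is a minimal illustration, not the paper's reference implementation: it assumes the generation API exposes per-token log-probabilities (as many black-box APIs do), and it approximates "semantic spans" with fixed-size token windows, whereas the paper presumably segments spans semantically (e.g. by sentence or entity). The function names `span_confidences` and `lowest_span_confidence` are hypothetical.

```python
import math

def span_confidences(logprobs, span_size=5):
    """Average token probability within each fixed-size span.

    `logprobs` is the list of per-token log-probabilities returned
    alongside a generation; no model internals are needed (black-box).
    """
    probs = [math.exp(lp) for lp in logprobs]
    spans = []
    for i in range(0, len(probs), span_size):
        chunk = probs[i:i + span_size]
        spans.append(sum(chunk) / len(chunk))
    return spans

def lowest_span_confidence(logprobs, span_size=5):
    """LSC score: the confidence of the least confident span.

    A low score suggests a localized low-confidence region, which the
    premise above associates with likely hallucinated content.
    """
    return min(span_confidences(logprobs, span_size))
```

Scoring a generation is then a single call on its token log-probabilities; responses whose LSC falls below a chosen threshold can be flagged for review.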