How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding

📄 Summary

The widespread adoption of large language models (LLMs) in natural language processing has made prompt engineering and retrieval-augmented generation (RAG) mainstream techniques for improving LLM performance on complex tasks. However, LLMs generate outputs autoregressively, which makes output uncertainty unavoidable. Because model performance is highly sensitive to prompt design, precise uncertainty measurement is crucial for reliable prompt optimization. In multi-class multiple-choice (understanding) tasks, conventional uncertainty measures such as entropy are computed from output probabilities, treat all classes equally, and ignore differences in class priors inherited from the pretraining corpus. Failing to distinguish this spurious, prior-driven confidence from genuine certainty limits model performance on complex tasks.
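To make the distinction concrete, the sketch below contrasts raw entropy over a model's first-token answer probabilities with a prior-discounted variant. The specific calibration step (dividing out assumed class priors and renormalizing) and all numbers are illustrative assumptions, not the paper's actual method:

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a distribution over answer choices."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prior_calibrated(probs, priors):
    """Illustrative calibration: divide out assumed class priors and
    renormalize, discounting confidence that comes from label frequency
    alone. A sketch of the idea in the abstract, not the paper's method."""
    adjusted = [p / q for p, q in zip(probs, priors)]
    z = sum(adjusted)
    return [a / z for a in adjusted]

# Hypothetical first-token probabilities for choices A-D, plus assumed
# label priors in which "A" is also the most frequent answer a priori.
probs  = [0.70, 0.10, 0.10, 0.10]   # model appears confident in "A"
priors = [0.55, 0.15, 0.15, 0.15]   # but "A" is favored by the prior

raw_H = entropy(probs)
cal_H = entropy(prior_calibrated(probs, priors))
# Once the prior is discounted, the apparent confidence in "A" shrinks,
# so the calibrated entropy exceeds the raw entropy.
```

If the model's preference for "A" merely mirrors the prior, the calibrated distribution flattens and its entropy rises, flagging the confidence as spurious; genuine certainty survives the discounting.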


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others