How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding

📄 Summary

The widespread adoption of large language models (LLMs) in natural language processing has made prompt engineering and retrieval-augmented generation (RAG) mainstream techniques for improving LLM performance on complex tasks. However, LLMs generate outputs autoregressively, which makes output uncertainty unavoidable. Because model performance is highly sensitive to prompt design, precise uncertainty measurement is crucial for reliable prompt optimization. In multi-class multiple-choice (understanding) tasks, conventional uncertainty measures such as entropy are computed from output probabilities, treat all classes equally, and ignore differences in class priors inherited from the pretraining corpus. Failing to distinguish this spurious, prior-driven confidence from genuine certainty limits model performance on complex tasks.
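To make the distinction concrete, the sketch below contrasts raw entropy over a model's first-token answer probabilities with a prior-discounted variant. The specific calibration step (dividing out assumed class priors and renormalizing) and all numbers are illustrative assumptions, not the paper's actual method:

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a distribution over answer choices."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prior_calibrated(probs, priors):
    """Illustrative calibration: divide out assumed class priors and
    renormalize, discounting confidence that comes from label frequency
    alone. A sketch of the idea in the abstract, not the paper's method."""
    adjusted = [p / q for p, q in zip(probs, priors)]
    z = sum(adjusted)
    return [a / z for a in adjusted]

# Hypothetical first-token probabilities for choices A-D, plus assumed
# label priors in which "A" is also the most frequent answer a priori.
probs  = [0.70, 0.10, 0.10, 0.10]   # model appears confident in "A"
priors = [0.55, 0.15, 0.15, 0.15]   # but "A" is favored by the prior

raw_H = entropy(probs)
cal_H = entropy(prior_calibrated(probs, priors))
# Once the prior is discounted, the apparent confidence in "A" shrinks,
# so the calibrated entropy exceeds the raw entropy.
```

If the model's preference for "A" merely mirrors the prior, the calibrated distribution flattens and its entropy rises, flagging the confidence as spurious; genuine certainty survives the discounting.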


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others