📄 Abstract
Large language models are highly sensitive to prompt wording in clinical data abstraction tasks, yet most existing work treats prompts as fixed and studies uncertainty in isolation. This work argues that prompt sensitivity and uncertainty should be considered jointly. Experiments on two clinical tasks (MedAlign applicability/correctness and multiple sclerosis subtype abstraction) across several open-source and proprietary models quantify prompt sensitivity via flip rates and relate it to calibration and selective prediction. High prompt sensitivity is found to correlate with poorer calibration and with larger potential gains from selective prediction. Based on these observations, a stability-aware prompt optimization method is proposed that generates and evaluates multiple prompt variants to identify those with the least impact on model outputs.
📄 Extended Summary
Stability-Aware Prompt Optimization for Clinical Data Abstraction
Large language models applied to clinical data abstraction are highly sensitive to prompt wording, yet most existing research treats prompts as static and investigates uncertainty in isolation. This work argues for considering prompt sensitivity and uncertainty jointly. Across two clinical tasks (MedAlign applicability/correctness and multiple sclerosis subtype abstraction) and multiple open-source and proprietary models, prompt sensitivity is quantified through flip rates, the fraction of examples whose predicted label changes when the prompt is rephrased, and then correlated with model calibration and with the potential for performance improvement via selective prediction. The findings indicate that high prompt sensitivity frequently coincides with poorer calibration and with larger gains available through selective prediction. Building on these observations, a stability-aware prompt optimization method is proposed: multiple variants of each candidate prompt are generated and evaluated, and the prompts whose rephrasings least perturb model outputs are selected. This approach discovers more stable prompts, improving robustness across prompt formulations, and also improves calibration by identifying model uncertainty more accurately. The optimized prompts yield more consistent, reliable outputs under similar but differently worded instructions, clarify model limitations on specific clinical tasks, and support more robust deployment of language models in clinical applications. Experimental results show that the approach improves the accuracy and reliability of clinical data abstraction, offering a practical optimization pathway for applying large language models in the medical domain.
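The selection procedure described above can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation: `flip_rate`, `select_stable_prompt`, and the toy `predict` function are hypothetical names, and the real method would call an actual LLM and likely combine stability with accuracy rather than minimizing flip rate alone.

```python
from typing import Callable, Dict, List, Sequence

def flip_rate(outputs_by_variant: Sequence[Sequence[str]]) -> float:
    """Fraction of examples whose predicted label differs across prompt variants."""
    n_examples = len(outputs_by_variant[0])
    flips = sum(
        1 for i in range(n_examples)
        if len({outputs[i] for outputs in outputs_by_variant}) > 1
    )
    return flips / n_examples

def select_stable_prompt(
    paraphrases: Dict[str, List[str]],   # candidate prompt -> its paraphrases
    predict: Callable[[str, str], str],  # (prompt, example) -> predicted label
    examples: Sequence[str],
) -> str:
    """Return the candidate whose paraphrase family yields the lowest flip rate."""
    scores = {}
    for candidate, alternatives in paraphrases.items():
        outputs = [
            [predict(prompt, x) for x in examples]
            for prompt in [candidate, *alternatives]
        ]
        scores[candidate] = flip_rate(outputs)
    return min(scores, key=scores.get)

# Toy stand-in for an LLM call: predictions flip whenever "v2" appears in the prompt.
toy_predict = lambda prompt, note: "pos" if "v2" in prompt else "neg"

best = select_stable_prompt(
    {
        "Classify the note.": ["Please classify the note."],  # stable family
        "Classify v2.": ["Classify the note."],               # unstable family
    },
    toy_predict,
    ["note1", "note2"],
)  # -> "Classify the note."
```

In practice `predict` would wrap a model API call, and the flip rate could be smoothed over sampled generations; the key design choice is that stability is scored per prompt family, so a prompt is preferred when its own rephrasings leave the model's outputs unchanged.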