Self-Aware Knowledge Probing: Evaluating Language Models' Relational Knowledge through Confidence Calibration

📄 Abstract (translated from Chinese)

Knowledge probing aims to quantify how much relational knowledge a language model acquires during pre-training. Existing knowledge-probing methods evaluate model capability mainly through metrics such as prediction accuracy and precision. However, these evaluations fail to account for a model's reliability, i.e., how well its confidence scores are calibrated. To address this, a novel calibration probing framework is proposed for evaluating the relational knowledge of language models. The framework focuses on the consistency between a model's confidence when predicting facts it knows and its actual rate of correctness. Specifically, a series of probe tasks is designed to systematically elicit the model's predictions for particular relation triples and to record the confidence of its outputs. Calibration techniques, such as reliability diagrams and expected calibration error, are then used to analyze how well these confidence scores match the actual prediction outcomes.

📄 English Summary

Self-Aware Knowledge Probing: Evaluating Language Models' Relational Knowledge through Confidence Calibration

Knowledge probing quantifies the relational knowledge acquired by language models (LMs) during pre-training. Existing knowledge probes primarily assess model capabilities using metrics such as prediction accuracy and precision. However, such evaluations overlook the model's reliability, which is reflected in the calibration of its confidence scores. A novel calibration probing framework is proposed for relational knowledge, focusing on the consistency between a model's predicted confidence for known facts and its actual correctness. Specifically, a series of probe tasks are designed to systematically elicit language models' predictions regarding specific relational triples and record their associated confidence scores. Subsequently, calibration techniques, such as reliability diagrams and expected calibration error, are employed to analyze the alignment between these confidence scores and the actual prediction outcomes.

This framework extends the evaluation dimension of knowledge probing from mere performance metrics to the model's self-awareness capabilities, i.e., its perception of its own knowledge boundaries. By comparing the calibration performance of different models across various knowledge domains and relation types, a deeper understanding of the models' knowledge representation and reasoning mechanisms can be achieved. Furthermore, the framework supports a detailed analysis of model behavior at different confidence levels, revealing patterns in how models handle uncertainty.

Calibration probing not only identifies the knowledge a model possesses but also uncovers the accuracy of its self-assessment regarding its knowledge mastery, offering a new evaluation paradigm for developing more reliable and trustworthy language models.
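To make the calibration step concrete, the snippet below is a minimal sketch of expected calibration error (ECE) with equal-width confidence bins, the same quantity a reliability diagram visualizes. The function name and the toy probe results (confidence scores paired with correctness flags) are illustrative assumptions, not part of the proposed framework's code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: the weighted average gap between mean confidence and accuracy,
    taken over equal-width confidence bins (weights = bin sizes)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Bins are (lo, hi]; the first bin also includes confidence 0.0.
        if lo == 0.0:
            in_bin = (confidences >= lo) & (confidences <= hi)
        else:
            in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece

# Hypothetical probe results: model confidence per relational triple,
# and whether the predicted object entity was actually correct.
conf = [0.95, 0.90, 0.80, 0.55, 0.40]
hit  = [1,    1,    0,    1,    0]
print(round(expected_calibration_error(conf, hit), 3))
```

A perfectly calibrated model would have ECE near zero: within each bin, its average confidence would match its empirical accuracy. Plotting per-bin accuracy against per-bin confidence yields the reliability diagram mentioned above.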


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.