Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models


📄 Summary

Large Language Models (LLMs) are widely used in knowledge-intensive tasks where the same fact may be revised multiple times within the context. Whereas previous studies focus on one-shot updates or single conflicts, multi-update scenarios, in which multiple historically valid versions of a fact compete during retrieval, remain underexplored. This challenge resembles the AB-AC interference paradigm in cognitive psychology: when the same cue A is successively associated with B and then C, the old and new associations compete at retrieval, producing bias. Inspired by this, a Dynamic Knowledge Instance (DKI) evaluation framework is introduced, which models multiple updates to the same fact as a cue paired with a sequence of updated values and assesses models through endpoint probing of the earliest version.
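
To make the setup concrete, here is a minimal Python sketch of how such an instance might be constructed and probed. Everything in it, including the `DKIInstance` schema, the prompt wording, and the `probe_earliest` check, is a hypothetical reconstruction from the summary above, not the paper's actual implementation; `query_model` stands in for any LLM API call.

```python
from dataclasses import dataclass, field

@dataclass
class DKIInstance:
    """A Dynamic Knowledge Instance: one cue paired with a chronological
    sequence of in-context updates to the same fact (hypothetical schema)."""
    cue: str                                          # e.g. "the CEO of AcmeCorp"
    updates: list[str] = field(default_factory=list)  # values, oldest first

def build_prompt(instance: DKIInstance) -> str:
    """Render the update history as context, mimicking a document in which
    the same fact is revised several times."""
    lines = [
        f"Update {i + 1}: As of now, {instance.cue} is {value}."
        for i, value in enumerate(instance.updates)
    ]
    return "\n".join(lines)

def probe_earliest(instance: DKIInstance, query_model) -> bool:
    """Endpoint probe: ask for the earliest version and check whether the
    model retrieves it rather than a later, competing one."""
    context = build_prompt(instance)
    question = f"According to the first update, who was {instance.cue}?"
    answer = query_model(f"{context}\n\n{question}")
    return instance.updates[0].lower() in answer.lower()

if __name__ == "__main__":
    dki = DKIInstance(cue="the CEO of AcmeCorp",
                      updates=["Alice Chen", "Bob Diaz", "Carol Evans"])
    stub = lambda prompt: "Bob Diaz"   # a biased model returning a middle version
    print(probe_earliest(dki, stub))   # False: the probe flags retrieval bias
```

The membership check at the end is deliberately crude; an actual evaluation would aggregate such probes over many instances and update-sequence lengths to measure how retrieval accuracy degrades as competing versions accumulate.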
