Why Large Language Models Invent Academic Citations That Do Not Exist, and How to Stop Them

📄 English Summary

Why LLMs Invent Academic Citations That Don't Exist and How to Stop Them

Ghost references are citations to works that do not exist. They differ from citation unfaithfulness (citing a real paper for a claim it does not support) and from 'zombie' papers (low-quality but real publications). Humans have long produced bogus references through typos, copying, or paper mills. LLMs change the scale of the problem: a single prompt can yield dozens of plausible but non-existent citations, which then flow into papers, theses, and reports with very little friction. Addressing the problem requires analyzing and improving how LLMs generate citations, so that academic integrity is preserved.
