From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text

📄 Abstract (translated from Chinese)

Distinguishing human-written Korean text from fluent LLM (large language model) outputs remains difficult even for linguistically trained readers, who tend to over-trust surface fluency. This study investigates whether expert-level detection can be treated as a learnable skill and improved through structured calibration. We introduce LREAD, a rubric derived from national Korean writing standards and adapted to target micro-level generation artifacts such as punctuation optionality, spacing habits, selective particle application, unnatural sentence structure, and semantic incoherence. In a designed experiment, participants systematically evaluated Korean texts under the guidance of the LREAD rubric. Results show that rubric-calibrated evaluators achieved significantly higher accuracy in identifying LLM-generated text and could articulate their judgments more clearly, rather than relying on intuition alone.

📄 English Summary

From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text

Distinguishing human-written Korean text from fluent large language model (LLM) outputs remains challenging, even for linguistically trained readers who often over-rely on surface well-formedness. This research investigates whether expert-level detection can be treated as a learnable skill and enhanced through structured cognitive calibration. We introduce LREAD, a rubric derived from national Korean writing standards and specifically adapted to identify micro-level artifacts characteristic of LLM generation. These artifacts include, but are not limited to, optionality in punctuation, idiosyncratic spacing behaviors, selective particle application, unnatural sentence structures, and subtle semantic incoherence. Through a series of experimental trials, participants were guided to systematically evaluate Korean texts using the LREAD rubric. Experimental results demonstrate a significant improvement in the accuracy of identifying LLM-generated texts among evaluators who underwent rubric-based calibration. Furthermore, these calibrated evaluators were able to articulate clearer justifications for their judgments, moving beyond mere intuitive assessments. The LREAD rubric provides a standardized framework that redirects evaluators' attention from overall text fluency to deeper linguistic features and specific patterns indicative of LLM generation. This approach not only boosts detection accuracy but also enhances the confidence and explainability of evaluators' judgments. The study further analyzes the performance differences among evaluators of varying linguistic backgrounds and expertise levels after applying the LREAD rubric, revealing that systematic calibration training effectively improves discernment capabilities regardless of initial proficiency. This work offers crucial theoretical and practical foundations for developing more effective LLM-generated text detection tools and training methodologies in the future.
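As a rough illustration of how a rubric like LREAD might be operationalized, the sketch below aggregates per-criterion ratings into an overall human-vs-LLM judgment. The criterion names mirror the artifact categories listed in the summary; the weights, the 0-3 rating scale, and the decision threshold are illustrative assumptions, not values reported by the study.

```python
# Hypothetical criteria mirroring the artifact categories named in the abstract.
# Weights and the decision threshold below are illustrative assumptions.
CRITERIA = {
    "punctuation_optionality": 1.0,
    "spacing_idiosyncrasies": 1.0,
    "selective_particle_use": 1.5,
    "unnatural_sentence_structure": 1.5,
    "semantic_incoherence": 2.0,
}

def rubric_score(ratings: dict[str, int]) -> float:
    """Weighted mean of 0-3 ratings, where 3 = strong LLM-like artifact."""
    total_weight = sum(CRITERIA.values())
    weighted = sum(CRITERIA[name] * ratings.get(name, 0) for name in CRITERIA)
    return weighted / total_weight

def judge(ratings: dict[str, int], threshold: float = 1.5) -> str:
    """Map an aggregate score to a binary verdict (threshold is an assumption)."""
    if rubric_score(ratings) >= threshold:
        return "likely LLM-generated"
    return "likely human-written"

# Example: an evaluator flags strong particle and coherence artifacts.
ratings = {
    "selective_particle_use": 3,
    "semantic_incoherence": 3,
    "spacing_idiosyncrasies": 1,
}
print(judge(ratings))  # weighted score ~1.64, above the 1.5 threshold
```

A checklist like this is one way to redirect attention from overall fluency to specific features, as the summary describes; the actual LREAD rubric and its scoring procedure are defined by the study itself.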
