📄 English Summary
Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?
Large Language Models (LLMs) are increasingly positioned as decision engines for hiring, healthcare, and economic judgment, yet real-world human judgment reflects a balance between rational deliberation and emotion-driven bias. If LLMs are to participate in high-stakes decisions or serve as models of human behavior, it is critical to assess whether they exhibit analogous patterns of (ir)rationality and bias. To this end, this study evaluates how well multiple LLMs simulate human judgment and choice. A series of experiments was designed, spanning cognitive bias tasks and scenario-based decision problems, to investigate whether LLMs exhibit rational or irrational behavior consistent with human judgment patterns when facing similar cognitive challenges. Specifically, the experiments covered classic cognitive biases such as the anchoring effect, the framing effect, and confirmation bias, as well as decision scenarios involving uncertainty, risk, and ethical dilemmas. By comparing LLM outputs with human performance on the same tasks, the study quantifies how closely LLM decision-making aligns with, or deviates from, human patterns. The results indicate that while some LLMs exhibit strong logical reasoning on certain tasks, they often reproduce, and sometimes amplify, cognitive biases inherent in humans. For instance, on some framing-effect tasks, LLMs show sensitivity to information presentation similar to that of humans, whereas on more complex ethical dilemmas their decision logic can differ markedly from mainstream human judgment.
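The framing-effect comparison described above can be sketched as a simple score: the drop in risk-averse choices between a gain frame and a loss frame. The snippet below is a minimal illustration, not the study's actual pipeline; the response counts are hypothetical placeholders standing in for repeated LLM (or human) responses to the two frames of a Tversky–Kahneman-style problem.

```python
from collections import Counter

# Hypothetical choice data; in practice these would be collected from
# repeated model queries under each frame of the same decision problem.
# "sure" = risk-averse option chosen, "gamble" = risk-seeking option chosen.
gain_frame_choices = ["sure"] * 72 + ["gamble"] * 28   # e.g. "200 people will be saved"
loss_frame_choices = ["sure"] * 22 + ["gamble"] * 78   # e.g. "400 people will die"

def risk_averse_rate(choices):
    """Fraction of trials on which the sure (risk-averse) option was chosen."""
    counts = Counter(choices)
    return counts["sure"] / len(choices)

def framing_effect(gain_choices, loss_choices):
    """Framing-effect score: decline in risk aversion from gain to loss frame.

    A positive score replicates the classic human pattern (risk-averse under
    gains, risk-seeking under losses); a score near zero indicates frame-
    invariant, i.e. normatively rational, choice behavior.
    """
    return risk_averse_rate(gain_choices) - risk_averse_rate(loss_choices)

print(framing_effect(gain_frame_choices, loss_frame_choices))
```

The same scalar can be computed for each model and for the human baseline, so "reproducing or amplifying" a bias reduces to comparing the model's score against the human score on the identical task.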