A Planning-Centric Analysis of Why Reasoning Fails in Long-Horizon Planning for Large Language Model Agents

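📄 Abstract (translated from Chinese)

Large language model (LLM) agents show strong step-by-step reasoning over short horizons but struggle to maintain coherent behavior over long planning horizons. This failure reflects a fundamental mismatch: step-wise reasoning induces a step-wise greedy policy that works for short horizons but breaks down in long-horizon planning, where early actions must account for delayed consequences. From a planning perspective, the core challenge LLM agents face in long-horizon decision making is the gap between their inherent step-wise reasoning mechanism and the global optimization that complex planning problems require. LLM agents tend to pick the locally optimal choice at each step while ignoring how those choices affect future states and the final goal. This myopic behavior is especially pronounced in tasks that require multi-step coordination and trade-offs.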

📄 English Summary

Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents

Large language model (LLM)-based agents demonstrate strong step-by-step reasoning over short horizons, yet frequently fail to maintain coherent behavior across extended planning horizons. This failure stems from a fundamental mismatch: step-wise reasoning induces a step-wise greedy policy that is adequate for short-horizon tasks but insufficient for long-horizon planning, where early actions must account for delayed consequences.

From a planning-centric perspective, the core challenge for LLM agents in long-horizon decision making is the gap between their intrinsic step-wise reasoning mechanism and the global optimization that complex planning problems require. LLM agents tend to select the locally optimal action at each step and neglect how that choice affects future states and the ultimate goal. This myopic behavior is most evident in tasks that demand multi-step coordination and trade-offs. For instance, when a final objective can only be reached through a specific sequence of operations, a plan can fail because early actions did not lay the groundwork for later steps, even though each local choice looked reasonable.

Two limitations underlie this: the agents lack foresight into future states, and they have limited capacity to model long causal chains. Both hinder their ability to evaluate the long-term value of alternative action sequences. In addition, LLM agents are often susceptible to 'model hallucinations', generating action sequences that appear plausible but violate environmental constraints or are logically inconsistent, which further exacerbates planning failures.
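The gap between a step-wise greedy policy and planning over the full horizon can be illustrated with a toy example. The sketch below is not taken from the paper: the task graph, the state and action names, and both functions are hypothetical, chosen only so that the action with the best immediate reward leads to a dead end while an initially unrewarded action unlocks a larger delayed reward.

```python
from functools import lru_cache

# Hypothetical toy task (an assumption for illustration, not from the paper).
# Each state maps an action name to (immediate_reward, next_state).
GRAPH = {
    "start": {"grab_coin": (1, "dead_end"), "fetch_key": (0, "has_key")},
    "dead_end": {"wait": (0, "dead_end")},
    "has_key": {"open_door": (0, "vault"), "grab_coin": (1, "dead_end")},
    "vault": {"collect": (10, "done")},
    "done": {},
}


def greedy_rollout(state, horizon):
    """Step-wise greedy policy: always take the best immediate reward."""
    total, trace = 0, []
    for _ in range(horizon):
        actions = GRAPH[state]
        if not actions:  # terminal state, nothing left to do
            break
        action = max(actions, key=lambda a: actions[a][0])
        reward, state = actions[action]
        total += reward
        trace.append(action)
    return total, trace


@lru_cache(maxsize=None)
def best_return(state, horizon):
    """Exhaustive lookahead: best total reward and plan within the horizon."""
    if horizon == 0 or not GRAPH[state]:
        return 0, ()
    best = None
    for action, (reward, nxt) in GRAPH[state].items():
        future, tail = best_return(nxt, horizon - 1)
        candidate = (reward + future, (action,) + tail)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


if __name__ == "__main__":
    for h in (1, 3):
        print(h, greedy_rollout("start", h), best_return("start", h))
```

With a horizon of 1 the two approaches obtain the same return, which mirrors the observation that greedy behavior is adequate for short horizons. With a horizon of 3, the greedy rollout takes the coin and is stuck, while the lookahead planner accepts zero reward for two steps in order to reach the vault. Exhaustive search is used here only because the toy graph is tiny; it is a stand-in for any method that evaluates delayed consequences, not a claim about how such planning should be implemented in an LLM agent.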
