Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning

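📄 Abstract (translated from Chinese)

When large language models plan at inference time, they often fail under partial observability: when task-critical preconditions are not specified at query time, models tend to hallucinate the missing facts or produce plans that violate hard constraints. Self-Querying Bidirectional Categorical Planning (SQ-BCP) explicitly represents the status of each precondition (satisfied / violated / unknown) and resolves unknowns by asking questions. SQ-BCP formulates the planning task as finding a path from an initial state to a goal state in a state space, where each state is represented by a set of facts and precondition statuses. When it encounters an operation with a precondition whose status is "unknown", SQ-BCP generates a question and obtains the answer from an external information source (such as a knowledge-base query, sensor data, or user clarification). This questioning mechanism lets the model actively seek missing information rather than passively fabricating it.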

📄 English Summary

Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning

Inference-time planning with large language models frequently fails under partial observability: when preconditions critical to a task are not specified at query time, models tend to hallucinate the missing facts or generate plans that violate hard constraints. Self-Querying Bidirectional Categorical Planning (SQ-BCP) explicitly represents the status of each precondition as Satisfied, Violated, or Unknown, and resolves Unknown statuses by asking questions. SQ-BCP frames planning as finding a path from an initial state to a goal state within a state space, where each state is characterized by a set of facts and precondition statuses. When it encounters an operation with a precondition whose status is Unknown, SQ-BCP generates a targeted question and obtains the answer from an external source such as a knowledge base lookup, sensor data, or user clarification. This questioning mechanism lets the model seek out missing information rather than fabricate it. SQ-BCP formalizes the planning process in a category-theoretic framework, treating states as objects and operations as morphisms, which helps it handle complex relationships and dependencies between operations. Its bidirectional strategy searches forward from the initial state and backward from the goal state at the same time, which improves efficiency in large search spaces. The model iteratively refines its plan by selecting operations, updating precondition statuses, and generating queries as needed, until all preconditions are met and the goal state is reached.
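The three-valued precondition status and the ask-instead-of-guess behavior described above can be illustrated with a small Python sketch. This is not the SQ-BCP implementation: it is a forward-only breadth-first search (no backward search, no category-theoretic machinery), and every name in it (`Status`, `Operation`, `status_of`, `plan`, the `ask` callback, and the travel example) is hypothetical. It also assumes that a fact some operation can produce is simply "not yet achieved" rather than unknown, so questions are asked only about facts that no operation can establish.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple


class Status(Enum):
    SATISFIED = "satisfied"
    VIOLATED = "violated"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Operation:
    name: str
    preconditions: FrozenSet[str]  # facts that must hold before applying
    adds: FrozenSet[str]           # facts that hold afterwards


def status_of(fact: str, facts: FrozenSet[str], known: Dict[str, bool],
              producible: FrozenSet[str]) -> Status:
    """Three-valued status of one precondition in the current state."""
    if fact in facts:
        return Status.SATISFIED
    if fact in known:  # answered earlier by the external source
        return Status.SATISFIED if known[fact] else Status.VIOLATED
    if fact in producible:  # assumption: some operation can still achieve it
        return Status.VIOLATED
    return Status.UNKNOWN


def plan(initial: Iterable[str], goal: Iterable[str], operations: List[Operation],
         ask: Callable[[str], bool]) -> Tuple[Optional[List[str]], List[str]]:
    """Forward BFS that queries `ask` for UNKNOWN preconditions instead of guessing.

    Returns (operation names or None if no plan exists, questions asked).
    """
    goal_set = frozenset(goal)
    producible = frozenset().union(*(op.adds for op in operations))
    known: Dict[str, bool] = {}
    asked: List[str] = []
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        facts, path = frontier.popleft()
        if goal_set <= facts:
            return path, asked
        for op in operations:
            applicable = True
            for pre in sorted(op.preconditions):
                st = status_of(pre, facts, known, producible)
                if st is Status.UNKNOWN:
                    known[pre] = ask(pre)  # resolve by asking, never by assuming
                    asked.append(pre)
                    st = status_of(pre, facts, known, producible)
                if st is not Status.SATISFIED:
                    applicable = False
                    break
            if applicable:
                nxt = facts | op.preconditions | op.adds
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [op.name]))
    return None, asked


# Hypothetical example: whether the traveler has a passport is never stated.
OPS = [
    Operation("take_taxi", frozenset({"at_home"}), frozenset({"at_airport"})),
    Operation("pass_security", frozenset({"at_airport", "has_passport"}),
              frozenset({"at_gate"})),
]

if __name__ == "__main__":
    print(plan({"at_home"}, {"at_gate"}, OPS, ask=lambda question: True))
    print(plan({"at_home"}, {"at_gate"}, OPS, ask=lambda question: False))
```

In this sketch the `ask` callback stands in for a knowledge base lookup, a sensor read, or a user clarification. When the answer is negative the planner returns no plan at all instead of one that silently assumes the missing fact.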

