Quantifying the Accuracy and Cost Impact of Design Decisions in Budget-Constrained Agentic LLM Search

📄 Summary

Agentic Retrieval-Augmented Generation (RAG) systems combine iterative search, planning prompts, and retrieval backends, yet real-world deployments impose explicit budgets on tool calls and completion tokens. This study presents a controlled measurement analysis of how search depth, retrieval strategy, and completion budget affect accuracy and cost under fixed constraints. Using Budget-Constrained Agentic Search (BCAS), a model-agnostic evaluation framework that reveals the remaining budget to the model and caps tool use, the study compares six large language models (LLMs) on three question-answering benchmarks. Across models and datasets, accuracy improves with additional searches up to a small cap, while hybrid lexical-and-dense retrieval with lightweight re-ranking yields the highest average accuracy.
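The budget-aware control loop the summary describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual protocol: the `llm` and `search_tool` callables, the prompt format, and the `SEARCH:`/`ANSWER:` convention are all invented here for clarity.

```python
def run_budget_constrained_search(llm, search_tool, question,
                                  max_tool_calls=4, max_completion_tokens=512):
    # Hypothetical interfaces (assumptions, not the paper's API):
    #   llm(prompt, max_tokens) -> str
    #   search_tool(query) -> list of passages
    evidence = []
    for calls_left in range(max_tool_calls, 0, -1):
        # Reveal the remaining budget in the prompt, as BCAS is described
        # to do, so the model can plan how to spend its searches.
        prompt = (f"Question: {question}\n"
                  f"Evidence so far: {evidence}\n"
                  f"Searches remaining: {calls_left}\n"
                  "Reply with SEARCH: <query> or ANSWER: <answer>.")
        reply = llm(prompt, max_tokens=max_completion_tokens)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        # Spend one unit of the tool-call budget on a retrieval round.
        evidence.extend(search_tool(reply[len("SEARCH:"):].strip()))
    # Budget exhausted: force a final answer with no further tool use.
    final = llm(f"Question: {question}\nEvidence: {evidence}\nANSWER:",
                max_tokens=max_completion_tokens)
    return final.strip()
```

Under this sketch, the factors the study sweeps map onto parameters: search depth to `max_tool_calls`, completion budget to `max_completion_tokens`, and retrieval strategy to the `search_tool` backend.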
