Stop Sending 93K Tokens of Schema to Your LLM Agent!

📄 Summary

When answering complex multi-table queries, agents repeatedly query the information schema to work out which tables exist, what columns they have, and how they join, which drives up token usage. In a 500-table database, the full DDL comes to roughly 93,000 tokens, yet most questions touch only 3-5 tables. Providing schema information upfront produced a measured 64% reduction in tokens on complex multi-table joins. dbdense was built to address this: it prepares the schema through a three-step offline pipeline of extract, compile, and serve.
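The extract, compile, serve idea can be illustrated with a small sketch. This is not dbdense's actual implementation or output format; the function names `extract`, `compile_schema`, and `serve`, the one-line-per-table format, and the use of SQLite as a stand-in database are all assumptions made for illustration.

```python
import sqlite3


def extract(conn):
    """Read tables, columns, and foreign keys once, ahead of time (hypothetical step)."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2]) for r in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = [(r[3], r[2], r[4]) for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "fks": fks}
    return schema


def compile_schema(schema):
    """Turn the extracted schema into one compact line per table (assumed format)."""
    compiled = {}
    for table, info in schema.items():
        cols = ", ".join(f"{name} {ctype}" for name, ctype in info["columns"])
        line = f"{table}({cols})"
        for col, ref_table, ref_col in info["fks"]:
            line += f" {col}->{ref_table}.{ref_col}"
        compiled[table] = line
    return compiled


def serve(compiled, tables):
    """Return only the lines for the tables a question needs."""
    return "\n".join(compiled[t] for t in tables if t in compiled)


# Toy database standing in for a much larger one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
CREATE TABLE audit_log (id INTEGER PRIMARY KEY, message TEXT);
""")

compiled = compile_schema(extract(conn))
context = serve(compiled, ["customers", "orders"])
print(context)
```

Here the agent would receive two short lines describing `customers` and `orders`, including the join column, rather than the DDL for every table. As a rough scale check using the figures above, 93,000 tokens across 500 tables averages 186 tokens per table, so 3-5 average-sized tables would be on the order of 560-930 tokens of full DDL.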
