The 15x Token Tax on Multi-Agent Coordination
Multi-agent systems coordinate over HTTP by serializing context into JSON, sending it as request bodies, parsing it on the receiving end, and feeding it back into a large language model (LLM). Each message carries HTTP headers, content-type negotiation, and authentication tokens. Because HTTP is stateless, every coordination step retransmits the accumulated context of the entire interaction. Measurements indicate that this coordination overhead can cost up to 15 times more tokens than the actual task content. In a three-agent pipeline processing a document, Agent A sends the document and instructions to Agent B (5K tokens), Agent B sends everything along with its analysis to Agent C (12K tokens), and Agent C returns the full chain with its contribution (20K tokens). The document itself was only 3K tokens, so of the 37K tokens transmitted across the three hops, 34K were coordination overhead — over 11x the task content in this example alone.
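The accumulation pattern above can be sketched as a minimal simulation. All names, payload sizes, and the 4-characters-per-token estimate below are illustrative assumptions, not measurements from the system described:

```python
import json

def estimate_tokens(payload: dict) -> int:
    # Rough heuristic: ~4 characters per token of serialized JSON.
    return len(json.dumps(payload)) // 4

# Hypothetical document of roughly 3K tokens (~12K characters).
document = "x" * 12_000

# Hop 1, Agent A -> Agent B: document plus instructions.
hop_ab = {"document": document, "instructions": "analyze structure"}
# Hop 2, Agent B -> Agent C: everything so far plus B's analysis.
hop_bc = {**hop_ab, "analysis_b": "y" * 8_000}
# Hop 3, Agent C -> Agent A: the full chain plus C's contribution.
hop_ca = {**hop_bc, "analysis_c": "z" * 10_000}

# Stateless HTTP means each hop retransmits the whole accumulated context.
transmitted = sum(estimate_tokens(h) for h in (hop_ab, hop_bc, hop_ca))
task_content = estimate_tokens({"document": document})
overhead = transmitted - task_content
print(f"transmitted={transmitted} task={task_content} overhead={overhead}")
```

Note that the overhead grows with each hop because every agent forwards the entire chain rather than a delta; a stateful channel or shared context store would transmit each piece of content once.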
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others