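📄 Chinese Summary (translated)
ACAR (Adaptive Complexity and Attribution Routing) is a measurement framework for studying multi-model orchestration under auditable conditions. The framework uses self-consistency variance (sigma), computed from N=3 probe samples, to route tasks among single-model, two-model, and three-model execution modes. The system is implemented on TEAMLLM, with immutable artifacts and complete decision traces. ACAR was evaluated on 1,510 tasks across four benchmarks (MathArena, Reasoning Gym, LiveCodeBench, and SuperGPQA) using Claude Sonnet 4, GPT-4o, and Gemini 2.0 Flash, producing more than 7,550 auditable runs. The results show that sigma-based routing achieves 55.6% accuracy, exceeding the two-model baseline.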
📄 English Summary
ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
ACAR (Adaptive Complexity and Attribution Routing) is a measurement framework for studying multi-model orchestration under auditable conditions. It computes self-consistency variance (sigma) from N=3 probe samples and uses it to route each task to a single-model, two-model, or three-model execution mode. The system is built on TEAMLLM, a deterministic execution substrate with immutable artifacts and complete decision traces. ACAR was evaluated on 1,510 tasks from four benchmarks (MathArena, Reasoning Gym, LiveCodeBench, and SuperGPQA) using Claude Sonnet 4, GPT-4o, and Gemini 2.0 Flash, producing more than 7,550 auditable runs. Sigma-based routing reaches 55.6% accuracy, surpassing the two-model baseline.
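
The routing idea can be illustrated with a minimal Python sketch. The summary does not give the exact definition of sigma or the routing thresholds, so everything here is an assumption for illustration: `disagreement_sigma` uses the share of probe answers that differ from the most common answer as a stand-in for self-consistency variance, and the `low` and `high` cutoffs in `route` are hypothetical values, not the ones used in ACAR.

```python
from collections import Counter


def disagreement_sigma(probe_answers):
    """Assumed proxy for sigma: fraction of probe answers that differ
    from the most common (modal) answer. Not the paper's definition."""
    if not probe_answers:
        raise ValueError("need at least one probe answer")
    # Normalize lightly so trivial formatting differences do not count.
    counts = Counter(a.strip().lower() for a in probe_answers)
    modal_count = counts.most_common(1)[0][1]
    return 1.0 - modal_count / len(probe_answers)


def route(sigma, low=0.0, high=0.5):
    """Map sigma to an execution mode. Thresholds are hypothetical."""
    if sigma <= low:
        return "single"   # probes agree: one model is assumed sufficient
    if sigma <= high:
        return "dual"     # partial disagreement: escalate to two models
    return "triple"       # strong disagreement: use all three models


# With N=3 probes there are only three possible agreement patterns.
for probes in (["42", "42", "42"], ["42", "41", "42"], ["1", "2", "3"]):
    sigma = disagreement_sigma(probes)
    print(probes, round(sigma, 3), route(sigma))
```

With three probes this proxy can take only the values 0, 1/3, and 2/3, which is one reason a three-way routing decision fits naturally on top of N=3 samples; a real system would also log each sigma value and decision to keep the trace auditable.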