LLM Router Benchmark: 46 Models, 8 Providers, Sub-Millisecond Routing

📄 Chinese Summary (translated)

When routing AI requests across 46 models from 8 providers, no single criterion such as cost or speed is sufficient. Benchmarking every model on the platform revealed a poor correlation between speed and intelligence. The production routing system built for this classifies requests in under 1 ms using 14 weighted dimensions and sigmoid confidence calibration. BlockRun is an x402 micropayment gateway: every LLM request is authenticated through the proxy and forwarded to the appropriate provider via on-chain USDC payment, with the payment overhead adding 50-100 ms of latency per request.

📄 English Summary

LLM Router Benchmark: 46 Models, 8 Providers, Sub-1ms Routing

Routing AI requests across 46 models from 8 providers requires more than just selecting the cheapest or fastest option. Benchmarking each model revealed a poor correlation between speed and intelligence. A production routing system was developed that classifies requests in under 1ms using 14 weighted dimensions and sigmoid confidence calibration. BlockRun serves as an x402 micropayment gateway, where every LLM request is authenticated via on-chain USDC payment and forwarded to the appropriate provider, adding a payment overhead of 50-100ms to each request.
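The classification step described above can be sketched as a weighted sum over per-dimension scores squashed through a sigmoid. This is an illustrative sketch only: the dimension names, weights, and bias below are assumptions, not BlockRun's actual 14 dimensions or calibration parameters.

```python
import math

# Hypothetical dimension weights; the real system uses 14 of these.
WEIGHTS = {
    "code_keywords": 2.0,   # presence of code-like tokens in the prompt
    "math_symbols": 1.5,    # density of mathematical notation
    "prompt_length": 0.5,   # normalized prompt length
}

def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def classify(scores: dict[str, float], bias: float = -1.0) -> float:
    """Return a calibrated confidence that the request needs a high-intelligence model.

    `scores` holds per-dimension values in [0, 1]; missing dimensions count as 0.
    The weighted sum plus bias is passed through a sigmoid so the output can be
    compared against a routing threshold as a probability-like confidence.
    """
    z = bias + sum(w * scores.get(dim, 0.0) for dim, w in WEIGHTS.items())
    return sigmoid(z)

# Example: a code-heavy prompt of moderate length.
conf = classify({"code_keywords": 1.0, "math_symbols": 0.0, "prompt_length": 0.2})
# z = -1.0 + 2.0*1.0 + 0.5*0.2 = 1.1, so conf = sigmoid(1.1) ≈ 0.75
```

Because the whole computation is a dozen multiply-adds and one `exp`, sub-millisecond classification is plausible even with 14 dimensions; the expensive part of the request path is the 50-100 ms payment overhead, not the routing decision.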

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.