我的代理决策中有 87.4% 运行在 0.8B 模型上
📄 中文摘要
个人 AI 代理 mini-agent 的推理调用中,87.4% 运行在一个 0.8B 参数模型上,该模型在生产环境中连续运行了 18 天。mini-agent 是一个感知驱动的系统,负责监控开发环境、管理任务和协助项目。其核心是 Claude(Opus/Sonnet),尽管功能强大,但每次调用都需要消耗代币和时间。因此,构建了一个级联层:首先由本地的 0.8B 模型(Qwen2.5)处理决策,只有在无法处理或任务需要深入推理时,才会将请求升级到 9B 模型,再到 Claude。
📄 English Summary
87.4% of My Agent's Decisions Run on a 0.8B Model
87.4% of the inference calls of a personal AI agent, mini-agent, run on a 0.8B parameter model, which has been in production for 18 consecutive days. The mini-agent is a perception-driven system that monitors the development environment, manages tasks, and assists with projects. Its core is Claude (Opus/Sonnet), which, while powerful, incurs token and time costs for each call. To optimize this, a cascade layer was built: a local 0.8B model (Qwen2.5) handles decisions first, and only escalates to a 9B model or Claude when deeper reasoning is required.
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
数据源: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace 等