我的代理决策中有 87.4% 运行在 0.8B 模型上

出处: 87.4% of My Agent's Decisions Run on a 0.8B Model

发布: 2026年4月1日

📄 中文摘要

个人 AI 代理 mini-agent 的推理调用中，87.4% 运行在一个 0.8B 参数模型上，该模型在生产环境中连续运行了 18 天。mini-agent 是一个感知驱动的系统，负责监控开发环境、管理任务和协助项目。其核心是 Claude（Opus/Sonnet），尽管功能强大，但每次调用都需要消耗代币和时间。因此，构建了一个级联层：首先由本地的 0.8B 模型（Qwen2.5）处理决策，只有在无法处理或任务需要深入推理时，才会将请求升级到 9B 模型，再到 Claude。

🏷️ 相关标签

#AI代理 #决策模型 #生产环境

📄 English Summary

87.4% of My Agent's Decisions Run on a 0.8B Model

87.4% of the inference calls of a personal AI agent, mini-agent, run on a 0.8B parameter model, which has been in production for 18 consecutive days. The mini-agent is a perception-driven system that monitors the development environment, manages tasks, and assists with projects. Its core is Claude (Opus/Sonnet), which, while powerful, incurs token and time costs for each call. To optimize this, a cascade layer was built: a local 0.8B model (Qwen2.5) handles decisions first, and only escalates to a 9B model or Claude when deeper reasoning is required.

数据源: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace 等

📄 中文摘要

🏷️ 相关标签

📄 English Summary

87.4% of My Agent's Decisions Run on a 0.8B Model

🏷️ Related Tags

📚 相关文章

AI 编程创造了新一类创作者。我就是其中之一。

人工智能成为我学习的助手

Claude CLI "泄露": 没有人赢，AI 仍然幻觉，企业仍在犯同样的错误