4 SQLite Tables Replaced My $200/mo AI Observability Stack
📄 Summary
An AI agent system manages 16 teams across four LLM providers. Two months ago, one team began silently hallucinating policy decisions, and the author caught the problem within 11 minutes. The detection came not from Datadog or Honeycomb but from 47 lines of Python writing to a SQLite database. OpenTelemetry is still developing semantic conventions for LLM tracing, but the author needed a solution six months ago and so built one. The result is a SQLite-backed audit trail that records every LLM call, model routing decision, and bias detection event in a multi-agent AI orchestration. Its 338 audit entries and 108 events exposed three silent failures that traditional cost-based monitoring did not detect.
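To make the idea concrete, here is a minimal sketch of what such an audit trail could look like using Python's standard `sqlite3` module. It is not the author's code: the table names (`llm_calls`, `routing_decisions`, `bias_events`), the columns, and the helper functions are all hypothetical, and the sketch covers only the three kinds of records the summary names, since the summary does not say what the fourth table holds.

```python
import sqlite3
import time

# Hypothetical schema: one table per record type named in the summary.
SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_calls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    team TEXT NOT NULL,
    provider TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt TEXT NOT NULL,
    response TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS routing_decisions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    team TEXT NOT NULL,
    chosen_model TEXT NOT NULL,
    reason TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS bias_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    call_id INTEGER NOT NULL REFERENCES llm_calls(id),
    detail TEXT NOT NULL
);
"""


def open_audit_db(path=":memory:"):
    """Open (or create) the audit database and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_call(conn, team, provider, model, prompt, response):
    """Record one LLM call and return its row id."""
    cur = conn.execute(
        "INSERT INTO llm_calls (ts, team, provider, model, prompt, response) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (time.time(), team, provider, model, prompt, response),
    )
    conn.commit()
    return cur.lastrowid


def log_routing(conn, team, chosen_model, reason):
    """Record why a given model was chosen for a team's request."""
    conn.execute(
        "INSERT INTO routing_decisions (ts, team, chosen_model, reason) "
        "VALUES (?, ?, ?, ?)",
        (time.time(), team, chosen_model, reason),
    )
    conn.commit()


def log_bias_event(conn, call_id, detail):
    """Attach a bias-detection event to a previously logged call."""
    conn.execute(
        "INSERT INTO bias_events (ts, call_id, detail) VALUES (?, ?, ?)",
        (time.time(), call_id, detail),
    )
    conn.commit()


def events_per_team(conn):
    """Count bias events per team, highest count first."""
    return conn.execute(
        "SELECT c.team, COUNT(*) AS n FROM bias_events b "
        "JOIN llm_calls c ON c.id = b.call_id "
        "GROUP BY c.team ORDER BY n DESC"
    ).fetchall()
```

A query like `events_per_team` shows the kind of question an audit trail can answer that spend dashboards cannot: which team's outputs are being flagged, regardless of how much those calls cost.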