We Built a Service That Catches LLM Drift Before Your Users Do

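📄 Chinese Summary (translated)

After an LLM-powered feature shipped, early testing looked good and user feedback was positive. A few weeks later, however, the support inbox filled with complaints: outputs were wrong, the JSON the application parsed was malformed, and the classifier's answers had become inconsistent. This phenomenon is known as LLM drift, and developers often notice it only after users report it. In February 2025, developers on r/LLMDevs reported that GPT-4o had changed its behavior without any advance notice, causing significant changes in output. The problem is not limited to OpenAI: Claude, Gemini, and even some "frozen" model versions can change behavior unexpectedly, which creates trouble for developers.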

📄 English Summary

We Built a Service That Catches LLM Drift Before Your Users Do

An LLM-powered feature performed well in testing and received positive feedback during beta. Just three weeks after launch, the support inbox filled with complaints: users reported incorrect outputs, JSON that the application could not parse, and inconsistent classifier responses. The model had drifted without prior notice. LLM drift is more common than many developers expect. In February 2025, developers on r/LLMDevs reported that GPT-4o had changed its behavior without advance warning, significantly altering the outputs of their prompts. The problem is not limited to OpenAI: Claude, Gemini, and even supposedly frozen model versions can change behavior unexpectedly, creating significant challenges for developers.
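The two symptoms named above, unparseable JSON and classifier answers that change, can be checked automatically by replaying a fixed set of prompts and comparing the results against answers recorded at launch. The following is a minimal sketch of that idea, not the service's actual implementation: `BASELINE`, `check_drift`, the `"label"` field, and the `call_model` callable are all hypothetical names, and `call_model` stands in for whatever client function sends a prompt to the model and returns its raw text.

```python
import json
from typing import Callable, Dict, List

# Hypothetical baseline: prompts paired with the label recorded at launch.
BASELINE: Dict[str, str] = {
    "Classify the sentiment: 'I love this product'": "positive",
    "Classify the sentiment: 'This is terrible'": "negative",
}


def check_drift(call_model: Callable[[str], str],
                baseline: Dict[str, str]) -> List[str]:
    """Replay baseline prompts and return a list of drift findings.

    An empty list means every response parsed as JSON and carried
    the same label that was recorded in the baseline.
    """
    findings: List[str] = []
    for prompt, expected_label in baseline.items():
        raw = call_model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # Format drift: the response is no longer valid JSON.
            findings.append(f"invalid JSON for prompt {prompt!r}")
            continue
        # Assumed response shape: a JSON object with a "label" field.
        label = parsed.get("label") if isinstance(parsed, dict) else None
        if label != expected_label:
            # Behavior drift: the classifier's answer changed.
            findings.append(
                f"label changed for prompt {prompt!r}: "
                f"expected {expected_label!r}, got {label!r}"
            )
    return findings


def stable_model(prompt: str) -> str:
    """Stub standing in for a real model call; always matches the baseline."""
    return json.dumps({"label": BASELINE[prompt]})


if __name__ == "__main__":
    print(check_drift(stable_model, BASELINE))
```

Run on a schedule against the real model, a check like this would surface a non-empty findings list as soon as responses stop parsing or labels flip, rather than after users complain. Exact label matching is a deliberate simplification; free-text outputs would need a looser comparison.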
