I shipped an LLM feature, got 11 users, and then the model quietly changed. To keep it from happening again, I built this.
📄 English Summary
I shipped an LLM feature, got 11 users, then the model silently changed on me. Here's what I built to stop it happening again.
The author, working alone with no team and no automated evaluation setup, built a feature that uses Claude to classify support tickets into three categories: billing, technical, and account. It worked well for six weeks. Then Claude's output changed: category names that had always come back in lowercase began arriving with a capitalized first letter (for example, "Billing" instead of "billing"). Over four days, 200 tickets were misrouted. Nothing was logged as an error and no exception was raised, so the failure was silent. The author caught the problem anyway and then put measures in place to keep it from happening again.
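The failure mode described above takes only a few lines to reproduce. The following is a minimal Python sketch, not the author's implementation. It assumes the router looks up the model's label in a dictionary with a fallback queue; the queue names and the functions `naive_route`, `parse_category`, and `checked_route` are all hypothetical. It shows how an exact-match lookup turns "Billing" into a silent misroute, and one possible guard: normalize the label, then reject anything outside the three allowed categories.

```python
# Hypothetical sketch, not the author's code: the queue names and the
# "general-queue" fallback are assumptions made for illustration.
ALLOWED_CATEGORIES = {"billing", "technical", "account"}
ROUTES = {
    "billing": "billing-queue",
    "technical": "tech-queue",
    "account": "account-queue",
}


class UnexpectedCategoryError(ValueError):
    """Raised when the model's label is not one of the allowed categories."""


def naive_route(raw_output: str) -> str:
    # Exact-match lookup with a default: "Billing" is not a key, so the ticket
    # silently lands in the fallback queue and nothing is logged.
    return ROUTES.get(raw_output, "general-queue")


def parse_category(raw_output: str) -> str:
    # Normalize surrounding whitespace and case, which absorbs the reported
    # drift (a capitalized first letter).
    label = raw_output.strip().lower()
    if label not in ALLOWED_CATEGORIES:
        # Fail loudly on anything else so a new kind of drift surfaces as an error.
        raise UnexpectedCategoryError(f"unexpected category from model: {raw_output!r}")
    return label


def checked_route(raw_output: str) -> str:
    # Route only labels that passed validation.
    return ROUTES[parse_category(raw_output)]


if __name__ == "__main__":
    print(naive_route("Billing"))    # general-queue (silent misroute)
    print(checked_route("Billing"))  # billing-queue
    try:
        checked_route("Refunds")
    except UnexpectedCategoryError as exc:
        print(f"rejected: {exc}")
```

Normalizing case handles this particular drift, while the explicit membership check makes any other unexpected output raise an error that can be logged or alerted on, instead of failing with no errors and no exceptions as in the incident summarized here.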