My LLM Started Lying to My App and I Didn't Notice for Three Days

📄 Summary

My LLM Started Lying to My App and I Didn't Notice for Three Days

A user flagged on Slack that the summaries looked off. Investigation showed the model had been returning malformed JSON on roughly 12% of requests for 72 hours. The error handler swallowed the parse failures and fell back to the previous day's output, so users were served stale data labeled as current. No monitoring caught the problem because no exception was ever raised: the model had quietly drifted from format instructions it had followed reliably for months, with no change to the model version or the prompt.
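The failure mode described above can be sketched in a few lines. This is a hypothetical reconstruction, not the author's actual code: the function name `parseSummary`, the `Summary` shape, and the `parseFailures` counter are all illustrative. The point is that a bare `catch` that returns cached data raises no exception and leaves no signal, whereas counting the failures gives a monitor something to alert on.

```typescript
// Illustrative sketch of the silent-fallback anti-pattern and a minimal fix.
type Summary = { date: string; text: string };

// The metric the original pipeline lacked: how often parsing fails.
let parseFailures = 0;

function parseSummary(raw: string, cached: Summary | null): Summary | null {
  try {
    return JSON.parse(raw) as Summary;
  } catch {
    // Anti-pattern: returning `cached` here with no record of the failure
    // serves yesterday's data as today's, and nothing ever throws.
    // Minimal fix: count (or log/emit) the failure before falling back,
    // so a 12% malformed-JSON rate becomes visible instead of invisible.
    parseFailures += 1;
    return cached;
  }
}
```

Alerting on `parseFailures` as a rate (e.g. failures per total requests over a window) would have surfaced this drift within minutes rather than three days.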

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.