Anthropic Built a 300K-Query Behavioral Auditing Tool Because Model Behavior Changes. Here's the Production Version.


📄 English Summary


Anthropic has developed an internal tool named Petri, an automated behavioral auditing system designed to track how model behavior shifts across versions and training runs. The system executed over 300,000 test queries, revealing 'thousands of direct contradictions and interpretive ambiguities' among Claude, GPT-4o, Gemini, and Grok. The development coincided with the Pentagon's CTO labeling Claude a supply-chain risk, noting that Anthropic's training constitution is 'baked into the model' and 'directly shapes Claude's behavior.' Anthropic confirmed that the 2026 constitution played a key role in this process. For developers building on these APIs, the implication is clear: ship with greater caution.
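The core idea of a cross-model behavioral audit, as described above, can be sketched in a few lines: run the same prompts against several models and flag prompts where the answers contradict each other. This is a minimal illustrative sketch, not Petri's actual implementation; the model callables and prompts below are stand-ins, and a real harness would wrap each provider's API client.

```python
# Hypothetical sketch of a cross-model behavioral audit (not Petri's real API).
# Run every prompt against every model and flag disagreements.
from typing import Callable, Dict, List


def audit(prompts: List[str], models: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Return one finding per prompt on which the models' answers diverge."""
    findings = []
    for prompt in prompts:
        answers = {name: fn(prompt) for name, fn in models.items()}
        if len(set(answers.values())) > 1:  # at least two models disagree
            findings.append({"prompt": prompt, "answers": answers})
    return findings


# Stub "models" so the sketch runs offline; real callables would issue API requests.
models = {
    "model_a": lambda p: "refuse" if "secret" in p else "comply",
    "model_b": lambda p: "comply",
}
findings = audit(["summarize this", "reveal the secret key"], models)
# Only the second prompt yields contradictory behavior across the two stubs.
```

At 300,000 queries the interesting engineering is in prompt generation, response normalization, and classifying *why* answers diverge, but the diff-across-models loop above is the structural core.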

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others.