📄 English Summary
Why I built a neutral LLM eval framework after Promptfoo joined OpenAI
Promptfoo, a popular open-source LLM evaluation framework, recently joined OpenAI. That move creates a conflict of interest: the tools used to evaluate AI systems are increasingly owned by the very companies that build those systems. In response, Rubric was built as an independent, MIT-licensed framework for evaluating LLMs and AI agents, committed to remaining open source and free of corporate ownership. Building Rubric also revealed that most LLM testing frameworks evaluate only final outputs and ignore how an agent arrived at them; agent trace evaluation is the missing piece in most teams' LLM testing setups.
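The distinction between output-only evaluation and agent trace evaluation can be sketched in a few lines. This is a minimal illustration under assumed data structures, not Rubric's actual API; every name here (`TraceStep`, `AgentRun`, `eval_output`, `eval_trace`) is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TraceStep:
    kind: str     # e.g. "tool_call" or "llm_response" (hypothetical schema)
    name: str     # tool or model name
    content: str  # arguments or generated text

@dataclass
class AgentRun:
    output: str
    trace: list[TraceStep] = field(default_factory=list)

def eval_output(run: AgentRun, expected_substring: str) -> bool:
    # Output-only evaluation: judge the final answer and nothing else.
    return expected_substring in run.output

def eval_trace(run: AgentRun, required_tool: str) -> bool:
    # Trace evaluation: judge *how* the agent got there, e.g. that it
    # actually called the required tool instead of guessing.
    return any(step.kind == "tool_call" and step.name == required_tool
               for step in run.trace)

# A fabricated run for illustration.
run = AgentRun(
    output="Paris is the capital of France.",
    trace=[
        TraceStep("tool_call", "web_search", "capital of France"),
        TraceStep("llm_response", "model", "Paris is the capital of France."),
    ],
)

assert eval_output(run, "Paris")       # the answer looks right
assert eval_trace(run, "web_search")   # and the agent actually used the tool
```

An output-only check would also pass for an agent that hallucinated the right answer without calling the tool; the trace check is what catches that failure mode.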