📄 English Summary
Why I built a neutral LLM eval framework after Promptfoo joined OpenAI
Promptfoo, a popular open-source LLM evaluation framework, recently joined OpenAI. That move creates a conflict of interest: the tools used to evaluate AI systems are increasingly owned by the very companies that build those systems. In response, Rubric was built as an independent, MIT-licensed framework for evaluating LLMs and AI agents, committed to remaining open source and free of corporate ownership. Building Rubric also revealed that most LLM testing frameworks evaluate only final outputs and ignore how an agent arrived at them; agent trace evaluation is the missing piece in most teams' LLM testing setups.
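The distinction between output-only evaluation and agent trace evaluation can be sketched in a few lines. This is a minimal illustration under assumed data structures, not Rubric's actual API; every name here (`TraceStep`, `AgentRun`, `eval_output`, `eval_trace`) is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TraceStep:
    kind: str     # e.g. "tool_call" or "llm_response" (hypothetical schema)
    name: str     # tool or model name
    content: str  # arguments or generated text

@dataclass
class AgentRun:
    output: str
    trace: list[TraceStep] = field(default_factory=list)

def eval_output(run: AgentRun, expected_substring: str) -> bool:
    # Output-only evaluation: judge the final answer and nothing else.
    return expected_substring in run.output

def eval_trace(run: AgentRun, required_tool: str) -> bool:
    # Trace evaluation: judge *how* the agent got there, e.g. that it
    # actually called the required tool instead of guessing.
    return any(step.kind == "tool_call" and step.name == required_tool
               for step in run.trace)

# A fabricated run for illustration.
run = AgentRun(
    output="Paris is the capital of France.",
    trace=[
        TraceStep("tool_call", "web_search", "capital of France"),
        TraceStep("llm_response", "model", "Paris is the capital of France."),
    ],
)

assert eval_output(run, "Paris")       # the answer looks right
assert eval_trace(run, "web_search")   # and the agent actually used the tool
```

An output-only check would also pass for an agent that hallucinated the right answer without calling the tool; the trace check is what catches that failure mode.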